News | Olga Slizovskaia Personal Websitehttp://olgaslizovskaia.ml/blog/NewsenWed, 20 Sep 2017 14:49:54 +0000ESI Workshop: Systematic approaches to deep learning methods for audiohttp://olgaslizovskaia.ml/blog/esi-workshop-systematic-approaches-to-deep-learning-methods-for-audio/<p>Last week I was delighted to attend the <a href="http://www.univie.ac.at/nuhag-php/program/?event=esi17">Systematic approaches to deep learning methods for audio</a> workshop, organized by the <a href="http://www.esi.ac.at/">Erwin Schrödinger Institute</a> in Vienna, where I presented ongoing work on the analysis of audio-visual correspondences.</p>
<p>The idea of the workshop was to bring together mathematicians and machine learning researchers working on audio and deep learning problems. The following topics were proposed for discussion:</p>
<ul>
<li>Mathematical understanding of deep learning</li>
<li>Introspection in deep learning</li>
<li>End-to-end learning in MIR</li>
<li>Signal representations in deep learners vs. adaptive signal transforms </li>
<li>Scattering transforms and signal representations in deep learners </li>
</ul>
<p>I would like to <span>highlight some of </span>the presented work and share my notes and feelings.</p>
<p>I could recap the first big question we discussed as follows: "Can we formalize something that already exists in the data and impose this information on DNNs?" Imposing domain knowledge on networks seems to be under active discussion right now. Some examples:</p>
<ul>
<li><span>Irene Waldspurger presented her work entitled "Inversion of the wavelet transform modulus." Her talk was about audio reconstruction from scalograms, specifically for Cauchy wavelets, as well as about scattering transforms and the possibility of using them as initialization for CNNs.</span></li>
<li><span>Fabio Anselmi gave an excellent talk on "Invariant and selective data representations with applications to Deep learning."</span></li>
<li><span>Other notable presentations were given by Joakim Andén and Vincent Lostanlen. They discussed the use of the joint time-frequency scattering transform for CNNs and scattering on the pitch spiral. They proposed using a hierarchical CNN in which the filters of the first few layers are fixed and take the form of multiple scattering transforms. The joint scattering has been shown to be invariant to time shifts and frequency transposition, and robust to time-warping deformations.</span></li>
</ul>
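<p>To make the idea of a fixed, non-learned first layer more concrete, here is a minimal NumPy sketch of a first-order scattering-like transform: band-pass filtering, a modulus, then averaging over time. The Gaussian filter shapes, the centre frequencies, and the function names are my own assumptions for illustration; this is not the implementation presented by the speakers, and real scattering transforms use wavelets and local rather than global averaging.</p>

```python
import numpy as np

def gabor_bank(n, center_freqs, bandwidth=0.02):
    """Fixed (non-learned) band-pass filters defined in the Fourier domain."""
    freqs = np.fft.fftfreq(n)  # normalized frequencies in cycles per sample
    return np.stack([np.exp(-0.5 * ((freqs - fc) / bandwidth) ** 2)
                     for fc in center_freqs])

def first_order_scattering(x, bank):
    """Modulus of each band-passed signal, averaged over time."""
    X = np.fft.fft(x)
    u = np.abs(np.fft.ifft(X[None, :] * bank, axis=1))  # |x * psi| per filter
    return u.mean(axis=1)  # global averaging discards the time position

n = 1024
t = np.arange(n)
# A short burst at normalized frequency 0.1, centred at sample 300.
x = np.sin(2 * np.pi * 0.1 * t) * np.exp(-0.5 * ((t - 300) / 40.0) ** 2)
bank = gabor_bank(n, center_freqs=[0.05, 0.1, 0.2])

s = first_order_scattering(x, bank)
s_shifted = first_order_scattering(np.roll(x, 200), bank)
```

<p>Because the averaging here is global and the shift is circular, <code>s</code> and <code>s_shifted</code> agree up to floating-point error, which is the time-shift invariance in its simplest form. In a hybrid network, coefficients like these would feed the learned layers.</p>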
<p>To my great joy, the topic of probabilistic networks and deriving optimal architectures came to the discussion several times:</p>
<ul>
<li>Philipp Grohs discussed a variety of open theoretical questions in his talk "Deep Learning as a Mathematician" (the slides are available <a href="http://www.univie.ac.at/nuhag-php/dateien/talks/3357_grohs.pdf">here</a>).</li>
<li>Antoine Deleforge presented the work: "Reversed Mixture-of-Experts Networks for High- to Low-Dimensional Regression" about the low-dimensional estimation of high-dimensional data (for such tasks as sound source estimation or human pose estimation), and building inverse regression networks based on combining the mixture of experts with the final gating network. </li>
<li>
<p><span>Karen Ullrich gave a talk</span> on Bayesian Networks with applications in sparsification and overconfidence evaluation.</p>
</li>
<li>The presentation entitled "Bayesian meter tracking on learned signal representations" given by Andre Holzapfel was related to probabilistic post-processing of the results obtained with CNNs.</li>
</ul>
<p>Another topic that resonates well with my work is the understanding and interpretability of learned data representations and networks. It came up in many of the talks in one form or another, but the following lectures were devoted to it entirely:</p>
<ul>
<li>Grégoire Montavon in his talk "Explaining the Predictions of Deep Neural Networks" presented several methods <span>for network explanation such as</span> <span>Taylor decomposition and layer-wise relevance propagation (LRP). It's worth mentioning that they have a great online demo (<a href="http://heatmapping.org/">http://heatmapping.org/</a>). Also, they are organizing a workshop at NIPS on the topic of interpretability: <a href="http://www.interpretable-ml.org/nips2017workshop">http://www.interpretable-ml.org/nips2017workshop</a><span>.</span></span></li>
<li><span>Saumitra Mishra discussed the importance of local interpretability of network predictions for music. His work focuses on extending the LIME algorithm to music content analysis. The code for the SoundLIME system is available online (<a href="https://code.soundsoftware.ac.uk/projects/SoundLIME">https://code.soundsoftware.ac.uk/projects/SoundLIME</a>). He mentioned two workshops organized by QMUL in the coming months: the more general <a href="https://nips.cc/Conferences/2017/Schedule?showEvent=8790">Machine Learning for Sound and Music Information Retrieval</a> at NIPS, and <a href="http://c4dm.eecs.qmul.ac.uk/horse2017/">HORSE2017</a>.</span></li>
</ul>
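<p>As a rough illustration of how layer-wise relevance propagation redistributes a prediction back to the inputs, here is a small NumPy sketch of the epsilon rule on a toy two-layer ReLU network. The network, its random weights, and the helper name <code>lrp_dense</code> are hypothetical, and the epsilon rule is only one of several LRP rules; this is not the code behind the demo mentioned above.</p>

```python
import numpy as np

def lrp_dense(a, W, R_out, eps=1e-9):
    """Redistribute relevance R_out from a dense layer's outputs to its inputs a."""
    z = a @ W                                  # pre-activations, shape (n_out,)
    z = z + eps * np.where(z >= 0, 1.0, -1.0)  # stabilizer avoids division by zero
    s = R_out / z
    return a * (W @ s)                         # R_i = a_i * sum_j W_ij * s_j

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 5))  # input -> hidden weights (toy values)
W2 = rng.normal(size=(5, 3))  # hidden -> output weights (toy values)
x = rng.normal(size=4)

h = np.maximum(0.0, x @ W1)   # hidden ReLU activations
out = h @ W2                  # output scores (no biases, so relevance is conserved)

k = int(np.argmax(out))
R_out = np.zeros_like(out)
R_out[k] = out[k]             # start from the score of the predicted class

R_h = lrp_dense(h, W2, R_out) # relevance of the hidden units
R_x = lrp_dense(x, W1, R_h)   # relevance of the input features
```

<p>With no biases and a tiny stabilizer, the input relevances in <code>R_x</code> sum to the explained score <code>out[k]</code>, which is the conservation property that makes the resulting heatmaps readable as a decomposition of the prediction.</p>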
<p>Last but not least, we had a great discussion on multimodality. In particular:</p>
<ul>
<li>Matthias Dorfer presented his work on audio-visual score following and retrieval;</li>
<li>Oriol Nieto presented several enhancements for the cold-start problem in music recommendation (work carried out by Sergio Oramas in collaboration with Pandora);</li>
<li>Hendrik Koops shared his group's experience with user-oriented chord estimation;</li>
<li>and I talked about my research on multimodal musical instrument recognition.</li>
</ul>
<p>Many other presentations are left out of this short review, but they were nonetheless very interesting and of high scientific quality.</p>
<p>I would like to thank Monika and Arthur for organizing such a great event and inviting me, as well as my supervisors Emilia and Gloria for giving me an opportunity to participate. It was, undoubtedly, useful and <span>extremely educational.</span></p>
<p></p>
<p>In the photo: the longest blackboard I've ever seen in my life :)</p>
<p><img height="423" src="http://olgaslizovskaia.ml/static/media/uploads/.thumbnails/img_20170912_123818.jpg/img_20170912_123818-564x423.jpg" width="564"/></p>olgaWed, 20 Sep 2017 14:49:54 +0000http://olgaslizovskaia.ml/blog/esi-workshop-systematic-approaches-to-deep-learning-methods-for-audio/Deep|Bayes Summer Schoolhttp://olgaslizovskaia.ml/blog/deepbayes-summer-school/<p class="p1">At the end of August, I participated in the Summer School on Bayesian Methods for Deep Learning.</p>
<p class="p1">I think it is a good reason to finally start writing for the blog :)</p>
<p class="p1">The Deep|Bayes school was organized by the <a href="https://cs.hse.ru/en/bayesgroup/"><span class="s1">Bayesian Methods Research Group</span></a> of the Higher School of Economics, Moscow, Russia. The program covered a wide variety of subjects in Deep Learning and Bayesian statistics, with practical sessions on topics such as VAEs, GANs, Gaussian Processes and DL models with attention.</p>
<p class="p1"><br/>Some useful takeaway messages:</p>
<ol class="ol1">
<li class="li1">One can add noise to the network weights and use variational dropout for
<ol>
<li class="li1">reducing the variance of stochastic gradients [1]</li>
<li class="li1">sparsifying the network up to 95% [2]</li>
</ol>
</li>
<li class="li1">Even good DNN models can fail when labels are assigned randomly: they will fit the training data but not generalize [3]. It seems that Bayesian NNs might be useful in this case (at least, they will refuse to train).</li>
<li class="li1">Dropout is a standard technique for<span> </span>ensembling! In Lasagne, there is a simple parameter deterministic=False which allows you to get different predictions from the same network:
<pre><span class="n">T</span><span class="o">.</span><span class="n">mean</span><span class="p">([</span><span class="n">lasagne</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">get_output</span><span class="p">(</span><span class="n">net</span><span class="p">,</span> <span class="n">deterministic</span><span class="o">=</span><span class="bp">False</span><span class="p">)</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">10</span><span class="p">)],</span> <span class="n">axis</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span></pre>
When performing Bayesian model selection, the dropout probability <em>p</em> can be selected from the data.</li>
<li class="li1">Any method which doesn't overfit is a wrong method. We need to gradually make it more and more complicated until it starts to overfit and then think of how to regularise it. Let’s<span> </span>overfit!</li>
<li class="li1">Attention in deep models improves interpretability and provides better results. Attention may be interpreted as a latent variable. Attention is all you need :) [4] </li>
<li class="li1">With the Bayesian framework, we can measure the uncertainty of the model and even distinguish between clean and adversarial examples [5].</li>
<li class="li1">Almost any prior can be added into the model as a latent variable. Unfortunately, only a few people know how to do it (I'm not among them).</li>
</ol>
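<p class="p1">The dropout-as-ensembling and uncertainty points above can be sketched without Lasagne. The following is a minimal, self-contained NumPy version, written this way only so that it runs on its own; the toy network, its random weights, and the <code>forward</code> helper are assumptions for illustration, with the <code>deterministic</code> flag named after the Lasagne parameter.</p>

```python
import numpy as np

def forward(x, W1, W2, rng, p=0.5, deterministic=True):
    """One-hidden-layer network with inverted dropout on the hidden units."""
    h = np.maximum(0.0, x @ W1)
    if not deterministic:
        mask = rng.random(h.shape) >= p   # keep each unit with probability 1 - p
        h = h * mask / (1.0 - p)          # rescale so the expected activation is unchanged
    return h @ W2

rng = np.random.default_rng(0)
W1 = rng.normal(size=(3, 16))             # toy weights, stand-ins for a trained model
W2 = rng.normal(size=(16, 2))
x = rng.normal(size=(5, 3))               # a batch of 5 inputs

# Keep dropout active at test time and average several stochastic passes.
samples = np.stack([forward(x, W1, W2, rng, deterministic=False) for _ in range(10)])
mc_mean = samples.mean(axis=0)            # ensemble prediction
mc_std = samples.std(axis=0)              # spread across passes, a rough uncertainty estimate
```

<p class="p1">Inputs with a large <code>mc_std</code> are the ones the model is least sure about, which is the kind of signal that might help to flag unusual or adversarial examples. Ten passes is an arbitrary choice here; more passes give a smoother estimate at a higher cost.</p>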
<p class="p1"><br/>Apart from the intense scientific program, it was very nice to meet people from industry and learn about their ML/DL use cases. Of course, there were people from NLP and computer vision, but it was a big surprise for me to learn that security companies also have ill-formalized, non-routine ML tasks.</p>
<p class="p2"></p>
<p class="p1">I would like to thank the organizers for such a great opportunity to learn and refresh many topics in deep learning and Bayesian statistics, as well as for the chance to socialize with others. The organizers plan to hold the next edition of the school in English, so I strongly encourage everyone interested to participate.</p>
<p class="p2"></p>
<p class="p1">While I’m still writing a short overview of each day of the school, you can have a look at the <a href="https://drive.google.com/drive/folders/0B2zoFVYw1rN3WUZUS2ZGWmhobkU">slides</a> (in English), the <a href="https://github.com/bayesgroup/deepbayes2017">seminar notebooks</a> (in Python), and the <a href="https://www.youtube.com/playlist?list=PLEqoHzpnmTfBSyGmE4nBlhxxi28dCZwWN">videos</a> (in Russian).</p>
<p class="p2"></p>
<p class="p1">It’s also worth mentioning that the organizers have a great sense of humor: here I am with the reincarnation of Thomas Bayes, <a href="https://www.hse.ru/en/staff/dvetrov">Dmitry Vetrov</a>. And, yeah, it was quite deep :)</p>
<p class="p1"><img alt="" height="467" src="http://olgaslizovskaia.ml/static/media/uploads/deepbayes.jpg" width="311"/></p>
<p class="p1"></p>
<p class="p1"><span>[1] Kingma, Diederik P., Tim Salimans, and Max Welling. "Variational dropout and the local reparameterization trick."<span> </span></span><i>Advances in Neural Information Processing Systems</i><span>. 2015.</span></p>
<p class="p1"><span><span>[2] <span>Molchanov, Dmitry, Arsenii Ashukha, and Dmitry Vetrov. "Variational Dropout Sparsifies Deep Neural Networks." <span><span><em>International Conference on Machine Learning (ICML 2017).</em> </span></span></span><span>2017.</span></span></span></p>
<p class="p1"><span><span>[3] Zhang, Chiyuan, et al. "Understanding deep learning requires rethinking generalization."<span> </span></span><i>arXiv preprint arXiv:1611.03530</i><span><span> </span>(2016).</span></span></p>
<p class="p1"><span><span>[4] <span>Vaswani, Ashish, et al. "Attention Is All You Need."<span> </span></span><i>arXiv preprint arXiv:1706.03762</i><span><span> </span>(2017).</span></span></span></p>
<p class="p1"><span><span><span>[5] <span>Li, Yingzhen, and Yarin Gal. "Dropout Inference in Bayesian Neural Networks with Alpha-divergences."<span><span> <span><span><em>International Conference on Machine Learning (ICML 2017).</em> </span></span></span><span>2017.</span></span></span></span></span></span></p>
<p class="p1"><span><span><span><span><span><span></span></span></span></span></span></span></p>olgaFri, 01 Sep 2017 05:45:22 +0000http://olgaslizovskaia.ml/blog/deepbayes-summer-school/Welcome to My Blog!http://olgaslizovskaia.ml/blog/welcome-to-my-blog/<p></p>
<p><span>Hello, and welcome to my new website and blog.</span></p>
<p><span>I plan to publish my thoughts on research, teaching, and programming here, along with some stories from my life.</span></p>
<p><span>I hope you find something interesting here!</span></p>olgaFri, 10 Mar 2017 15:52:09 +0000http://olgaslizovskaia.ml/blog/welcome-to-my-blog/