{"id":1852,"date":"2011-07-11T10:20:23","date_gmt":"2011-07-11T16:20:23","guid":{"rendered":"http:\/\/hunch.net\/?p=1852"},"modified":"2011-07-11T10:20:23","modified_gmt":"2011-07-11T16:20:23","slug":"interesting-neural-network-papers-at-icml-2011","status":"publish","type":"post","link":"https:\/\/hunch.net\/?p=1852","title":{"rendered":"Interesting Neural Network Papers at ICML 2011"},"content":{"rendered":"<p>Maybe it&#8217;s too early to call, but with four separate Neural Network sessions at this year&#8217;s <a title=\"ICML\" href=\"http:\/\/www.icml-2011.org\/\">ICML<\/a>,  it looks like Neural Networks are making a comeback. Here are my  highlights of these sessions. In general, my feeling is that these  papers both demystify deep learning and show its broader applicability.<\/p>\n<p>The first observation I made is that the once disreputable &#8220;Neural&#8221; nomenclature is being used again <span dir=\"ltr\">in lieu of<\/span> &#8220;deep learning&#8221;. Maybe it&#8217;s because Adam Coates et al. showed that single layer networks can work surprisingly well.<\/p>\n<ul>\n<li><a href=\"http:\/\/www.stanford.edu\/%7Eacoates\/papers\/coatesleeng_aistats_2011.pdf\">An Analysis of Single-Layer Networks in Unsupervised Feature       Learning<\/a>, <a href=\"http:\/\/www.stanford.edu\/%7Eacoates\/\">Adam Coates<\/a>, <a href=\"http:\/\/www.eecs.umich.edu\/%7Ehonglak\/\">Honglak Lee<\/a>, <a href=\"http:\/\/www.cs.stanford.edu\/people\/ang\/\">Andrew Y. Ng<\/a> (AISTATS 2011)<\/li>\n<li><a href=\"http:\/\/www.stanford.edu\/%7Eacoates\/papers\/coatesng_icml_2011.pdf\">The Importance of Encoding Versus Training with Sparse Coding and Vector Quantization<\/a>, <a href=\"http:\/\/www.stanford.edu\/%7Eacoates\/\">Adam Coates<\/a>, <a href=\"http:\/\/www.cs.stanford.edu\/people\/ang\/\">Andrew Y. Ng<\/a> (ICML 2011)<\/li>\n<\/ul>\n<p>Another surprising result out of Andrew Ng&#8217;s group comes from Andrew  Saxe et al. who show that certain convolutional pooling architectures  can obtain close to state-of-the-art performance with random weights  (that is, without actually learning).<\/p>\n<ul>\n<li><a href=\"http:\/\/www.icml-2011.org\/papers\/551_icmlpaper.pdf\">On Random Weights and Unsupervised Feature Learning<\/a>, <a href=\"http:\/\/www.stanford.edu\/%7Easaxe\/\">Andrew M. Saxe<\/a>, Pang Wei Koh, Zhenghao Chen, <a href=\"http:\/\/www.stanford.edu\/%7Embhand\/\">Maneesh Bhand<\/a>, <a href=\"http:\/\/stanford.edu\/%7Ebipins\/home.html\">Bipin Suresh<\/a>, <a href=\"http:\/\/www.cs.stanford.edu\/people\/ang\/\">Andrew Y. Ng<\/a><\/li>\n<\/ul>\n<p>Of course, in most cases we do want to train these models eventually.  There were two interesting papers on the topic of training neural  networks. In the first, Quoc Le et al. show that a simple, off-the-shelf  L-BFGS optimizer is often preferable to stochastic gradient descent.<\/p>\n<ul>\n<li><a href=\"http:\/\/www.icml-2011.org\/papers\/210_icmlpaper.pdf\">On optimization methods for deep learning<\/a>, <a href=\"http:\/\/ai.stanford.edu\/%7Equocle\/\">Quoc V. Le<\/a>, <a href=\"http:\/\/ai.stanford.edu\/%7Ejngiam\/\">Jiquan Ngiam<\/a>, <a href=\"http:\/\/www.stanford.edu\/%7Eacoates\/\">Adam Coates<\/a>, <a href=\"http:\/\/sites.google.com\/site\/abhiklahiri\/\">Abhik Lahiri<\/a>, Bobby Prochnow, <a href=\"http:\/\/www.cs.stanford.edu\/people\/ang\/\">Andrew Y. 
Ng<\/a><\/li>\n<\/ul>\n<p>Secondly, Martens and Sutskever from Geoff Hinton&#8217;s group show how to  train recurrent neural networks for sequence tasks that exhibit very  long range dependencies:<\/p>\n<ul>\n<li><a href=\"http:\/\/www.icml-2011.org\/papers\/532_icmlpaper.pdf\">Learning Recurrent Neural Networks with Hessian-Free Optimization<\/a>, <a href=\"http:\/\/www.cs.toronto.edu\/%7Ejmartens\/index.html\">James Martens<\/a>, <a href=\"http:\/\/www.cs.utoronto.ca\/%7Eilya\/\">Ilya Sutskever<\/a><\/li>\n<\/ul>\n<p>It will be interesting to see whether this type of training will  allow recurrent neural networks to outperform CRFs on some standard  sequence tasks and data sets. It certainly seems possible since even  with standard L-BFGS our <a href=\"http:\/\/www.icml-2011.org\/papers\/125_icmlpaper.pdf\"><strong>recursive<\/strong> neural network<\/a> (see previous post) can outperform CRF-type models on several  challenging computer vision tasks such as semantic segmentation of scene  images. This common vision task of labeling each pixel with an  object  class has not received much attention from the deep learning community.<br \/>\nApart from the vision experiments, this paper further solidifies the  trend that neural networks are being used more and more in natural  language processing. In our case, the RNN-based model was used for  structure prediction. Another neat example of this trend comes from Yann  Dauphin et al. in Yoshua Bengio&#8217;s group. They present an interesting  solution for learning with sparse bag-of-word representations.<\/p>\n<ul>\n<li><a href=\"http:\/\/www.icml-2011.org\/papers\/491_icmlpaper.pdf\">Large-Scale Learning of Embeddings with Reconstruction Sampling<\/a>, Yann Dauphin, Xavier Glorot, <a href=\"http:\/\/www.iro.umontreal.ca\/%7Ebengioy\/yoshua_en\/index.html\">Yoshua Bengio<\/a><\/li>\n<\/ul>\n<p>Such sparse representations had previously been problematic for neural architectures.<\/p>\n<p>In summary, these papers have helped us understand a bit better which  &#8220;deep&#8221; or &#8220;neural&#8221; architectures work, why they work and how we should  train them. Furthermore, the scope of problems that these architectures  can handle has been widened to harder and more real-life problems.<\/p>\n<p>Of the non-neural papers, these two papers stood out for me:<\/p>\n<ul>\n<li><a href=\"Sparse Additive Generative Models of Text\">Sparse Additive Generative Models of Text<\/a>, <a href=\"http:\/\/people.csail.mit.edu\/jacobe\/\">Jacob Eisenstein<\/a>, <a href=\"http:\/\/www.cs.cmu.edu\/~amahmed\/\">Amr Ahmed<\/a>, <a href=\"http:\/\/www.cs.cmu.edu\/~epxing\/\">Eric Xing<\/a> &#8211; the idea is to model each topic only in terms of how it differs from a background distribution.<\/li>\n<li><a href=\"http:\/\/www.icml-2011.org\/papers\/410_icmlpaper.pdf\">Message Passing Algorithms for Dirichlet Diffusion Trees<\/a>, <a href=\"http:\/\/mlg.eng.cam.ac.uk\/dave\/\">David A. Knowles<\/a>, <a href=\"http:\/\/mlg.eng.cam.ac.uk\/jurgen\/\">Jurgen Van Gael<\/a>, <a href=\"http:\/\/mlg.eng.cam.ac.uk\/zoubin\/\">Zoubin Ghahramani<\/a> &#8211; I&#8217;ve been intrigued by Dirichlet Diffusion Trees but inference always seemed too slow. This paper might solve that problem.<\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>Maybe it&#8217;s too early to call, but with four separate Neural Network sessions at this year&#8217;s ICML, it looks like Neural Networks are making a comeback. Here are my highlights of these sessions. 
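Finally, the promised sketch of the Dauphin et al. reconstruction-sampling trick, as I understand it (my own toy illustration with a stand-in reconstruction, not the authors' code): score the reconstruction of a sparse bag-of-words vector on all active words but on only a small reweighted sample of the inactive ones, which keeps the loss an unbiased estimate of the full reconstruction loss while touching a tiny fraction of the vocabulary.

```python
# Toy sketch of reconstruction sampling (my own illustration): evaluate
# the reconstruction loss on the nonzero entries plus a small random
# sample of the zeros, with an importance weight so the estimate of the
# full cross-entropy stays unbiased.
import numpy as np

rng = np.random.RandomState(0)
V = 10000                                          # vocabulary size
x = np.zeros(V)
x[rng.choice(V, size=30, replace=False)] = 1.0     # ~30 active words

x_hat = rng.uniform(0.01, 0.99, size=V)            # stand-in reconstruction

def full_ce(x, x_hat):
    """Full binary cross-entropy over the whole vocabulary."""
    return -np.sum(x * np.log(x_hat) + (1 - x) * np.log(1 - x_hat))

def sampled_ce(x, x_hat, n_zero_samples=100):
    """All active words, plus a reweighted sample of the inactive ones."""
    nonzero = np.flatnonzero(x)
    zeros = np.flatnonzero(x == 0)
    sampled = rng.choice(zeros, size=n_zero_samples, replace=False)
    w = len(zeros) / n_zero_samples            # importance weight
    loss = -np.sum(np.log(x_hat[nonzero]))
    loss += -w * np.sum(np.log(1 - x_hat[sampled]))
    return loss

print("full loss:   ", full_ce(x, x_hat))
print("sampled est.:", np.mean([sampled_ce(x, x_hat) for _ in range(200)]))
```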