Adam Kalai points out that the New England Machine Learning Day is May 1 at MSR New England. There is a poster session with abstracts due April 19. I understand last year’s NEML went well, and it’s great to meet your neighbors at regional workshops like this.
Michael Jordan sends the below:
The new Simons Institute for the Theory of Computing will begin organizing semester-long programs starting in 2013. One of our first programs, set for Fall 2013, will be on the “Theoretical Foundations of Big Data Analysis”. The organizers of this program are Michael Jordan (chair), Stephen Boyd, Peter Buehlmann, Ravi Kannan, Michael Mahoney, and Muthu Muthukrishnan. See http://simons.berkeley.edu/program_bigdata2013.html for more information on this program.
The Simons Institute has created a number of “Research Fellowships” for young researchers (within at most six years of the award of their PhD) who wish to participate in Institute programs, including the Big Data program. Individuals who already hold postdoctoral positions or who are junior faculty are welcome to apply, as are finishing PhDs.
Please note that the application deadline is January 15, 2013. Further details are available at http://simons.berkeley.edu/fellows.html.
The New York ML symposium was last Friday. There were 303 registrations, up a bit from last year. I particularly enjoyed talks by Bill Freeman on vision and ML, Jon Lenchner on strategy in Jeopardy, and Tara N. Sainath and Brian Kingsbury on deep learning for speech recognition. If anyone has suggestions or thoughts for next year, please speak up.
I also attended Strata + Hadoop World for the first time. This is primarily a trade conference rather than an academic conference, but I found it pretty interesting as a first-time attendee. This is ground zero for the Big Data buzzword, and I now see why: it’s about data, and the word “big” is so ambiguous that everyone can lay claim to it. There were essentially zero academic talks. Instead, the focus was on war stories, product announcements, and education. The general level of education is much lower: explaining Machine Learning to the SQL-educated is the primary operating point. Nevertheless, that is happening, and the fact that machine learning is considered a necessary technology for industry is a giant step for the field. Over time, I expect the industrial side of Machine Learning to grow, and perhaps surpass the academic side, in the same sense as has already occurred for chip design. Amongst the talks I could catch, I particularly liked the Github, Zillow, and Pandas talks. Ted Dunning also gave a particularly masterful talk, although I have doubts about the core Bayesian Bandit approach (*). The streaming k-means algorithm they implemented does look quite handy.
(*) The doubt is the following: prior elicitation is generally hard, and Bayesian techniques are not robust to misspecification. This matters in standard supervised settings, but it may matter more in exploration settings where misspecification can imply data starvation.
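To make the data starvation concern concrete, here is a minimal simulation of my own (not from the talk, with all constants invented for illustration): Thompson sampling on a two-armed Bernoulli bandit whose Beta prior is confidently wrong about which arm is better. Because the prior says the good arm is bad, that arm is almost never pulled, so almost no data arrives to correct the belief.

```python
import numpy as np

# Hypothetical illustration: a confidently misspecified prior starves a
# Thompson-sampling bandit of the data needed to correct itself.
rng = np.random.default_rng(0)
true_means = [0.3, 0.7]             # arm 1 is actually the better arm
alpha = np.array([50.0, 1.0])       # prior confidently (and wrongly) says
beta = np.array([1.0, 50.0])        # arm 0 is good and arm 1 is bad
pulls = np.zeros(2, dtype=int)

for t in range(5000):
    theta = rng.beta(alpha, beta)        # sample a plausible mean per arm
    a = int(np.argmax(theta))            # play the arm that currently looks best
    r = rng.binomial(1, true_means[a])   # observe a Bernoulli reward
    alpha[a] += r
    beta[a] += 1 - r
    pulls[a] += 1

print(pulls)  # arm 1 is almost never pulled, so the wrong belief about it persists
```

With a flat prior the same loop quickly concentrates on the better arm, which is the sense in which the concern is about misspecification rather than about the bandit setting itself.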
The main program will feature invited talks from Peter Bartlett, William Freeman, and Vladimir Vapnik, along with numerous spotlight talks and a poster session. Following the main program, hackNY and Microsoft Research are sponsoring a networking hour with talks from machine learning practitioners at NYC startups (specifically bit.ly, Buzzfeed, Chartbeat, Sense Networks, and Visual Revenue). This should be of great interest to everyone considering working in machine learning.
The New York Machine Learning Symposium is October 19. Two-page abstracts are due September 13 via email with subject “Machine Learning Poster Submission” sent to firstname.lastname@example.org. Everyone is welcome to submit. Last year’s attendance was 246, and I expect more this year.
The primary experiment for ICML 2013 is multiple paper submission deadlines with rolling review cycles. The key dates are October 1, December 15, and February 15. This is an attempt to shift ICML further towards a journal-style review process and reduce peak load. The “not for proceedings” experiment from this year’s ICML is not continuing.
Edit: Fixed second ICML deadline.
The workshop on the Meaningful Use of Complex Medical Data is happening again, August 9-12 in LA, near UAI on Catalina Island August 15-17. I enjoyed my visit last year, and expect this year to be interesting also.
May 16 in Cambridge is the New England Machine Learning Day, a first regional workshop/symposium on machine learning. To present a poster, submit an abstract by May 5.
For graduate students, the Yahoo! Key Scientific Challenges program, which includes machine learning, is on again, due March 9. The application is easy and the $5K award is high-quality “no strings attached” funding. Consider submitting.
The From Data to Knowledge workshop May 7-11 at Berkeley should be of interest to the many people encountering streaming data in different disciplines. It’s run by a group of astronomers who encounter streaming data all the time. I met Josh Bloom recently, and he is broadly interested in a workshop covering all aspects of Machine Learning on streaming data. The hope here is that techniques developed in one area turn out to be useful in another, which seems quite plausible. Particularly if you are in the bay area, consider checking it out.
The New York ML symposium was last Friday. Attendance was 268, significantly larger than last year. My impression was that the event mostly still fit the space, although it was crowded. If anyone has suggestions for next year, speak up.
The best student paper award went to Sergiu Goschin for a cool video of how his system learned to play video games (I can’t find the paper online yet). Choosing amongst the submitted talks was pretty difficult this year, as there were many similarly good ones.
By coincidence, all the invited talks were (at least potentially) about faster learning algorithms. Stephen Boyd talked about ADMM. Leon Bottou spoke on single-pass online learning via averaged SGD. Yoav Freund talked about parameter-free hedging. In Yoav’s case the talk was mostly about a better theoretical learning algorithm, but it has the potential to unlock an exponential computational complexity improvement via oraclization of experts algorithms, although some serious thought needs to go in this direction.
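For reference, averaged SGD is just plain stochastic gradient descent where the reported solution is the running average of the iterates (Polyak-Ruppert averaging) rather than the last iterate. Below is a generic single-pass least-squares sketch of my own (the data, dimensions, and step-size schedule are made up), not Bottou’s code.

```python
import numpy as np

# Generic single-pass averaged SGD for least squares: run ordinary SGD,
# but output the running average of the iterates rather than the last one.
rng = np.random.default_rng(0)
d, n = 10, 100_000
w_true = rng.normal(size=d)
w = np.zeros(d)       # SGD iterate
w_avg = np.zeros(d)   # averaged iterate (the reported solution)

for t in range(1, n + 1):
    x = rng.normal(size=d)
    y = x @ w_true + 0.1 * rng.normal()   # one fresh example per step
    grad = (x @ w - y) * x                # gradient of 0.5 * (x.w - y)^2
    w -= 0.1 / np.sqrt(t) * grad          # decaying step size (illustrative)
    w_avg += (w - w_avg) / t              # running average of iterates

print(np.linalg.norm(w_avg - w_true))     # typically closer to w_true than w is
```

The averaging step is the whole trick: it smooths out the noise in the individual iterates, which is what makes a single pass over the data statistically efficient.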
Unrelated, I found quite a bit of truth in Paul’s talking bears, and Xtranormal always adds a dash of funny. My impression is that the ML job market has only become hotter over the last 4 years. Anyone who is well trained can find work, with the key limiting factor being “well trained”. In this environment, efforts to make ML more automatic and more easily applied are greatly appreciated. And yes, Yahoo! is still hiring too.
Everyone should have received notice for NY ML Symposium abstracts. Check carefully, as one was lost by our system.
The event itself is October 21, next week. Leon Bottou, Stephen Boyd, and Yoav Freund are giving the invited talks this year, and there are many spotlights on local work spread throughout the day. Chris Wiggins has set up 6(!) ML-interested startups to follow the symposium, which should be of substantial interest to the employment-interested.
I also wanted to give an update on ICML 2012. Unlike last year, our deadline is coordinated with AIStats (which is due this Friday). The paper deadline for ICML has been pushed back to February 24, which should allow significant time for finishing up papers after the winter break. Other details may interest people as well:
- We settled on using CMT after checking out the possibilities. This isn’t what I was hoping for, because I’ve often found CMT clunky in terms of easy access to the right information. Nevertheless, the breadth of features and willingness to support new/better approaches to reviewing was unrivaled. We are also coordinating with Laurent, Rich, and CMT to enable their paper/reviewer recommendation system. The outcome should be a standardized interface in CMT for any recommendation system, which others can then code to if interested.
- Area chairs have been picked. The list isn’t sacred, so if we discover significant holes in expertise we’ll deal with it. We expect to start inviting PC members in a little while. Right now, we’re looking into invited talks. If you have any really good suggestions, they could be considered.
- CCC is interested in sponsoring travel costs for any climate/environment-related ML papers, which seems great to us. In general, this seems like an area of growing interest.
- We now have a permanent server and the beginnings of the permanent website setup. Much more work needs to be done here.
- We haven’t settled yet on how videos will work. Last year, ICML experimented with Weyond with results here. Previously, ICML had used videolectures, which is significantly more expensive. If you have an opinion about cost/quality tradeoffs or other options, speak up.
- Plans for COLT have shifted slightly—COLT will start a day early, overlap with tutorials, then overlap with a coordinated first day of ICML conference papers.
At KDD I enjoyed Stephen Boyd’s invited talk about optimization quite a bit. However, the most interesting talk for me was David Haussler’s. His talk started out with a formidable load of biological complexity. About half-way through you start wondering, “can this be used to help with cancer?” At the end he connects it directly to practice with a call to arms for the audience: cure cancer. The core thesis here is that cancer is a complex set of diseases which can be disentangled via genetic assays, allowing the specific signature of individual cancers to be attacked. However, the data quantity and complex dependencies within the data require systematic and relatively automatic prediction and analysis algorithms of the kind that we are best familiar with.
Some of the papers which interested me are:
- Kai-Wei Chang and Dan Roth, Selective Block Minimization for Faster Convergence of Limited Memory Large-Scale Linear Models, which is about effectively using a hard-example cache to speed up learning.
- Leland Wilkinson, Anushka Anand, and Dang Nhon Tuan, CHIRP: A New Classifier Based on Composite Hypercubes on Iterated Random Projections. The bar on creating new classifiers is pretty high. The approach here uses a combination of random projection and partitioning, which appears compelling for some nonlinear and relatively high-computation settings. They do a more thorough empirical evaluation than most papers.
- Zhuang Wang, Nemanja Djuric, Koby Crammer, and Slobodan Vucetic, Trading Representability for Scalability: Adaptive Multi-Hyperplane Machine for Nonlinear Classification. The paper explores an interesting idea: having lots of weight vectors (effectively infinity) associated with a particular label, showing that algorithms on this representation can deal with lots of data as linear predictors do, but with superior-to-linear performance. The authors don’t use the hashing trick, but their representation is begging for it (a minimal sketch of that trick follows this list).
- Michael Bruckner and Tobias Scheffer, Stackelberg Games for Adversarial Prediction Problems. This is about email spam filtering, where the authors use a theory of adversarial equilibria to construct a more robust filter, at least in some cases. Demonstrating this on noninteractive data is inherently difficult.
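Since the hashing trick came up above: it simply hashes feature names into a fixed number of buckets, so a model can handle an unbounded raw feature space (including one weight vector per label) in bounded memory. Here is a minimal generic sketch of my own, not the paper’s code; the dimension and feature names are invented for illustration.

```python
import hashlib

# Hypothetical illustration of the hashing trick: feature names are hashed
# into a fixed number of buckets, so no dictionary is needed and memory
# stays bounded however many distinct raw features appear.
D = 2 ** 20  # number of weight buckets (made up)

def hashed_features(named_features):
    """Map {name: value} pairs into (bucket index, value) pairs."""
    for name, value in named_features.items():
        h = int(hashlib.md5(name.encode("utf-8")).hexdigest(), 16)
        yield h % D, value

# Example use: a sparse dot product against a fixed-size weight vector.
weights = [0.0] * D
example = {"word=free": 1.0, "sender_domain=example.com": 1.0}
score = sum(weights[i] * v for i, v in hashed_features(example))
```

Collisions are the price paid, but with a large enough D they act as a small amount of extra noise rather than a real problem.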
There were also three papers that were about creating (or perhaps composing) learning systems to do something cool.
- Gideon Dror, Yehuda Koren, Yoelle Maarek, and Idan Szpektor, I Want to Answer, Who Has a Question? Yahoo! Answers Recommender System. This is about how to learn to route a question to the appropriate answerer automatically.
- Yehuda Koren, Edo Liberty, Yoelle Maarek, and Roman Sandler, Automatically Tagging Email by Leveraging Other Users’ Folders. This is about helping people organize their email with machine learning.
- D. Sculley, Matthew Eric Otey, Michael Pohl, Bridget Spitznagel, John Hainsworth, and Yunkai Zhou, Detecting Adversarial Advertisements in the Wild. The title is an excellent abstract here, and there are quite a few details about the implementation.
I also attended MUCMD, a workshop on the Meaningful Use of Complex Medical Data shortly afterwards. This workshop is about the emergent area of using data to improve medicine. The combination of electronic health records, the economic importance of getting medicine right, and the relatively weak use of existing data implies there is much good work to do.
This finally gave us a chance to discuss radically superior medical trial designs based on work in exploration and learning.
Jeff Hammerbacher’s talk was a hilariously blunt and well-stated monologue about the need to gather data in a usable way, and how to do it.
Amongst the talks on using medical data, Suchi Saria’s seemed the most mature. They’ve constructed a noninvasive test for problem infants which is radically superior to the existing Apgar score according to leave-one-out cross validation.
From the doctor’s side, there was discussion of the deep balkanization of data systems within hospitals, efforts to overcome that, and the (un)trustworthiness of data. Many issues clearly remain here, but it also looks like serious progress is being made.
Overall, the workshop went well, with the broad cross-section of talks providing quite a bit of extra context you don’t normally see. It left me believing that a community centered on MUCMD is rising now, with attendant workshops, conferences, etc… to be expected.
Many Machine Learning related events are coming up this fall.
- September 9, abstracts for the New York Machine Learning Symposium are due. Send a 2-page PDF, if interested, and note that we:
- widened submissions to be from anybody rather than just students.
- set aside a larger fraction of time for contributed submissions.
- September 15, there is a machine learning meetup, where I’ll be discussing terascale learning at AOL.
- September 16, there is a CS&Econ day at the New York Academy of Sciences. This is not ML-focused, but it’s easy to imagine interest.
- September 23 and later, NIPS workshop submissions start coming due. As usual, there are too many good ones, so I won’t be able to attend all those that interest me. I do hope some workshop makers consider ICML this coming summer, as we are increasing to a 2-day format for you. Here are a few that interest me:
- Big Learning is about dealing with lots of data. Abstracts are due September 30.
- The Bayes Bandits workshop. Abstracts are due September 23.
- The Personalized Medicine workshop
- The Learning Semantics workshop. Abstracts are due September 26.
- The ML Relations workshop. Abstracts are due September 30.
- The Hierarchical Learning workshop. Challenge submissions are due October 17, and abstracts are due October 21.
- The Computational Tradeoffs workshop. Abstracts are due October 17.
- The Model Selection workshop. Abstracts are due September 24.
- October 16-17 is the Singularity Summit in New York. This is for the AIists and only peripherally about ML.
- October 16-21 is Predictive Analytics World in New York. As machine learning goes industrial, we see industrial-style conferences rapidly developing.
- October 21, there is the New York ML Symposium. In addition to what’s there, Chris Wiggins is looking into setting up a session for startups and those interested in them to get to know each other, as last year.
- December 16-17, NIPS workshops in Granada, Spain.
I enjoyed attending NIPS this year, with several things interesting me. For the conference itself:
- Peter Welinder, Steve Branson, Serge Belongie, and Pietro Perona, The Multidimensional Wisdom of Crowds. This paper is about using mechanical turk to get label information, with results superior to a majority vote approach.
- David McAllester, Tamir Hazan, and Joseph Keshet, Direct Loss Minimization for Structured Prediction. This is about another technique for directly optimizing the loss in structured prediction, with an application to speech recognition.
- Mohammad Saberian and Nuno Vasconcelos, Boosting Classifier Cascades. This is about an algorithm for simultaneously optimizing loss and computation in a classifier cascade construction. There were several other papers on cascades which are worth looking at if interested.
- Alan Fern and Prasad Tadepalli, A Computational Decision Theory for Interactive Assistants. This paper carves out some forms of natural not-MDP problems and shows their RL-style solution is tractable. It’s good to see people moving beyond MDPs, which at this point are both well understood and limited.
- Oliver Williams and Frank McSherry, Probabilistic Inference and Differential Privacy. This paper is about a natural, relatively unexplored, and potentially dominating approach for achieving differential privacy and learning.
I also attended two workshops, Coarse-To-Fine and LCCC, which were a fine combination. The first was about more efficient (and sometimes more effective) methods for learning which start with coarse information and refine, while the second was about parallelization and distribution of learning algorithms. Together, they were about how to learn fast and effective solutions.
The CtF workshop could have been named “Integrating breadth-first search and learning”. I was somewhat (I hope not too) pesky, discussing Searn repeatedly during questions, since it seems quite plausible that a good application of Searn would compete with, and plausibly improve on, results from several of the talks. Eventually, I hope the conventional wisdom shifts to a belief that search and learning must be integrated for efficiency and robustness reasons. The talks in this workshop were uniformly strong in making that case. I was particularly interested in Drew’s talk on a plausible improvement on Searn.
The level of agreement in approaches at the LCCC workshop was much lower, with people discussing many radically different approaches.
- Should data be organized by feature partition or example partition? Fernando points out that features often scale sublinearly in the number of examples, implying that an example partition addresses scale better. However, basic learning theory tells us that if the number of parameters scales sublinearly in the number of examples, then the value of additional samples asymptotes, implying a mismatched solution design (a back-of-the-envelope version of this argument follows the list). My experience is that a ‘not enough features’ problem can be dealt with by throwing in all the features you couldn’t previously use properly, for example personalization.
- How can we best leverage existing robust distributed filesystem/MapReduce frameworks? There was near unanimity on the belief that MapReduce itself is of limited value for machine learning, but the step forward is unclear. I liked what Markus said: that no one wants to abandon the ideas of robustly storing data and moving small amounts of code to large amounts of data. The best way to leverage this capability to build great algorithms remains unclear to me.
- Every speaker was in agreement that their approach was faster, but there was great disagreement about what “fast” meant in an absolute sense. This forced me to think about an absolute measure of (input complexity)/(time) where we see results between 100 features/s and 10*10^6 features/s being considered “fast” depending on who is speaking. This scale disparity is remarkably extreme. A related detail is that the strength of baseline algorithms varies greatly.
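To make the asymptoting argument in the first point concrete, here is a back-of-the-envelope version of my own (the specific bound is only illustrative):

$$ \mathrm{risk}(n, d) \;\approx\; A(d) \;+\; c\sqrt{d/n}, $$

where $A(d)$ is the approximation error of the best predictor in a $d$-parameter class and the second term is the usual estimation error with $n$ examples. If $d$ grows sublinearly in $n$, the estimation term vanishes as $n$ grows, so the achievable risk flattens out at $A(d)$; additional examples then buy very little unless the parameterization (that is, the feature set) also grows, which is the mismatch referred to above.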
I hope we’ll discover convincing answers to these questions in the near future.
On Sept 21, there is another machine learning meetup where I’ll be speaking. Although the topic is contextual bandits, I think of it as “the future of machine learning”. In particular, it’s all about how to learn in an interactive environment, such as for ad display, trading, news recommendation, etc…
On Sept 24, abstracts for the New York Machine Learning Symposium are due. This is the largest Machine Learning event in the area, so it’s a great way to have a conversation with other people.
On Oct 22, the NY ML Symposium actually happens. This year, we are expanding the spotlights and trying to have more time for posters. In addition, we have a strong set of invited speakers: David Blei, Sanjoy Dasgupta, Tommi Jaakkola, and Yann LeCun. After the meeting, a late hackNY-related event is planned where students and startups can meet.
I’d also like to point out the related CS/Econ symposium as I have interests there as well.
Alekh, John, Ofer, and I are organizing a workshop at NIPS this year on learning in parallel and distributed environments. The general interest level in parallel learning seems to be growing rapidly, so I expect quite a bit of attendance. Please join us if you are parallel-interested.
And, if you are working in the area of parallel learning, please consider submitting an abstract (due Oct. 17) for presentation at the workshop.
- Thomas Walsh, Kaushik Subramanian, Michael Littman, and Carlos Diuk, Generalizing Apprenticeship Learning across Hypothesis Classes. This paper formalizes and provides algorithms with guarantees for mixed-mode apprenticeship and traditional reinforcement learning, yielding algorithms that perform better than either setting alone.
- István Szita and Csaba Szepesvári, Model-based reinforcement learning with nearly tight exploration complexity bounds. This paper and another represent the frontier of best-known algorithms for Reinforcement Learning in a Markov Decision Process.
- James Martens, Deep learning via Hessian-free optimization. About a new not-quite-online second-order gradient algorithm for learning deep functional structures. Potentially this is very powerful because while people have often talked about end-to-end learning, it has rarely worked in practice.
- Christoph Sawade, Niels Landwehr, Steffen Bickel, and Tobias Scheffer, Active Risk Estimation. When a test set is not known in advance, the model can be used to safely aid test set evaluation using importance weighting techniques. Relative to the paper, placing a lower bound on p(y|x) is probably important in practice.
- H. Brendan McMahan and Matthew Streeter, Adaptive Bound Optimization for Online Convex Optimization, and the almost-same paper by John Duchi, Elad Hazan, and Yoram Singer, Adaptive Subgradient Methods for Online Learning and Stochastic Optimization. These papers provide tractable online algorithms with regret guarantees over a family of metrics rather than just Euclidean metrics. They look pretty useful in practice (a generic sketch of the diagonal variant follows this list).
- Nicolò Cesa-Bianchi, Claudio Gentile, Fabio Vitale, and Giovanni Zappella, Active Learning on Trees and Graphs. Various subsets of these authors have other papers about actively learning graph-obeying functions which in total provide a good basis for understanding what’s possible and how to learn.
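As a concrete reference for the adaptive-metric idea in the McMahan-Streeter and Duchi-Hazan-Singer papers above, the diagonal variant scales each coordinate’s step by the inverse square root of that coordinate’s accumulated squared gradients. The sketch below is a generic illustration of my own (made-up data and constants), not either paper’s code.

```python
import numpy as np

# Generic diagonal adaptive-gradient update (in the spirit of the papers
# above, not their code): coordinates with large accumulated gradients get
# small steps, while rarely-updated coordinates keep large steps.
rng = np.random.default_rng(0)
d = 5
w = np.zeros(d)
g_sq = np.zeros(d)      # per-coordinate sum of squared gradients
eta, eps = 0.5, 1e-8    # base step size and numerical floor (made up)

for t in range(1000):
    x = rng.normal(size=d)
    y = float(x[0] > 0)                       # toy binary label
    p = 1.0 / (1.0 + np.exp(-x @ w))          # logistic prediction
    grad = (p - y) * x                        # logistic loss gradient
    g_sq += grad ** 2
    w -= eta * grad / (np.sqrt(g_sq) + eps)   # per-coordinate step sizes

print(w)  # the first coordinate dominates, matching the toy label rule
```

The practical appeal is exactly what the line above suggests: sparse, heterogeneous feature sets get sensible per-feature step sizes without hand tuning.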
The program chairs for ICML did a wide-ranging survey of participants. The results seem to suggest that participants generally agree with the current ICML process. I expect there is some amount of anchoring effect going on, where participants have an apparent preference for the known status quo, although it’s difficult to judge the degree of that. Some survey results which aren’t of that sort are:
- 7.7% of reviewers say author feedback changed their mind. It would be interesting to know for which fraction of accepted papers reviewers had their mind changed, but that isn’t reported.
- 85.4% of authors don’t know if the reviewers read their response, believe they read and ignored it, or believe they didn’t read it. Authors clearly don’t feel like they are communicating with reviewers.
- 58.6% support growing the conference, with the largest fraction suggesting poster-only papers.
- Other conferences attended by the ICML community, in order, are NIPS, ECML/PKDD, AAAI, IJCAI, AIStats, UAI, KDD, ICDM, COLT, SIGIR, ECAI, EMNLP, and CoNLL. This is pretty different from the standard colocation list for ICML. Many possibilities are precluded by scheduling, but AAAI, IJCAI, UAI, KDD, COLT, and SIGIR are all serious possibilities, some of which haven’t been used much in the past.
My experience with Mark’s new paper discussion site is generally positive—having comments emailed to interested parties really helps the discussion. There are a few comments that authors haven’t responded to, so if you are an author you might want to sign up to receive comments.
In addition, I was the workshop chair for ICML&COLT this year. My overall impression was that things went reasonably well, with the exception of internet connectivity at the Dan Panorama, which was a mini-disaster courtesy of a broken per-machine authentication system. One of the things I’m particularly happy about was the Learning to Rank Challenge workshop. I think it would be great if ICML can continue to attract new challenge workshops in the future. If anyone else has comments about the workshops, I’d love to hear them.
My general view of workshops is that they are excellent as opportunities to discuss and develop research directions—some of my best work has come from collaborations at workshops and several workshops have substantially altered my thinking about various problems. My experience running workshops is that setting them up and making them fly often appears much harder than it actually is, and the workshops often come off much better than expected in the end. Submissions are due January 18, two weeks before papers.
Similarly, Ben Taskar is looking for good tutorials, which is complementary. Workshops are about exploring a subject, while a tutorial is about distilling it down into an easily taught essence, a vital part of the research process. Tutorials are due February 13, two weeks after papers.
Perhaps the most distinct talk was by Bob Bell on various aspects of the Netflix prize competition. I also enjoyed several student posters including Matt Hoffman’s cool examples of blind source separation for music.
I’m somewhat surprised how much the workshop has grown, as it is now comparable in size to a small conference, although in style more similar to a workshop. At some point as an event grows, it becomes owned by the community rather than the organizers, so if anyone has suggestions on improving it, speak up and be heard.