Many machine learning-related events are coming up this fall.
- September 9, abstracts for the New York Machine Learning Symposium are due. Send a 2-page PDF if interested, and note that we have:
- widened submissions to include anybody rather than just students.
- set aside a larger fraction of time for contributed submissions.
- September 15, there is a machine learning meetup, where I’ll be discussing terascale learning at AOL.
- September 16, there is a CS&Econ day at New York Academy of Sciences. This is not ML focused, but it’s easy to imagine interest.
- September 23 and later, NIPS workshop submissions start coming due. As usual, there are too many good ones, so I won’t be able to attend all those that interest me. I do hope some workshop organizers consider ICML this coming summer, as we are increasing to a 2-day format for you. Here are a few that interest me:
- Big Learning is about dealing with lots of data. Abstracts are due September 30.
- The Bayes Bandits workshop. Abstracts are due September 23.
- The Personalized Medicine workshop.
- The Learning Semantics workshop. Abstracts are due September 26.
- The ML Relations workshop. Abstracts are due September 30.
- The Hierarchical Learning workshop. Challenge submissions are due October 17, and abstracts are due October 21.
- The Computational Tradeoffs workshop. Abstracts are due October 17.
- The Model Selection workshop. Abstracts are due September 24.
- October 16-17 is the Singularity Summit in New York. This is for the AIists and only peripherally about ML.
- October 16-21 is Predictive Analytics World in New York. As machine learning goes industrial, we see industrial-style conferences rapidly developing.
- October 21, there is the New York ML Symposium. In addition to what’s there, Chris Wiggins is looking into setting up a session for startups and those interested in them to get to know each other, as happened last year.
- December 16-17, NIPS workshops in Granada, Spain.
Ron Bekkerman initiated an effort to create an edited book on parallel machine learning that Misha and I have been helping with. The breadth of efforts to parallelize machine learning surprised me: I was only aware of a small fraction initially.
This put us in a unique position, with knowledge of a wide array of different efforts, so it is natural to put together a survey tutorial on the subject of parallel learning for KDD tomorrow. This tutorial is not limited to the book itself, however, as several interesting new algorithms have come out since we started inviting chapters.
This tutorial should interest anyone trying to use machine learning on significant quantities of data, anyone interested in developing algorithms for such, and, of course, anyone curious about who has bragging rights to the fastest learning algorithm on planet earth.
(Also note the Modeling with Hadoop tutorial just before ours which deals with one way of trying to speed up learning algorithms. We have almost no overlap.)
I just released Vowpal Wabbit 6.0. Since the last version:
- VW is now 2-3 orders of magnitude faster at linear learning, primarily thanks to Alekh. Given the baseline, this is loads of fun, allowing us to easily deal with terafeature datasets, and dwarfing the scale of any other open source project. The core improvement here comes from effective parallelization over kilonode clusters (either Hadoop or not). This code is highly scalable, so it even helps with clusters of size 2 (and doesn’t hurt for clusters of size 1). The core allreduce technique appears widely and easily reusable; we’ve already used it to parallelize Conjugate Gradient, LBFGS, and two variants of online learning (a toy sketch of the allreduce pattern appears after this list). We’ll be documenting how to do this more thoroughly, but for now "README_cluster" and associated scripts should provide a good starting point.
- The new LBFGS code from Miro seems to commonly dominate the existing conjugate gradient code in time/quality tradeoffs.
- The new matrix factorization code from Jake adds a core algorithm.
- We finally have basic persistent daemon support, again with Jake’s help.
- Adaptive gradient calculations can now be made dimensionally correct, following up on Paul’s post, yielding a better algorithm. And Nikos sped it up further with SSE native inverse square root.
- The LDA core is perhaps twice as fast after Paul educated us about SSE and representational gymnastics.
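For readers curious what the allreduce pattern looks like, here is a minimal Python sketch of the idea: every node contributes a local gradient vector, the contributions are summed, and every node receives the sum. The in-memory "network" below is an illustrative assumption; the real implementation coordinates actual cluster nodes.

```python
import numpy as np

def allreduce_sum(local_vectors):
    """Toy allreduce: every node contributes a vector, and every node
    receives the elementwise sum of all contributions.  The 'network'
    here is just a Python list, so this is a sketch of the pattern,
    not a distributed implementation."""
    total = np.zeros_like(local_vectors[0])
    for v in local_vectors:                       # reduce: accumulate contributions
        total += v
    return [total.copy() for _ in local_vectors]  # broadcast: everyone gets the sum

# Hypothetical usage: average gradients computed on 4 worker nodes,
# so that each node can take the same global gradient step.
local_gradients = [np.random.randn(5) for _ in range(4)]
summed = allreduce_sum(local_gradients)
averaged = [g / len(local_gradients) for g in summed]
```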
All of the above was done without adding significant new dependencies, so the code should compile easily.
The VW mailing list has been slowly growing, and is a good place to ask questions.
Unfortunately, I ended up sick for much of this ICML. I did manage to catch one interesting paper:
Richard Socher, Cliff Lin, Andrew Y. Ng, and Christopher D. Manning, Parsing Natural Scenes and Natural Language with Recursive Neural Networks.
I invited Richard to share his list of interesting papers, so hopefully we’ll hear from him soon. In the meantime, Paul and Hal have posted some lists.
Joelle and I are program chairs for ICML 2012 in Edinburgh, which I previously enjoyed visiting in 2005. This is a huge responsibility that we hope to fulfill well. A part of this (perhaps the most fun part) is imagining how we can make ICML better. A key and critical constraint is choosing things that can actually be accomplished. So far we have:
- Colocation. The first thing we looked into was potential colocations. We quickly discovered that many other conferences had already committed to their locations. For the future, getting a colocation with ACL or SIGIR seems to require more advance planning. If that can be done, I believe there is substantial interest; I understand there was substantial interest in the joint symposium this year. What we did manage was a colocation with COLT, and there is an outside chance that a machine learning summer school will precede the main conference. The colocation with COLT is in both time and space, with COLT organized as (essentially) a separate track in a nearby building. We look forward to organizing a joint invited session or two with the COLT program chairs.
- Tutorials. We don’t have anything imaginative here, except for pushing for quality tutorials, probably through a mixture of invitations and a call. There is a small chance we’ll be able to organize a machine learning summer school as a prequel, which would be quite cool, but several things have to break right for this to occur.
- Conference. We are considering a few tinkerings with the conference format.
- Shifting the conference banquet to occur during the workshops, more tightly integrating the workshops.
- Having 3 nights of posters (1 per day) rather than 2 nights. This provides more time per poster and avoids having talks and posters appear on different days.
- Having impromptu sessions in the evening. Two possibilities here are impromptu talks and perhaps a joint open problems session with COLT. I’ve made sure we have rooms available so others can organize other things.
- We may go for short presentations (+ a poster) for some papers, depending on how things work out schedule-wise. My opinions on this are complex. ICML is traditionally multitrack, with all papers having a roughly 25-minute presentation. As a mechanism for research, I believe this is superior to a single-track conference of a similar size because:
- Participants can typically find some talk of potential interest, avoiding the boredom problem which comes up at a single-track conference.
- My experience is that program organizers have a limited ability to foresee which talks are of most interest, commonly creating a misallocation of attention.
On the other hand, there are clearly limits to the number of tracks that are reasonable, and I feel like ICML (especially with COLT cotimed) is near the upper limit. There are also some papers which have a limited scope of interest, for which a shorter presentation is reasonable.
- Workshops. A big change here—we want to experiment with 2 days of workshops rather than 1. There seems to be demand for it, as the number of workshops historically is about 10, enough that it’s easy to imagine people commonly interested in 2 workshops. It’s also the case that NIPS has had to start rejecting a substantial fraction of workshop submissions for space reasons. I am personally a big believer in workshops as a mechanism for further research, so I hope this works out well.
- Journal integration. I tend to believe that we should be shifting to a journal format for ICML papers, as per many past discussions. After thinking about this, the easiest way seems to be simply piggybacking on existing journals such as JMLR and MLJ by essentially declaring that people could submit there first and, if accepted and not otherwise presented at a conference, present at ICML. This was considered too large a change, so it is not happening. Nevertheless, it is a possible tweak that I believe should be considered for the future. My best guess is that this would never displace the baseline conference review process, but it would help some papers that don’t naturally fit into a conference format while keeping quality high.
- Reviewing. Drawing on plentiful experience with what goes wrong, I think we can create the best reviewing system for conferences. We are still debating exact details here while working through what is possible in different conference systems. Nevertheless, some basic goals are:
- Double Blind [routine now] Two identical papers with different authors should have the same chance of success. In terms of reviewing quality, I think double blind makes little difference in the short term, but the public commitment to fair reviewing makes a real difference in the long term.
- Author Feedback [routine now] Author feedback makes a difference in only a small minority of decisions, but I believe its effect is larger as (a) reviewer quality improves and (b) reviewer understanding improves. Both of these are silent improvers of quality. Somewhat less routine, we are seeking a mechanism for authors to be able to provide feedback if additional reviews are requested, as I’ve become cautious of the late-breaking highly negative review.
- Paper Editing. Geoff Gordon tweaked AIStats this year to allow authors to revise papers during feedback. I think this is helpful, because it encourages authors to fix clarity issues immediately, rather than waiting longer. This helps with some things, but it is not a panacea: authors still have to convince reviewers their paper is worthwhile, and given the way people are, first impressions are lasting impressions.
- Multisource reviewing. We want all of the initial reviews to be assigned by good yet different mechanisms. In the past, I’ve observed that the source of reviewer assignments can greatly bias the decision outcome, all the way from “accept with minor revisions” to “reject” in the case of a JMLR submission that I had. Our plan at the moment is that one review will be assigned by bidding, one by a primary area chair, and one by a secondary area chair.
- No single points of failure. When Bob Williamson and I were PC members for learning theory at NIPS, we each came to decisions given the reviews and then reconciled differences. This made a difference on about 5-10% of decisions, and (I believe) improved overall quality a bit. More generally, I’ve seen instances where an area chair has an unjustifiable dislike for a paper and kills it off, which this mechanism avoids.
- Speed. In general, I believe speed and good decision making are antagonistic. Nevertheless, we believe it is important to try to do the reviewing both quickly and well. Doing things quickly implies that we can push the submission deadline back later, providing authors more time to make quality papers. Key elements of doing things well fast are: good organization (that’s all on us), light loads for everyone involved (i.e. not too many papers), crowd sourcing (i.e. most decisions made by area chairs), and some amount of asynchrony. Altogether, we believe at the moment that two weeks can be shaved from our reviewing process.
- Website. Traditionally at ICML, every new local organizer was responsible for creating a website. This doesn’t make sense anymore, because substantial work is required there, which can and should be amortized across the years so that the website can evolve to do more for the community. We plan to create a permanent website, based around some combination of icml.cc and machinelearning.org. I think this just makes sense.
- Publishing. We are thinking about strongly encouraging authors to use arxiv for final submissions. This provides a lasting backing store for ICML papers, as well as a mechanism for revisions. The reality here is that some mistakes get into even final drafts, so a way to revise for the long term is helpful. We are also planning to videotape and make available all talks, although a decision between videolectures and Weyond has not yet been made.
Implementing all the changes above is ambitious, but I believe feasible and that each is individually beneficial and to some extent individually evaluatable. I’d like to hear any thoughts you have on this. It’s also not too late if you have further suggestions of your own.
Shravan and Alex’s LDA code is released. On a single machine, I’m not sure how it currently compares to the online LDA in VW, but the ability to effectively scale across very many machines is surely interesting.
Lev and Hal point out the CI Fellows program is on again for this year. Lev visited me for a year under this program, and I quite enjoyed it. Due May 31.
Lihong points out that ICML workshop submissions are due April 29.
Alina and Jake point out the COLT Call for Open Questions due May 11. In general, this is cool, and worth doing if you can come up with a crisp question. In my case, I particularly enjoyed crafting an open question in precisely such a form that a critic targeting my papers would be forced to either confront their fallacy or make a case for the reward. But less esoterically, this is a way to get the attention of some very smart people focused on a problem that really matters, which is the real value.
I just created version 5.1 of Vowpal Wabbit. This is almost entirely a bugfix release, so it’s an easy upgrade from v5.0.
- There is now a mailing list, which I and several other developers are subscribed to.
- The main website has shifted to the wiki on github. This means that anyone with a github account can now edit it.
- I’m planning to give a tutorial tomorrow on it at eHarmony/the LA machine learning meetup at 10am. Drop by if you’re interested.
The status of VW amongst other open source projects has changed. When VW first came out, it was relatively unique amongst existing projects in terms of features. At this point, many other projects have started to appreciate the value of the design choices here. This includes:
- Mahout, which now has an SGD implementation.
- Shogun, where Soeren is keen on incorporating features.
- LibLinear, where they won the KDD best paper award for out-of-core learning.
This is expected—any open source approach which works well should be widely adopted. None of these other projects yet have the full combination of features, so VW still offers something unique. There are also more tricks that I haven’t yet had time to implement, and I look forward to discovering even more.
I’m indebted to many people at this point who have helped with this project. I particularly point out Daniel and Nikos, who have spent quite a bit of time over the last few months working on things.
Yahoo!’s Key Scientific Challenges for Machine Learning grant applications are due March 11. If you are a student working on relevant research, please consider applying. It’s for $5K of unrestricted funding.
Machine learning always welcomes the new year with paper deadlines for summer conferences. This year, we have:
- June 28-July 2, Bellevue, Washington, USA (weak colocation with ACL)
- July 9-July 11, Budapest, Hungary (colocated with FOCM)
- August 21-24, San Diego, California, USA
- July 14-17, Barcelona, Spain
The larger conferences are on the west coast in the United States, while the smaller ones are in Europe.
Vikas points out the Herman Goldstine Fellowship at IBM. I was a Herman Goldstine Fellow, and benefited from the experience a great deal—that’s where work on learning reductions started. If you can do research independently, it’s recommended. Applications are due January 6.
I’ve released version 5.0 of the Vowpal Wabbit online learning software. The major number has changed since the last release because I regard all earlier versions as obsolete—there are several new algorithms & features including substantial changes and upgrades to the default learning algorithm.
The biggest changes are new algorithms:
- Nikos and I improved the default algorithm. The basic update rule still uses gradient descent, but the size of the update is carefully controlled so that it’s impossible to overrun the label (a small sketch of this capping idea appears after this list). In addition, the normalization has changed. Computationally, these changes are virtually free and yield better results, sometimes much better. Less careful updates can be reenabled with --loss_function classic, although results are still not identical to previous versions due to normalization changes.
- Nikos also implemented the per-feature learning rates as per these two papers (a sketch of the idea appears after this list). Often, this works better than the default algorithm. It isn’t the default because it isn’t (yet) as adaptable in terms of learning rate decay. This is enabled with --adaptive, and learned regressors are compatible with the default. Computationally, you might see a factor of 4 slowdown if using ‘-q’. Nikos noticed that the phenomenal quake inverse square root hack applies, making this substantially faster than a naive implementation.
- Nikos and Daniel also implemented active learning derived from this paper, usable via --active_simulation (to test parameters on an existing supervised dataset) or --active_learning (to do the real thing). This runs at full speed, which is much faster than is reasonable in any active learning scenario. We see this approach dominating supervised learning on all classification datasets so far, often with far fewer labeled examples required, as the theory predicts. The learned predictor is compatible with the default.
- Olivier helped me implement preconditioned conjugate gradient based on Jonathan Shewchuk’s tutorial (a toy sketch of the method appears after this list). This is a batch algorithm and hence requires multiple passes over any dataset to do something useful. Each step of conjugate gradient requires 2 passes. The advantage of CG is that it converges relatively quickly via the use of second derivative information. This can be particularly helpful if your features are of widely differing scales. The use of --regularization 0.001 (or smaller) is almost required with --conjugate_gradient as it will otherwise overfit hard. This implementation has two advantages over the basic approach: it implicitly computes a Hessian in O(n) time, where n is the number of features, and it operates out of core, hence making it applicable to datasets that don’t conveniently fit in RAM. The learned predictor is compatible with the default, although you’ll notice that a factor of 8 more RAM is required when learning.
- Matt Hoffman and I implemented Online Latent Dirichlet Allocation. This code is still experimental and likely to change over the next week. It really does a minibatch update under the hood. The code appears to be substantially faster than Matt’s earlier Python implementation, making this probably the most efficient LDA anywhere. LDA is still much slower than online linear learning as it is quite computationally heavy in comparison; perhaps a good candidate for GPU optimization.
- Nikos, Daniel, and I have been experimenting with more online cluster parallel learning algorithms (--corrective, --backprop, --delayed_global). We aren’t yet satisfied with these although they are improving. Details are at the LCCC workshop.
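To illustrate the “impossible to overrun the label” idea mentioned for the default update, here is a rough Python sketch for squared loss. This is my reading of the release note, not the actual VW update (which also involves the normalization changes); the point is simply that the effective step size is capped so a single update cannot move the prediction past the label.

```python
import numpy as np

def capped_sgd_update(w, x, y, eta):
    """Gradient step on squared loss whose size is capped so the new
    prediction cannot move past the label y.  A sketch of the idea only."""
    pred = w.dot(x)
    norm_sq = x.dot(x)
    if norm_sq == 0.0:
        return w
    # An uncapped step moves the prediction by eta * (pred - y) * norm_sq.
    # Capping eta at 1 / norm_sq means the prediction moves at most all the way to y.
    eta_eff = min(eta, 1.0 / norm_sq)
    return w - eta_eff * (pred - y) * x

# Hypothetical usage: even with a huge learning rate, the prediction lands on the label.
w = capped_sgd_update(np.zeros(3), np.array([1.0, 2.0, 0.5]), y=1.0, eta=10.0)
```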
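The per-feature learning rates are in the spirit of adaptive gradient methods: each weight gets its own rate that shrinks with the accumulated squared gradient in that coordinate. The sketch below uses a plain 1/sqrt (rather than the quake-style SSE approximation mentioned above) and is an illustration of the idea, not VW’s code.

```python
import numpy as np

def adaptive_update(w, g_sq_sum, x, y, eta=0.5, eps=1e-8):
    """One adaptive-gradient step on squared loss: each coordinate's step is
    scaled by 1/sqrt of its accumulated squared gradient.  Sketch only."""
    grad = (w.dot(x) - y) * x                    # per-coordinate gradient of squared loss
    g_sq_sum += grad * grad                      # accumulate squared gradients per feature
    w -= eta * grad / np.sqrt(g_sq_sum + eps)    # rarely-seen features get larger steps
    return w, g_sq_sum

# Hypothetical usage on two toy examples.
w, g_sq_sum = np.zeros(3), np.zeros(3)
for x, y in [(np.array([1.0, 0.0, 1.0]), 1.0), (np.array([0.0, 1.0, 1.0]), 0.0)]:
    w, g_sq_sum = adaptive_update(w, g_sq_sum, x, y)
```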
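For intuition on the conjugate gradient piece, here is a small textbook-style Python sketch of preconditioned CG with a diagonal (Jacobi) preconditioner, in the spirit of Shewchuk’s tutorial. It solves a dense Ax = b in memory; the VW version instead works in passes over the data and never materializes a Hessian, so treat this purely as a sketch of the method.

```python
import numpy as np

def preconditioned_cg(A, b, iters=50, tol=1e-10):
    """Textbook preconditioned conjugate gradient with a diagonal (Jacobi)
    preconditioner, solving A x = b for symmetric positive definite A."""
    x = np.zeros_like(b)
    M_inv = 1.0 / np.diag(A)           # diagonal preconditioner
    r = b - A.dot(x)                   # residual
    z = M_inv * r                      # preconditioned residual
    p = z.copy()                       # search direction
    rz = r.dot(z)
    for _ in range(iters):
        Ap = A.dot(p)
        alpha = rz / p.dot(Ap)         # step length along p
        x += alpha * p
        r -= alpha * Ap
        if r.dot(r) < tol:
            break
        z = M_inv * r
        rz_new = r.dot(z)
        beta = rz_new / rz             # conjugacy correction
        p = z + beta * p
        rz = rz_new
    return x

A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x = preconditioned_cg(A, b)            # should be close to np.linalg.solve(A, b)
```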
In addition, Ariel added a test suite, Shravan helped with ngrams, and there are several other minor new features and bug fixes including a very subtle one caught by Vaclav.
The documentation on the website hasn’t kept up with the code. I’m planning to rectify that over the next week, and have a new tutorial starting at 2pm in the LCCC room for those interested. Yes, I’ll not be skiing.
Slashdot points out the Traffic Prediction Challenge which looks pretty fun. The temporal aspect seems to be very common in many real-world problems and somewhat understudied.
Partha Niyogi died from brain cancer. I asked Misha, who worked with him, to write about it.
Partha Niyogi, Louis Block Professor in Computer Science and Statistics at the University of Chicago passed away on October 1, 2010, aged 43.
I first met Partha Niyogi almost exactly ten years ago when I was a graduate student in math and he had just started as a faculty member in Computer Science and Statistics at the University of Chicago. Strangely, we first talked at length due to a somewhat convoluted mathematical argument in a paper on pattern recognition. I asked him some questions about the paper, and, even though the topic was new to him, he had put serious thought into it and we started regular meetings. We made significant progress and developed a line of research stemming initially just from trying to understand that one paper and to simplify one derivation. I think this was typical of Partha, showing both his intellectual curiosity and his intuition for the serendipitous; having a sense and focus for inquiries worth pursuing, no matter how remote or challenging, and bringing his unique vision to new areas. We had been working together continuously from that first meeting until he became too sick to continue. Partha had been a great adviser and a close friend for me; I am very much thankful to him for his guidance, intellectual inspiration and friendship.
Partha had a broad range of interests in research centered around the problem of learning, which had been his interest since he was an undergraduate at the Indian Institute of Technology. His research had three general themes: geometric methods in machine learning, particularly manifold methods; language evolution and language learning (he recently published a 500-page monograph on it); and speech analysis and recognition. I will not talk about his individual works; a more in-depth summary of his research is in the University of Chicago Computer Science department obituary. It is enough to say that his work has been quite influential and widely followed up. In every one of these areas he had his own approach, distinct, clear, and not afraid to challenge unexamined conventional wisdom. To lose this intellectually rigorous but open-minded vision is not just a blow to those of us who knew him and worked with him, but to the field of machine learning itself.
I owe a lot to Partha; to his insight and thoughtful attitude to research and every aspect of life. It had been a great privilege to be Partha’s student, collaborator and friend; his passing away leaves deep sadness and emptiness. It is hard to believe Partha is no longer with us, but his friendship and what I learned from him will stay with me for the rest of my life.
More from Jake, Suresh, and Lance.
Paul Mineiro has started Machined Learnings, where he’s seriously attempting to do ML research in public. I personally need to read through it in greater detail, as much of it is learning reduction related, trying to deal with the sorts of complex source problems that come up in practice.
On Sept 21, there is another machine learning meetup where I’ll be speaking. Although the topic is contextual bandits, I think of it as “the future of machine learning”. In particular, it’s all about how to learn in an interactive environment, such as for ad display, trading, news recommendation, etc…
On Sept 24, abstracts for the New York Machine Learning Symposium are due. This is the largest Machine Learning event in the area, so it’s a great way to have a conversation with other people.
On Oct 22, the NY ML Symposium actually happens. This year, we are expanding the spotlights, and trying to have more time for posters. In addition, we have a strong set of invited speakers: David Blei, Sanjoy Dasgupta, Tommi Jaakkola, and Yann LeCun. After the meeting, a late hackNY related event is planned where students and startups can meet.
I’d also like to point out the related CS/Econ symposium as I have interests there as well.
Geoff Gordon points out AIStats 2011 in Ft. Lauderdale, Florida. The call for papers is now out, due Nov. 1. The plan is to experiment with the review process to encourage quality in several ways. I expect to submit a paper and would encourage others with good research to do likewise.
I’ve been wanting to attend the NYC ML Meetup for some time and hope to make it next week on the 25th. Rob Schapire is talking about “Playing Repeated Games”, which in my experience is far more relevant to machine learning than the title might indicate.