Sasha is the open problems chair for both COLT and ICML. Open problems will be presented in a joint session in the evening of the COLT/ICML overlap day. COLT has a history of open sessions, but this is new for ICML. If you have a difficult theoretically definable problem in machine learning, consider submitting it for review, due March 16. You’ll benefit three ways:
- The effort of writing down a precise formulation of what you want often helps you understand the nature of the problem.
- Your problem will be officially published and citable.
- You might have it solved by some very intelligent bored people.
The general idea could easily be applied to any problem which can be crisply stated with an easily verifiable solution, and we may consider expanding this in later years, but for this year all problems need to be of a theoretical variety.
Joelle and I (and Mahdi, and Laurent) finished an initial assignment of Program Committee and Area Chairs to papers. We’ll be updating instructions for the PC and ACs as we field questions. Feel free to comment here on things of plausible general interest, but email us directly with specific concerns.
For graduate students, the Yahoo! Key Scientific Challenges program, which includes machine learning, is on again, with applications due March 9. The application is easy and the $5K award is high quality “no strings attached” funding. Consider submitting.
The From Data to Knowledge workshop May 7-11 at Berkeley should be of interest to the many people encountering streaming data in different disciplines. It’s run by a group of astronomers who encounter streaming data all the time. I met Josh Bloom recently and he is broadly interested in a workshop covering all aspects of Machine Learning on streaming data. The hope here is that techniques developed in one area turn out to be useful in another, which seems quite plausible. Particularly if you are in the Bay Area, consider checking it out.
- The cluster parallel learning code better supports multiple simultaneous runs, and other forms of parallelism have been mostly removed. This incidentally significantly simplifies the learning core.
- The online learning algorithms are more general, with support for l1 (via a truncated gradient variant) and l2 regularization, and a generalized form of variable metric learning.
- There is a solid persistent server mode which can train online, as well as serve answers to many simultaneous queries, either in text or binary.
This should be a very good release if you are just getting started, as we’ve made it compile more automatically out of the box, have several new examples and updated documentation.
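To make the l1 and l2 support mentioned above a bit more concrete, here is a minimal Python sketch of a truncated-gradient style update for squared loss. It is an illustration under stated assumptions, not VW's actual code: the function names, the sparse dict representation, and the choice to truncate after every step (rather than every K steps) are all hypothetical simplifications.

```python
def truncate(w, alpha, theta=float("inf")):
    """Shrink w toward zero by alpha without crossing zero.

    Weights with |w| > theta are left alone (theta is an assumed knob).
    """
    if 0.0 <= w <= theta:
        return max(0.0, w - alpha)
    if -theta <= w < 0.0:
        return min(0.0, w + alpha)
    return w


def truncated_gradient_step(weights, x, y, eta=0.1, l1=0.01, l2=0.0):
    """One online squared-loss step on a sparse example.

    weights: dict feature -> weight (updated in place)
    x:       dict feature -> value
    Returns the prediction made before the update.
    """
    pred = sum(weights.get(f, 0.0) * v for f, v in x.items())
    err = pred - y
    for f, v in x.items():
        w = weights.get(f, 0.0)
        # Gradient of 0.5 * err^2 + 0.5 * l2 * w^2 with respect to w.
        w -= eta * (err * v + l2 * w)
        # l1 is applied as a shrink toward zero that never flips the sign.
        weights[f] = truncate(w, eta * l1)
    return pred
```

The point of truncating rather than subtracting an l1 subgradient is that small weights land exactly on zero, so the learned predictor is actually sparse.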
- Miro will cover the L-BFGS implementation, which he created from scratch. We have found this works quite well amongst batch learning algorithms.
- Alekh will cover how to do cluster parallel learning. If you have access to a large cluster, VW is orders of magnitude faster than any other public learning system accomplishing linear prediction. And if you are as impatient as I am, it is a real pleasure when the computers can keep up with you.
This will be recorded, so it will hopefully be available for viewing online before too long.
I hope to see you soon.
Everyone should have received notice for NY ML Symposium abstracts. Check carefully, as one was lost by our system.
The event itself is October 21, next week. Leon Bottou, Stephen Boyd, and Yoav Freund are giving the invited talks this year, and there are many spotlights on local work spread throughout the day. Chris Wiggins has set up 6(!) ML-interested startups to follow the symposium, which should be of substantial interest to those interested in employment.
I also wanted to give an update on ICML 2012. Unlike last year, our deadline is coordinated with AIStats (which is due this Friday). The paper deadline for ICML has been pushed back to February 24, which should allow significant time for finishing up papers after the winter break. Other details may interest people as well:
- We settled on using CMT after checking out the possibilities. I wasn’t looking for this, because I’ve often found CMT clunky in terms of easy access to the right information. Nevertheless, the breadth of features and willingness to support new/better approaches to reviewing was unrivaled. We are also coordinating with Laurent, Rich, and CMT to enable their paper/reviewer recommendation system. The outcome should be a standardized interface in CMT for any recommendation system, which others can then code to if interested.
- Area chairs have been picked. The list isn’t sacred, so if we discover significant holes in expertise we’ll deal with it. We expect to start inviting PC members in a little while. Right now, we’re looking into invited talks. If you have any really good suggestions, they could be considered.
- CCC is interested in sponsoring travel costs for any climate/environment related ML papers, which seems great to us. In general, this seems like an area of growing interest.
- We now have a permanent server and the beginnings of the permanent website set up. Much more work needs to be done here.
- We haven’t settled yet on how videos will work. Last year, ICML experimented with Weyond with results here. Previously, ICML had used videolectures, which is significantly more expensive. If you have an opinion about cost/quality tradeoffs or other options, speak up.
- Plans for COLT have shifted slightly—COLT will start a day early, overlap with tutorials, then overlap with a coordinated first day of ICML conference papers.
Various people want to use hunch.net to announce things. I’ve generally resisted this because I feared hunch becoming a pure announcement zone, while I am personally much more interested in contentful posts and discussion. Nevertheless, there is clearly some value and announcements are easy, so I’m planning to summarize announcements on Mondays.
- D. Sculley points out an interesting Semisupervised feature learning competition, with a deadline of October 17.
- Lihong Li points out the webscope user interaction dataset which is the first high quality exploration dataset I’m aware of that is publicly available.
- Seth Rogers points out CrossValidated which looks similar in conception to metaoptimize, but directly using the stackoverflow interface and with a bit more of a statistics twist.
Many Machine Learning related events are coming up this fall.
- September 9, abstracts for the New York Machine Learning Symposium are due. Send a 2 page pdf, if interested, and note that we:
  - widened submissions to be from anybody rather than students.
  - set aside a larger fraction of time for contributed submissions.
- September 15, there is a machine learning meetup, where I’ll be discussing terascale learning at AOL.
- September 16, there is a CS&Econ day at New York Academy of Sciences. This is not ML focused, but it’s easy to imagine interest.
- September 23 and later NIPS workshop submissions start coming due. As usual, there are too many good ones, so I won’t be able to attend all those that interest me. I do hope some workshop makers consider ICML this coming summer, as we are increasing to a 2 day format for you. Here are a few that interest me:
  - Big Learning is about dealing with lots of data. Abstracts are due September 30.
  - The Bayes Bandits workshop. Abstracts are due September 23.
  - The Personalized Medicine workshop
  - The Learning Semantics workshop. Abstracts are due September 26.
  - The ML Relations workshop. Abstracts are due September 30.
  - The Hierarchical Learning workshop. Challenge submissions are due October 17, and abstracts are due October 21.
  - The Computational Tradeoffs workshop. Abstracts are due October 17.
  - The Model Selection workshop. Abstracts are due September 24.
- October 16-17 is the Singularity Summit in New York. This is for the AIists and only peripherally about ML.
- October 16-21 is a Predictive Analytics World in New York. As machine learning goes industrial, we see industrial-style conferences rapidly developing.
- October 21, there is the New York ML Symposium. In addition to what’s there, Chris Wiggins is looking into setting up a session for startups and those interested in them to get to know each other, as last year.
- December 16-17 NIPS workshops in Granada, Spain.
Ron Bekkerman initiated an effort to create an edited book on parallel machine learning that Misha and I have been helping with. The breadth of efforts to parallelize machine learning surprised me: I was only aware of a small fraction initially.
This put us in a unique position, with knowledge of a wide array of different efforts, so it is natural to put together a survey tutorial on the subject of parallel learning for KDD, tomorrow. This tutorial is not limited to the book itself however, as several interesting new algorithms have come out since we started inviting chapters.
This tutorial should interest anyone trying to use machine learning on significant quantities of data, anyone interested in developing algorithms for such, and of course anyone curious about who has bragging rights to the fastest learning algorithm on planet earth.
(Also note the Modeling with Hadoop tutorial just before ours which deals with one way of trying to speed up learning algorithms. We have almost no overlap.)
I just released Vowpal Wabbit 6.0. Since the last version:
- VW is now 2-3 orders of magnitude faster at linear learning, primarily thanks to Alekh. Given the baseline, this is loads of fun, allowing us to easily deal with terafeature datasets, and dwarfing the scale of any other open source projects. The core improvement here comes from effective parallelization over kilonode clusters (either Hadoop or not). This code is highly scalable, so it even helps with clusters of size 2 (and doesn’t hurt for clusters of size 1). The core allreduce technique appears widely and easily reusable: we’ve already used it to parallelize Conjugate Gradient, LBFGS, and two variants of online learning. We’ll be documenting how to do this more thoroughly, but for now “README_cluster” and associated scripts should provide a good starting point.
- The new LBFGS code from Miro seems to commonly dominate the existing conjugate gradient code in time/quality tradeoffs.
- The new matrix factorization code from Jake adds a core algorithm.
- We finally have basic persistent daemon support, again with Jake’s help.
- Adaptive gradient calculations can now be made dimensionally correct, following up on Paul’s post, yielding a better algorithm. And Nikos sped it up further with SSE native inverse square root.
- The LDA core is perhaps twice as fast after Paul educated us about SSE and representational gymnastics.
All of the above was done without adding significant new dependencies, so the code should compile easily.
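For readers who haven't met allreduce before, the idea is that every node contributes a vector and every node ends up with the sum. Here is a single-process Python sketch that simulates this over a binary tree and uses it to average gradients. It is only meant to show the shape of the technique: the real code communicates over a network, and the function names and the tree layout (node i's parent is (i - 1) // 2) are assumptions made for illustration.

```python
def tree_allreduce(node_vectors):
    """Return, for every node, the elementwise sum of all nodes' vectors.

    Simulates a reduce up a binary tree followed by a broadcast down.
    """
    n = len(node_vectors)
    partial = [list(v) for v in node_vectors]
    # Reduce phase: visit nodes from the deepest index upward, so each node
    # has already absorbed its children before it is added into its parent.
    for i in range(n - 1, 0, -1):
        parent = (i - 1) // 2
        partial[parent] = [a + b for a, b in zip(partial[parent], partial[i])]
    # Broadcast phase: the root's total is handed back to every node.
    total = partial[0]
    return [list(total) for _ in range(n)]


def parallel_gradient_step(weights, local_gradients, eta):
    """Each node applies the averaged gradient to its own copy of the weights.

    local_gradients holds one gradient vector per node, each computed on
    that node's shard of the data.
    """
    n = len(local_gradients)
    summed = tree_allreduce(local_gradients)
    return [[w - eta * g / n for w, g in zip(weights, s)] for s in summed]
```

Because every node sees the same summed gradient, all copies of the weights stay in sync without a central parameter server, which is part of why the same primitive can sit underneath several different batch and online algorithms.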
The VW mailing list has been slowly growing, and is a good place to ask questions.
Unfortunately, I ended up sick for much of this ICML. I did manage to catch one interesting paper:
Joelle and I are program chairs for ICML 2012 in Edinburgh, which I previously enjoyed visiting in 2005. This is a huge responsibility, that we hope to accomplish well. A part of this (perhaps the most fun part), is imagining how we can make ICML better. A key and critical constraint is choosing things that can be accomplished. So far we have:
- Colocation. The first thing we looked into was potential colocations. We quickly discovered that many other conferences had precommitted their locations. For the future, getting a colocation with ACL or SIGIR seems to require more advance planning. If that can be done, I believe there is substantial interest; I understand there was substantial interest in the joint symposium this year. What we did manage was a colocation with COLT, and there is an outside chance that a machine learning summer school will precede the main conference. The colocation with COLT is in both time and space, with COLT organized as (essentially) a separate track in a nearby building. We look forward to organizing a joint invited session or two with the COLT program chairs.
- Tutorials. We don’t have anything imaginative here, except for pushing for quality tutorials, probably through a mixture of invitations and a call. There is a small chance we’ll be able to organize a machine learning summer school as a prequel, which would be quite cool, but several things have to break right for this to occur.
- Conference. We are considering a few tinkerings with the conference format.
  - Shifting a conference banquet to be during the workshops, more tightly integrating the workshops.
  - Having 3 nights of posters (1 per day) rather than 2 nights. This provides more time/poster, and avoids having talks and posters appear on different days.
  - Having impromptu sessions in the evening. Two possibilities here are impromptu talks and perhaps a joint open problems session with COLT. I’ve made sure we have rooms available so others can organize other things.
  - We may go for short presentations (+ a poster) for some papers, depending on how things work out schedulewise. My opinions on this are complex. ICML is traditionally multitrack with all papers having a 25 minute-ish presentation. As a mechanism for research, I believe this is superior to a single track conference of a similar size because:
    - Typically some talk of potential interest can always be found by participants, avoiding the boredom problem which comes up at a single track conference.
    - My experience is that program organizers have a limited ability to foresee which talks are of most interest, commonly creating a misallocation of attention.
On the other hand, there are clearly limits to the number of tracks that are reasonable, and I feel like ICML (especially with COLT cotimed) is near the upper limit. There are also some papers which have a limited scope of interest, for which a shorter presentation is reasonable.
- Workshops. A big change here—we want to experiment with 2 days of workshops rather than 1. There seems to be demand for it, as the number of workshops historically is about 10, enough that it’s easy to imagine people commonly interested in 2 workshops. It’s also the case that NIPS has had to start rejecting a substantial fraction of workshop submissions for space reasons. I am personally a big believer in workshops as a mechanism for further research, so I hope this works out well.
- Journal integration. I tend to believe that we should be shifting to a journal format for ICML papers, as per many past discussions. After thinking about this the easiest way seems to be simply piggybacking on existing journals such as JMLR and MLJ by essentially declaring that people could submit there first, and if accepted, and not otherwise presented at a conference, present at ICML. This was considered too large a change, so it is not happening. Nevertheless, it is a possible tweak that I believe should be considered for the future. My best guess is that this would never displace the baseline conference review process, but it would help some papers that don’t naturally fit into a conference format while keeping quality high.
- Reviewing. Drawing on plentiful experience with what goes wrong, I think we can create the best reviewing system for conferences. We are still debating exact details here while working through what is possible in different conference systems. Nevertheless, some basic goals are:
  - Double Blind [routine now] Two identical papers with different authors should have the same chance of success. In terms of reviewing quality, I think double blind makes little difference in the short term, but the public commitment to fair reviewing makes a real difference in the long term.
  - Author Feedback [routine now] Author feedback makes a difference in only a small minority of decisions, but I believe its effect is larger as (a) reviewer quality improves and (b) reviewer understanding improves. Both of these are silent improvers of quality. Somewhat less routine, we are seeking a mechanism for authors to be able to provide feedback if additional reviews are requested, as I’ve become cautious of the late-breaking highly negative review.
  - Paper Editing. Geoff Gordon tweaked AIStats this year to allow authors to revise papers during feedback. I think this is helpful, because it encourages authors to fix clarity issues immediately, rather than waiting longer. This helps with some things, but it is not a panacea: authors still have to convince reviewers their paper is worthwhile, and given the way people are, first impressions are lasting impressions.
  - Multisource reviewing. We want all of the initial reviews to be assigned by good yet different mechanisms. In the past, I’ve observed that the source of reviewer assignments can greatly bias the decision outcome, all the way from “accept with minor revisions” to “reject” in the case of a JMLR submission that I had. Our plan at the moment is that one review will be assigned by bidding, one by a primary area chair, and one by a secondary area chair.
  - No single points of failure. When Bob Williamson and I were PC members for learning theory at NIPS, we each came to decisions given the reviews and then reconciled differences. This made a difference on about 5-10% of decisions, and (I believe) improved overall quality a bit. More generally, I’ve seen instances where an area chair has an unjustifiable dislike for a paper and kills it off, which this mechanism avoids.
  - Speed. In general, I believe speed and good decision making are antagonistic. Nevertheless, we believe it is important to try to do the reviewing both quickly and well. Doing things quickly implies that we can push the submission deadline back later, providing authors more time to make quality papers. Key elements of doing things well fast are: good organization (that’s all on us), light loads for everyone involved (i.e. not too many papers), crowd sourcing (i.e. most decisions made by area chairs), and some amount of asynchrony. Altogether, we believe at the moment that two weeks can be shaved from our reviewing process.
- Website. Traditionally at ICML, every new local organizer was responsible for creating a website. This doesn’t make sense anymore, because substantial work is required there, which can and should be amortized across the years so that the website can evolve to do more for the community. We plan to create a permanent website, based around some combination of icml.cc and machinelearning.org. I think this just makes sense.
- Publishing. We are thinking about strongly encouraging authors to use arxiv for final submissions. This provides a lasting backing store for ICML papers, as well as a mechanism for revisions. The reality here is that some mistakes get into even final drafts, so a way to revise for the long term is helpful. We are also planning to videotape and make available all talks, although a decision between videolectures and Weyond has not yet been made.
Implementing all the changes above is ambitious, but I believe feasible and that each is individually beneficial and to some extent individually evaluatable. I’d like to hear any thoughts you have on this. It’s also not too late if you have further suggestions of your own.
Shravan and Alex’s LDA code is released. On a single machine, I’m not sure how it currently compares to the online LDA in VW, but the ability to effectively scale across very many machines is surely interesting.
Alina and Jake point out the COLT Call for Open Questions due May 11. In general, this is cool, and worth doing if you can come up with a crisp question. In my case, I particularly enjoyed crafting an open question with precisely a form such that a critic targeting my papers would be forced to confront their fallacy or make a case for the reward. But less esoterically, this is a way to get the attention of some very smart people focused on a problem that really matters, which is the real value.
- There is now a mailing list, which I and several other developers are subscribed to.
- The main website has shifted to the wiki on github. This means that anyone with a github account can now edit it.
- I’m planning to give a tutorial tomorrow on it at eHarmony/the LA machine learning meetup at 10am. Drop by if you’re interested.
The status of VW amongst other open source projects has changed. When VW first came out, it was relatively unique amongst existing projects in terms of features. At this point, many other projects have started to appreciate the value of the design choices here. This includes:
- Mahout, which now has an SGD implementation.
- Shogun, where Soeren is keen on incorporating features.
- LibLinear, where they won the KDD best paper award for out-of-core learning.
This is expected—any open source approach which works well should be widely adopted. None of these other projects yet have the full combination of features, so VW still offers something unique. There are also more tricks that I haven’t yet had time to implement, and I look forward to discovering even more.
Yahoo!’s Key Scientific Challenges for Machine Learning grant applications are due March 11. If you are a student working on relevant research, please consider applying. It’s for $5K of unrestricted funding.
Machine learning always welcomes the new year with paper deadlines for summer conferences. This year, we have:
| Conference | Paper Deadline | When/Where | Double blind? | Author Feedback? | Notes |
| --- | --- | --- | --- | --- | --- |
| ICML | February 1 | June 28-July 2, Bellevue, Washington, USA | Y | Y | Weak colocation with ACL |
| COLT | February 11 | July 9-July 11, Budapest, Hungary | N | N | colocated with FOCM |
| KDD | February 11/18 | August 21-24, San Diego, California, USA | N | N | |
| UAI | March 18 | July 14-17, Barcelona, Spain | Y | N | |
The larger conferences are on the west coast in the United States, while the smaller ones are in Europe.
Vikas points out the Herman Goldstine Fellowship at IBM. I was a Herman Goldstine Fellow, and benefited from the experience a great deal—that’s where work on learning reductions started. If you can do research independently, it’s recommended. Applications are due January 6.
I’ve released version 5.0 of the Vowpal Wabbit online learning software. The major number has changed since the last release because I regard all earlier versions as obsolete—there are several new algorithms & features including substantial changes and upgrades to the default learning algorithm.
The biggest changes are new algorithms:
- Nikos and I improved the default algorithm. The basic update rule still uses gradient descent, but the size of the update is carefully controlled so that it’s impossible to overrun the label. In addition, the normalization has changed. Computationally, these changes are virtually free and yield better results, sometimes much better. Less careful updates can be reenabled with --loss_function classic, although results are still not identical to previous versions due to normalization changes.
- Nikos also implemented the per-feature learning rates as per these two papers. Often, this works better than the default algorithm. It isn’t the default because it isn’t (yet) as adaptable in terms of learning rate decay. This is enabled with --adaptive, and learned regressors are compatible with the default. Computationally, you might see a factor of 4 slowdown if using ‘-q’. Nikos noticed that the phenomenal Quake inverse square root hack applies, making this substantially faster than a naive implementation.
- Nikos and Daniel also implemented active learning derived from this paper, usable via --active_simulation (to test parameters on an existing supervised dataset) or --active_learning (to do the real thing). This runs at full speed, which is much faster than is reasonable in any active learning scenario. We see this approach dominating supervised learning on all classification datasets so far, often with far fewer labeled examples required, as the theory predicts. The learned predictor is compatible with the default.
- Olivier helped me implement preconditioned conjugate gradient based on Jonathan Shewchuk’s tutorial. This is a batch algorithm and hence requires multiple passes over any dataset to do something useful. Each step of conjugate gradient requires 2 passes. The advantage of cg is that it converges relatively quickly via the use of second derivative information. This can be particularly helpful if your features are of widely differing scales. The use of --regularization 0.001 (or smaller) is almost required with --conjugate_gradient as it will otherwise overfit hard. This implementation has two advantages over the basic approach: it implicitly computes a Hessian in O(n) time where n is the number of features and it operates out of core, hence making it applicable to datasets that don’t conveniently fit in RAM. The learned predictor is compatible with the default, although you’ll notice that a factor of 8 more RAM is required when learning.
- Matt Hoffman and I implemented Online Latent Dirichlet Allocation. This code is still experimental and likely to change over the next week. It really does a minibatch update under the hood. The code appears to be substantially faster than Matt’s earlier python implementation making this probably the most efficient LDA anywhere. LDA is still much slower than online linear learning as it is quite computationally heavy in comparison—perhaps a good candidate for GPU optimization.
- Nikos, Daniel, and I have been experimenting with more online cluster parallel learning algorithms (--corrective, --backprop, --delayed_global). We aren’t yet satisfied with these although they are improving. Details are at the LCCC workshop.
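To give a feel for what per-feature learning rates mean in the list above, here is a small Python sketch of an AdaGrad-style update for squared loss. This is an assumption about the general form, not a transcription of what VW does: the function name, the dict-based sparse representation, and the exact scaling are hypothetical. The division by a square root on every feature is the step that a fast inverse square root would speed up.

```python
import math


def adaptive_step(weights, grad_sq_sums, x, y, eta=0.5):
    """One online squared-loss step with a separate learning rate per feature.

    weights:      dict feature -> weight (updated in place)
    grad_sq_sums: dict feature -> running sum of squared gradients (updated in place)
    x:            dict feature -> value
    Returns the prediction made before the update.
    """
    pred = sum(weights.get(f, 0.0) * v for f, v in x.items())
    err = pred - y
    for f, v in x.items():
        g = err * v  # gradient of 0.5 * err^2 with respect to weight f
        if g == 0.0:
            continue  # nothing to learn from this feature on this example
        grad_sq_sums[f] = grad_sq_sums.get(f, 0.0) + g * g
        # Features that have seen large gradients take smaller steps.
        weights[f] = weights.get(f, 0.0) - eta * g / math.sqrt(grad_sq_sums[f])
    return pred
```

Rarely seen features keep a large effective learning rate while frequent ones settle down, which is one reason this kind of update often beats a single global rate on sparse data.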
The documentation on the website hasn’t kept up with the code. I’m planning to rectify that over the next week, and have a new tutorial starting at 2pm in the LCCC room for those interested. Yes, I’ll not be skiing.