I just created version 5.1 of vowpal wabbit. This is almost entirely a bugfix release, so it’s an easy upgrade from v5.0.
In addition:
- There is now a mailing list, which I and several other developers are subscribed to.
- The main website has shifted to the wiki on github. This means that anyone with a github account can now edit it.
- I’m planning to give a tutorial on it tomorrow at 10am at eHarmony/the LA machine learning meetup. Drop by if you’re interested.
The status of VW amongst other open source projects has changed. When VW first came out, it was nearly unique amongst existing projects in terms of features. At this point, many other projects have started to appreciate the value of the design choices here. These include:
- Mahout, which now has an SGD implementation.
- Shogun, where Soeren is keen on incorporating features.
- LibLinear, where they won the KDD best paper award for out-of-core learning.
This is expected—any open source approach which works well should be widely adopted. None of these other projects yet have the full combination of features, so VW still offers something unique. There are also more tricks that I haven’t yet had time to implement, and I look forward to discovering even more.
I’m indebted to many people at this point who have helped with this project. I particularly point out Daniel and Nikos, who have spent quite a bit of time over the last few months working on things.
I hadn’t checked in on VW in a while, but it looks like you’ve been busy, adding different loss functions, active learning, and parallelization since I last looked.
VW was an inspiration for the way I implemented both logistic regression and CRFs in LingPipe. Leon Bottou’s slides and SGD system were also a key source of inspiration. But I didn’t use either “out of core” learning or hashed features, because our customers’ data sets have never been large enough for them to be an issue. So far, I also haven’t generalized the loss function so that we could implement SVMs.
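For anyone who hasn’t seen the hashing trick, it’s easy to sketch. A minimal Python version, where md5 just keeps the sketch dependency-free (VW’s actual hash function differs):

```python
import hashlib

NUM_BITS = 18          # VW-style: a weight vector of size 2^b
DIM = 1 << NUM_BITS

def hashed_index(feature_name: str) -> int:
    """Map a feature name straight to a weight-vector index; no
    feature dictionary is ever built."""
    h = int(hashlib.md5(feature_name.encode()).hexdigest(), 16)
    return h % DIM

def to_sparse(features: dict[str, float]) -> dict[int, float]:
    """Collapse named features into hashed indices; collisions simply add."""
    x: dict[int, float] = {}
    for name, value in features.items():
        i = hashed_index(name)
        x[i] = x.get(i, 0.0) + value
    return x
```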
I have generalized to mini-batches, mainly because they were more efficient for computing priors than storing last-updated indices in an array, and also more stable for very small data sets. This’ll then allow us to parallelize within mini-batches very effectively.
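For concreteness, the mini-batch idea looks roughly like this; a sketch of plain mini-batch logistic regression, not LingPipe’s actual code:

```python
import numpy as np

def minibatch_sgd(X, y, batch_size=32, lr=0.1, l2=1e-4, epochs=5):
    """Mini-batch SGD for logistic regression. Applying the L2 prior
    once per batch is what makes priors cheaper than tracking a
    last-updated index per dimension."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        for start in range(0, n, batch_size):
            Xb = X[start:start + batch_size]
            yb = y[start:start + batch_size]
            p = 1.0 / (1.0 + np.exp(-Xb @ w))   # predicted probabilities
            grad = Xb.T @ (p - yb) / len(yb)    # average log-loss gradient
            w -= lr * (grad + l2 * w)           # one prior update per batch
    return w
```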
We have some things I don’t see in VW. For instance, regularizing predictors to unit variance and zero mean. This, of course, isn’t really possible with purely online training (or maybe there’s some way to approximate it). We also allow a wide range of pluggable priors, including L1, L2, Cauchy, elastic net, etc. Lewis and Madigan’s BMR package was inspiring w.r.t. allowing different priors per dimension, though I really haven’t exploited our implementation of this.
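One way the approximation might go is with running per-dimension moments; a hypothetical sketch, not something either package actually does:

```python
class RunningStandardizer:
    """Approximate zero-mean/unit-variance scaling online, per dimension,
    using running moments (Welford's algorithm)."""

    def __init__(self, dim: int):
        self.n = 0
        self.mean = [0.0] * dim
        self.m2 = [0.0] * dim

    def update_and_scale(self, x: list) -> list:
        """Fold one dense example into the running stats, then return it
        scaled by the current mean/variance estimates."""
        self.n += 1
        scaled = []
        for i, v in enumerate(x):
            delta = v - self.mean[i]
            self.mean[i] += delta / self.n
            self.m2[i] += delta * (v - self.mean[i])
            var = self.m2[i] / self.n if self.n > 1 else 1.0
            scaled.append((v - self.mean[i]) / (var ** 0.5 + 1e-8))
        return scaled
```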
I also implemented hot start, so we can do things like path-based optimization of priors by evaluating a whole range of values (most efficiently starting with low-variance priors, then expanding to high variance).
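Roughly, the path-based scheme is: fit with the strongest (lowest-variance) prior first, then reuse each solution as the starting point for the next, weaker one. A sketch, where `fit` stands in for any trainer that accepts an initial weight vector:

```python
import numpy as np

def prior_path(X, y, fit, l2_strengths=(100.0, 10.0, 1.0, 0.1, 0.01)):
    """Evaluate a range of prior strengths with warm ("hot") starts.
    The strongest prior comes first: its solution is near zero and cheap
    to find, and each fit seeds the next. `fit(X, y, l2, w0)` is a
    hypothetical trainer, not LingPipe's API."""
    w = np.zeros(X.shape[1])
    path = []
    for l2 in l2_strengths:          # strong prior first, then weaker
        w = fit(X, y, l2=l2, w0=w)
        path.append((l2, w.copy()))
    return path
```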
One thing I’d like to add that we don’t have yet is weighting, mainly because I’m interested in training w.r.t. a probabilistic corpus inferred from multiple noisy annotators. (Padhraic Smyth wrote about this in the 1990s, but I haven’t seen any other references to the concept anywhere.)
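The idea in code: inferred label probabilities become per-example weights, with each item contributing to both classes. A sketch of the bookkeeping only, not Smyth’s actual estimator:

```python
def probabilistic_corpus(items):
    """Turn noisy annotator votes into a weighted training corpus.

    `items` is a list of (features, votes) pairs, where votes is a list
    of 0/1 labels from different annotators. Each item is emitted once
    per label it has nonzero probability of, weighted by a (naive)
    vote-fraction estimate of that probability."""
    corpus = []
    for features, votes in items:
        p = sum(votes) / len(votes)      # crude probability of label 1
        if p > 0:
            corpus.append((features, 1, p))
        if p < 1:
            corpus.append((features, 0, 1 - p))
    return corpus
```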
LingPipe’s also more focused on API-level integration through Java than on the command line, though I’m guessing there’s a relatively clean API lurking beneath VW’s command line. For instance, our models are serializable like standard Java objects over arbitrary input/output streams. They can also operate on any kind of object, not just texts. Feature extraction’s encapsulated as an interface. It’s pretty inefficient, but then I’m only extracting features once (like your caching, only we do it in memory with sparse vectors).
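In rough Python terms (the names are illustrative, not LingPipe’s actual API), the encapsulation looks like:

```python
from abc import ABC, abstractmethod

class FeatureExtractor(ABC):
    """Turn an arbitrary object into a sparse feature vector."""

    @abstractmethod
    def extract(self, obj) -> dict[str, float]:
        ...

class CachingExtractor(FeatureExtractor):
    """Extract features once per object and keep the sparse vectors in
    memory, analogous to VW's on-disk example cache. Keying on id(obj)
    is enough for a sketch where the same objects are reused."""

    def __init__(self, base: FeatureExtractor):
        self.base = base
        self.cache: dict[int, dict[str, float]] = {}

    def extract(self, obj) -> dict[str, float]:
        key = id(obj)
        if key not in self.cache:
            self.cache[key] = self.base.extract(obj)
        return self.cache[key]
```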
We also have an extensive classifier eval library with everything from macro-F measure to log loss to confusion matrices — is there something similar in VW? And do you guys support any kind of cross-validation?
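For reference, the measures I mean are all cheap to compute; a minimal sketch of the kind of thing in our eval library (not its actual API):

```python
import math
from collections import Counter

def log_loss(probs, labels):
    """Average negative log-likelihood of the true binary labels."""
    eps = 1e-15
    return -sum(math.log(max(eps, p if y == 1 else 1 - p))
                for p, y in zip(probs, labels)) / len(labels)

def confusion_matrix(preds, labels):
    """Counter keyed by (true, predicted) class pairs."""
    return Counter(zip(labels, preds))

def macro_f1(preds, labels):
    """Unweighted mean of per-class F1 scores."""
    scores = []
    for c in set(labels):
        tp = sum(1 for p, y in zip(preds, labels) if p == c and y == c)
        fp = sum(1 for p, y in zip(preds, labels) if p == c and y != c)
        fn = sum(1 for p, y in zip(preds, labels) if p != c and y == c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        scores.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(scores) / len(scores)
```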
Thanks much for the summary.
If I understand correctly, regularizing to zero mean seems like a nonissue for reasonable-size datasets, because VW has a built-in constant feature. Regularizing to unit variance in each dimension definitely can matter; this is partly why the --adaptive and --conjugate_gradient options can be a win. Methods of regularization in VW are lacking compared to LingPipe, partly because that’s less of an issue when you do online learning. However, I’ve come to appreciate that it can be helpful even there.
“Hot start” and weighting are definitely there in VW: you just use -i to load a saved predictor, and I agree it can be very helpful. An API-level version of VW does seem desirable, as I often see things reimplemented for lack of one.
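For example, hot starting from the command line (wrapped in Python here; -f and -i are the actual save/load flags, while the file names are just for illustration):

```python
import subprocess

# Train on the first day's data and save the predictor with -f,
# then continue ("hot start") from it on the next day's data with -i.
subprocess.run(["vw", "day1.dat", "-f", "model.day1"], check=True)
subprocess.run(["vw", "day2.dat", "-i", "model.day1", "-f", "model.day2"],
               check=True)
```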
There is not much in the way of eval for VW. I generally use ‘perf’, which Rich Caruana put together for a KDD cup challenge some time ago.
Cross-validation is not built-in. My experience is that large-scale datasets often have a nonstationary time order, implying that you really want to use some sort of ‘test on the future’ split.
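The split itself is trivial once examples are kept in time order; a sketch:

```python
def future_split(examples, train_frac=0.8):
    """'Test on the future': keep the time order, train on the earliest
    portion, and evaluate on the remainder. Assumes `examples` is
    already sorted by time; the 80/20 default is arbitrary."""
    cut = int(len(examples) * train_frac)
    return examples[:cut], examples[cut:]
```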
As far as I am concerned, API-level support is *the* critical feature to add to VW.
I want to use VW a lot more, but there are many things that would take me too long to build in VW that I can easily build using another ML package with better API support. In particular, I want to build:
* Cross-validation and automatic hyperparameter tuning (see the sketch after this list).
* Exploratory data analysis, i.e. better auditing of the induced classifiers.
* Better daemon support.
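As a sketch of the first item under current constraints, about the best one can do is wrap the command line (here in Python; -l, -f, -i, -t, and -p are real VW flags, while everything else is illustrative):

```python
import subprocess

def sweep_learning_rate(train_file, test_file, score,
                        rates=(0.1, 0.5, 1.0, 2.0)):
    """Hypothetical hyperparameter sweep by shelling out to vw.

    For each learning rate (-l), train and save a model (-f), then run
    test-only (-t) with the saved model (-i), writing predictions (-p).
    `score` is any user-supplied function mapping a predictions file to
    a number; file names are just for illustration."""
    results = {}
    for lr in rates:
        subprocess.run(["vw", train_file, "-l", str(lr), "-f", "model.tmp"],
                       check=True)
        subprocess.run(["vw", test_file, "-t", "-i", "model.tmp",
                        "-p", "preds.tmp"], check=True)
        results[lr] = score("preds.tmp")
    return results
```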
I would gladly push these features back to VW, if I could figure out how to program VW.