We are releasing the Vowpal Wabbit (Fast Online Learning) code as open source under a BSD (revised) license. This is a project at Yahoo! Research to build a useful large scale learning algorithm which Lihong Li, Alex Strehl, and I have been working on.
To appreciate the meaning of “large”, it’s useful to define “small” and “medium”. A “small” supervised learning problem is one where a human could use a labeled dataset and come up with a reasonable predictor. A “medium” supervised learning problem is one whose dataset fits into the RAM of a modern desktop computer. A “large” supervised learning problem is one which does not fit into the RAM of a normal machine. VW tackles large scale learning problems by this definition of large. I’m not aware of any other open source Machine Learning tools which can handle this scale (although they may exist). A few close ones are:
- IBM’s Parallel Machine Learning Toolbox isn’t quite open source. The approach used by this toolbox is essentially map-reduce style computation, which doesn’t seem amenable to online learning approaches. This is significant, because the fastest learning algorithms without parallelization tend to be online learning algorithms.
- Leon Bottou’s sgd implementation first loads data into RAM, then learns. Leon’s code is a great demonstrator of how fast and effective online learning approaches (specifically stochastic gradient descent) can be. VW is about a factor of 3 faster on my desktop and yields a solution with a lower error rate.
There are several other features such as feature pairing, sparse features, and namespacing that are often handy in practice.
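To give a flavor of these, here is a rough sketch of the kind of input VW accepts; treat the exact syntax as illustrative rather than authoritative (the details are documented with the code):

    1 |user age:0.23 region_east |doc length:0.9 sports
    -1 |user age:0.41 region_west |doc length:0.2 politics

Each line is one example: a label followed by one or more named feature spaces (“user” and “doc” in this made-up snippet), each holding sparse feature:value pairs, with omitted values defaulting to 1. Feature pairing then forms cross products between two namespaces at training time via a command-line option, so the paired features never need to be materialized in the data file.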
At present, VW optimizes squared loss via gradient descent or exponentiated gradient descent over a linear representation.
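For the curious, the plain gradient descent case is very simple at its core. Below is a minimal sketch of the squared-loss update on a sparse linear predictor in C++; this is my own illustration rather than VW’s actual code, and it ignores VW’s feature hashing, learning rate schedule, and other machinery.

    #include <cstddef>
    #include <utility>
    #include <vector>

    // One sparse example: (feature index, feature value) pairs plus a label.
    struct Example {
      std::vector<std::pair<size_t, float> > features;
      float label;
    };

    // Online gradient descent step for squared loss on a linear model.
    // Predict y_hat = w . x, then follow the negative gradient of
    // 0.5 * (y_hat - y)^2, which gives w += eta * (y - y_hat) * x.
    void sgd_update(std::vector<float>& w, const Example& ex, float eta) {
      float prediction = 0.f;
      for (size_t i = 0; i < ex.features.size(); ++i)
        prediction += w[ex.features[i].first] * ex.features[i].second;
      float update = eta * (ex.label - prediction);
      for (size_t i = 0; i < ex.features.size(); ++i)
        w[ex.features[i].first] += update * ex.features[i].second;
    }

The exponentiated gradient variant updates the weights multiplicatively rather than additively, but the per-example, single-pass-over-the-features structure is the same.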
This code is free to use, incorporate, and modify as per the BSD (revised) license. The project is ongoing inside Yahoo. We will gladly incorporate significant improvements from others, and I believe any such improvements are of substantial research interest.
Very Cool. Congrats on the release.
Thanks for posting this. I’ve had a lot of fun converting it into Java.
Hi, I’m interested in the Java implementation. Is it available?
Bug? In parse_regressor.cc, lines 98 and 111, I think “if (regressor.good())” isn’t needed. My interpretation is that the last weight in the file is being ignored.
Hmm, never mind on that bit about .good(), I was wrong.
If you do find any bugs, I’m of course quite interested.
I’m also interested in any comparative timings you have between the Java and C++ code. We chose C++ because we thought it was necessary for speed, but some people claim otherwise.
Neato. One other toolkit that seems related is VFML, by Geoff Hulten and Pedro Domingos. It’s a set of online algorithms for learning decision trees, Bayesian networks, and clustering, along with an API for implementing more algorithms.
quite cool!
Thanks for sharing! Are there any higher-level bindings, e.g. for Python?
I too would be interested in Python bindings.
It would be great to have this—it just needs doing.
Any news? Or do I have to do it myself 😉
Cheers
Congrats! I hadn’t seen a classifier with such a good performance/speed ratio in a long time.
I get a big performance difference by changing the --initial_t and --power_t parameters. Could you give a short tutorial on how to choose them?
Would it be possible to perform structured learning with it? I have played with MIRA lately and I was wondering if the same ideas would apply.
The answer is certainly “yes”, but it requires programming. Hal Daume and I have seriously discussed implementing Searn, perhaps providing a factor of 100-1000 speedup over his current implementation. This is particularly compelling, because Searn is already substantially faster than CRF-style structured prediction.