Vowpal Wabbit Code Release

We are releasing the Vowpal Wabbit (Fast Online Learning) code as open source under a BSD (revised) license. This is a project at Yahoo! Research to build a useful large scale learning algorithm which Lihong Li, Alex Strehl, and I have been working on.

To appreciate the meaning of “large”, it’s useful to define “small” and “medium”. A “small” supervised learning problem is one where a human could use a labeled dataset and come up with a reasonable predictor. A “medium” supervised learning problem is one whose dataset fits into the RAM of a modern desktop computer. A “large” supervised learning problem is one whose dataset does not fit into the RAM of a normal machine. VW tackles large scale learning problems by this definition of large. I’m not aware of any other open source machine learning tools which can handle this scale (although they may exist). A few close ones are:

1. IBM’s Parallel Machine Learning Toolbox isn’t quite open source. The approach used by this toolbox is essentially map-reduce style computation, which doesn’t seem amenable to online learning approaches. This is significant, because the fastest learning algorithms without parallelization tend to be online learning algorithms.
2. Leon Bottou’s sgd implementation first loads data into RAM, then learns. Leon’s code is a great demonstrator of how fast and effective online learning approaches (specifically stochastic gradient descent) can be. VW is about a factor of 3 faster on my desktop, and yields a lower error rate solution.

There are several other features such as feature pairing, sparse features, and namespacing that are often handy in practice.
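
To make the idea of feature pairing over namespaced sparse features concrete, here is a minimal Python sketch. It is an illustration only, not VW's actual code or input format: the function `pair_features`, the namespace names `query` and `ad`, and the naming scheme for the crossed features are all assumptions made up for this example.

```python
from itertools import product


def pair_features(example, ns_a, ns_b):
    """Cross every feature in namespace ns_a with every feature in ns_b.

    `example` maps a namespace name to a sparse dict {feature name: value}.
    Only features that are present are stored, so absent features cost nothing.
    The naming scheme of the crossed features is hypothetical.
    """
    paired = {}
    for (fa, va), (fb, vb) in product(example[ns_a].items(), example[ns_b].items()):
        paired[f"{ns_a}^{fa}*{ns_b}^{fb}"] = va * vb
    return paired


# Hypothetical example with two namespaces.
example = {
    "query": {"cheap": 1.0, "flights": 1.0},
    "ad": {"airline": 1.0, "discount": 0.5},
}
crossed = pair_features(example, "query", "ad")
```

Pairing namespaces this way lets a linear learner pick up interactions (for instance, a particular query word appearing together with a particular ad word) without the user enumerating them by hand.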

At present, VW optimizes squared loss via gradient descent or exponentiated gradient descent over a linear representation.
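
As a rough sketch of what this means, the Python below performs online updates on squared loss for a sparse linear predictor, one example at a time, so the dataset never has to sit in RAM. It is not VW's implementation: the function names, the fixed learning rate, the toy data, and the use of the unnormalized variant of exponentiated gradient are all assumptions chosen to keep the example short.

```python
import math
from collections import defaultdict


def predict(weights, features):
    """Linear prediction over a sparse example given as {feature name: value}."""
    return sum(weights[name] * value for name, value in features.items())


def gd_step(weights, features, label, learning_rate):
    """One gradient descent step on the loss 0.5 * (prediction - label)^2."""
    error = predict(weights, features) - label
    for name, value in features.items():
        weights[name] -= learning_rate * error * value  # additive update
    return 0.5 * error * error


def eg_step(weights, features, label, learning_rate):
    """One unnormalized exponentiated gradient step on the same loss.

    The update is multiplicative, so weights that start positive stay positive.
    """
    error = predict(weights, features) - label
    for name, value in features.items():
        weights[name] *= math.exp(-learning_rate * error * value)
    return 0.5 * error * error


def toy_stream(passes):
    """Yield examples one at a time (hypothetical data: label = 2*a - 1*b)."""
    examples = [({"a": 1.0}, 2.0), ({"b": 1.0}, -1.0), ({"a": 1.0, "b": 1.0}, 1.0)]
    for _ in range(passes):
        yield from examples


# Gradient descent: weights start at zero and are created on first use.
gd_weights = defaultdict(float)
for features, label in toy_stream(passes=200):
    gd_step(gd_weights, features, label, learning_rate=0.1)

# Exponentiated gradient: weights start at a positive value (1.0 here).
eg_weights = defaultdict(lambda: 1.0)
for _ in range(200):
    eg_step(eg_weights, {"a": 1.0}, 2.0, learning_rate=0.1)
```

Because each step touches only the features present in the current example, the cost per example is proportional to the number of nonzero features, which is what makes this style of learning practical on sparse data.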

This code is free to use, incorporate, and modify as per the BSD (revised) license. The project is ongoing inside of Yahoo. We will gladly incorporate significant improvements from other people, and I believe any significant improvements are of substantial research interest.

16 Replies to “Vowpal Wabbit Code Release”

Nima Negahban says:

12/31/2007 at 9:49 am

Very Cool. Congrats on the release.
Ron says:

1/3/2008 at 11:59 am

Thanks for posting this. I’ve had a lot of fun converting it into Java.
1. Joseph says:
  
  10/12/2012 at 9:25 pm
  
  Hi, I’m interested in Java implementation. Is it available?
Ron says:

1/3/2008 at 2:33 pm

bug? parse_regressor.cc line 98 and line 111, I think “if (regressor.good())” isn’t needed. My interpretation is that the last weight of the file is being ignored.
Ron says:

1/3/2008 at 2:34 pm

Hmm, never mind on that bit about .good(), I was wrong.
jl says:

1/3/2008 at 2:44 pm

If you do find any bugs, I’m of course quite interested.

I’m also interested in any comparative timings you have between the java and C++ code. We chose C++ because we thought it was necessary for speed, but some people claim otherwise.
Daniel Lowd says:

1/9/2008 at 4:15 am

Neato. One other toolkit that seems related is VFML, by Geoff Hulten and Pedro Domingos. It’s a set of online algorithms for learning decision trees, Bayesian networks, and clustering, along with an API for implementing more algorithms.
laowuz says:

5/4/2008 at 6:53 am

quite cool!
Matt says:

6/25/2009 at 2:15 pm

Thanks for sharing! Are there any higher-level bindings, e.g. for Python?
1. Joseph Turian says:
  
  8/10/2010 at 5:51 pm
  
  I too would be interested in Python bindings.
  1. jl says:
    
    8/21/2010 at 8:21 pm
    
    It would be great to have this—it just needs doing.
2. Andreas Mueller says:
  
  11/8/2010 at 7:55 am
  
  Any news? Or do I have to do it myself 😉
  Cheers
Benoit says:

7/31/2009 at 5:59 pm

Congrats! I haven’t seen a classifier with such a good performance/speed ratio in a long time.

I get a big performance difference by changing the --initial_t and --power_t parameters. Could you give a short tutorial on how to choose them?

Would it be possible to perform structured learning with it? I have played with MIRA lately and I was wondering if the same ideas would apply.
1. jl says:
  
  8/1/2009 at 9:44 am
  
  The answer is certainly “yes”, but it requires programming. Hal Daume and I have seriously discussed implementing Searn, perhaps providing a factor of 100-1000 speedup over his current implementation. This is particularly compelling, because Searn is already substantially faster than CRF-style structured prediction.
