Vowpal Wabbit version 4.0, and a NIPS heresy

I’m releasing version 4.0 (tarball) of Vowpal Wabbit. The biggest change (by far) in this release is experimental support for cluster parallelism, with notable help from Daniel Hsu.

I also took advantage of the major version number to introduce some incompatible changes, including switching to MurmurHash 2, and other alterations to cache files. You’ll need to delete and regenerate them. In addition, the precise specification for a “tag” (i.e. a string that can be used to identify an example) has changed: you can’t have a space between the tag and the ‘|’ at the beginning of the feature namespace.

And, of course, we made it faster.

For the future, I put up my todo list outlining the major improvements I want to see in the code. I’m planning to discuss the current mechanism and results of the cluster parallel implementation at the large scale machine learning workshop at NIPS later this week. Several people have asked me to do a tutorial/walkthrough of VW, which is arranged for Friday 2pm in the workshop room, so no skiing for me Friday. Come join us if this heresy interests you as well 🙂

9 Replies to “Vowpal Wabbit version 4.0, and a NIPS heresy”

  1. Hi John,

    Just wondering if there are any recommended settings for using VW on large but highly unbalanced datasets (few positives, many negatives). Is there a recommended schedule/ordering of examples (avoiding long stretches of examples of a single class) to get the best performance?


    1. If possible, predicting low probability events should be avoided by defining a different problem to solve. For example, sometimes you can predict which of two choices instead.

      If it can’t be avoided, log-loss is probably preferred to the default squared loss, as it’s much more sensitive to events with low conditional probability.

      You’ll probably have to turn down the learning rate substantially.

      You probably want to order the examples in either random order or time order. The all-1s-before-all-0s order should be avoided at all costs, as any reasonable online algorithm will produce a nonsense solution.
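      A minimal sketch of randomizing example order before feeding a file to an online learner (the function name and file paths are illustrative, not part of VW):

      ```python
      import random

      def shuffle_examples(in_path, out_path, seed=0):
          """Read examples (one per line), shuffle them, and write them out.

          Avoids the pathological sorted-by-label ordering that breaks
          online learning; use a fixed seed for reproducibility.
          """
          with open(in_path) as f:
              lines = f.readlines()
          random.Random(seed).shuffle(lines)
          with open(out_path, "w") as f:
              f.writelines(lines)
      ```

      For datasets too large to fit in memory, the same effect can be approximated by shuffling in chunks or assigning each line a random key and using an external sort.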

      1. my experience:

        Subsampling the negatives and/or importance-weighting them has given me slight improvements (interesting that throwing away data can help); you then have to scale the resulting estimator to correct for the subsampling.

        Interleaving the data so that there is typically one positive per N negatives has helped some as well. Basically, I made two files (positive and negative examples), permuted each file individually, then interleaved them randomly at a 1:N rate; the few negatives left over at the end I just stuck onto the end of the file.

        — p
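      The two-file procedure above can be sketched as follows. This is an illustrative reconstruction, not Paul’s actual script: the function names are made up, and the calibration step assumes each negative was independently kept with probability r.

      ```python
      import random

      def interleave(positives, negatives, n, seed=0):
          """Randomly interleave examples so roughly one positive appears
          per n negatives. Both lists should already be permuted; leftover
          negatives are appended at the end, as in the procedure above."""
          rng = random.Random(seed)
          pos, neg, out = list(positives), list(negatives), []
          while pos and neg:
              # Emit a positive with probability 1/(n+1), else a negative.
              if rng.random() < 1.0 / (n + 1):
                  out.append(pos.pop())
              else:
                  out.append(neg.pop())
          out.extend(pos)
          out.extend(neg)  # stick any leftover negatives on the end
          return out

      def correct_probability(p, r):
          """Scale a probability estimated on negative-subsampled data back
          to the original distribution, assuming negatives were kept with
          probability r. Derivation: subsampling multiplies the odds by 1/r,
          so the true odds are r * p/(1-p)."""
          return p / (p + (1.0 - p) / r)
      ```

      The odds-scaling correction only applies when the estimator outputs conditional probabilities (e.g. with logistic loss); with other losses the right rescaling depends on the loss used.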

      2. Thanks for the reply, John and Paul. I didn’t quite understand “For example, sometimes you can predict which of two choices instead.” What type of learning problem would this be called? Any paper/reference that I could look into? Thanks again.

  2. Has the NIPS tutorial been recorded? I’ve been looking for introductory material on VW, but couldn’t find much — what is out there assumes a lot of knowledge about ML.

    I’m looking at ranking problems and techniques similar to the ones described in http://olivier.chapelle.cc/pub/ssi-ir.pdf. They use a margin ranking loss function to learn a matrix of weights (see page 8). I’ve been wondering if VW infrastructure can be used for that — but from what I understand, VW is designed to work in a classification or regression setting, learning *one* output variable — is that correct? Would it be useful in the case described above?

Comments are closed.