This is a project at Yahoo! Research to design a fast, scalable, useful learning algorithm. There are two ways to have a fast learning algorithm: (a) start with a slow algorithm and speed it up, or (b) build an intrinsically fast learning algorithm. This project is about approach (b), and it has reached a state where it may be useful to others as a platform for research and experimentation. The core algorithm is specialist gradient descent (GD) on a loss function (several are available). The code should be easily usable: its only external dependency is the Boost program_options library, which is often installed by default.

Features

There are several features that, in combination, are fairly interesting.
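To make "gradient descent on a loss function" concrete, here is a minimal sketch of one online update step for a linear model over sparse features. It assumes features arrive as a Python dict of name to value and offers two example losses (squared, and logistic with labels in {-1, +1}). The names `predict`, `sgd_update`, and `LOSS_GRADIENTS` are hypothetical and illustrative only; this is not VW's actual implementation or API.

```python
import math


def _squared_grad(p, y):
    # d/dp of 0.5 * (p - y)^2
    return p - y


def _logistic_grad(p, y):
    # d/dp of log(1 + exp(-y * p)) for y in {-1, +1}, written to avoid overflow
    z = y * p
    if z > 0:
        e = math.exp(-z)
        return -y * e / (1.0 + e)
    return -y / (1.0 + math.exp(z))


# Hypothetical table of loss derivatives with respect to the prediction.
LOSS_GRADIENTS = {"squared": _squared_grad, "logistic": _logistic_grad}


def predict(weights, features):
    """Linear prediction over sparse features given as {name: value}."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


def sgd_update(weights, features, label, rate, loss="squared"):
    """One online gradient step in place; returns the pre-update prediction."""
    p = predict(weights, features)
    g = LOSS_GRADIENTS[loss](p, label)
    # Only the weights of features present in this example are touched.
    for name, value in features.items():
        weights[name] = weights.get(name, 0.0) - rate * g * value
    return p
```

Because each step only reads and writes the weights of the features present in the current example, the cost per example is proportional to its number of nonzero features, which is one reason online GD is a natural fit for an intrinsically fast learner.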
Learning Rate

The code implements several methods for adjusting the learning rate. The default is a fixed learning rate which decays by a factor of 2^0.5 (about 1.414) between passes when multiple epochs are used. This seems to be a fairly stable default. For some datasets, a learning rate which decays as 1/(number of examples)^p or 1/(C + number of examples)^p, in stochastic gradient descent style, can work better. Choosing C and the learning rate well appears to be substantially more problem-dependent, so this is not the default. Known good values of p are in the range [0, 1]: 1 represents very aggressive decay, 0.5 is a minimax optimal choice in an adversarial setting, and smaller values imply forgetting, which can be important in time-varying settings.

The Future

This project is "live" and ongoing at GitHub. Several people have contributed to the project, and we welcome further contributions.

Authors

Shubham Chopra, Ariel Faigon, Daniel Hsu, John Langford, Lihong Li, Gordon Rios, and Alex Strehl have all worked on VW. Many others have contributed via feature requests or bug reports.
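The learning-rate schedules described under Learning Rate can be sketched as two small functions: the default per-pass decay by 2^0.5, and the stochastic-gradient-style power decay base_rate / (C + t)^p, where t is the number of examples seen so far. The function names, the `base_rate` scale, and the argument defaults are hypothetical choices for illustration, not VW's own option names.

```python
def epoch_decay_rate(base_rate, epoch, factor=2 ** 0.5):
    """Fixed rate divided by `factor` once per completed pass (epoch 0 = first pass)."""
    return base_rate / factor ** epoch


def power_decay_rate(base_rate, t, p=0.5, c=0.0):
    """Rate base_rate / (c + t)^p after t examples.

    p = 1 decays aggressively, p = 0.5 is the minimax choice mentioned above,
    and p = 0 gives a constant rate (no decay, i.e. maximal forgetting).
    """
    if c + t <= 0:
        raise ValueError("c + t must be positive")
    return base_rate / (c + t) ** p
```

For example, `power_decay_rate(1.0, 4, p=0.5)` is 1/sqrt(4) = 0.5, and on the third pass (`epoch=2`) the default schedule has halved the base rate, since (2^0.5)^2 = 2.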