Usage information for vw

The single program "vw" implements all of the algorithms; which one runs depends on the flags.

$ vw --help

VW options:
  -a [ --audit ]                         print weights of features
  -b [ --bit_precision ] arg (=18)       number of bits in the feature table
  -c [ --cache ]                         Use a cache.  The default is
                                         .cache
  --cache_file arg                       The location of a cache_file.
  -d [ --data ] arg                      Example Set
  --daemon                               read data from port 39523
  --decay_learning_rate arg (=0.7071068) Set Decay factor for learning_rate
                                         between passes
  -f [ --final_regressor ] arg           Final regressor
  -h [ --help ]                          Output Arguments
  -i [ --initial_regressor ] arg         Initial regressor
  --initial_t arg (=1)                   initial t value
  --min_prediction arg (=0)              Smallest prediction to output
  --max_prediction arg (=1)              Largest prediction to output
  --multisource arg                      multiple sources for daemon input
  --noop                                 do no learning
  --port arg                             port to listen on
  --power_t arg (=0)                     t power value
  -l [ --learning_rate ] arg (=0.1)      Set Learning Rate
  --passes arg (=1)                      Number of Training Passes
  -p [ --predictions ] arg               File to output predictions to
  --predictto arg                        host to send predictions to
  -q [ --quadratic ] arg                 Create and use quadratic features
  --quiet                                Don't output diagnostics
  -r [ --raw_predictions ] arg           File to output unnormalized
                                         predictions to
  --sendto arg                           send example to <hosts>
  -s [ --summer ] arg                    host to use as a summer
  -t [ --testonly ]                      Ignore label information and just test
  --thread_bits arg (=0)                 log_2 threads
  --loss_function arg (=squared)         Specify the loss function to be used,
                                         uses squared loss by default. Currently
                                         available ones are: squared,
                                         hinge, logistic and quantile.
  --quantiles_tau arg (=0)               Parameter \tau associated with
                                         Quantiles loss. Unless mentioned this
                                         parameter would default to a value of
                                         0.0
  --unique_id arg (=0)                   unique id used for cluster parallel

Here's an explanation of the useful flags.

-a [ --audit ] Use this to see the hashed feature indices and the weights of features.
-b [ --bit_precision ] arg The internal representation of the learning algorithm is a large array of floats, indexed by hashing the feature name. This flag controls log₂ of the array size. To avoid collisions, the birthday paradox says you need about 2·log₂(number of features) bits. On very large datasets where we can't easily represent all the features, we found this mechanism for sparsity to be more effective than the sparsification technique used in version 1. Note that your speed may depend heavily on this parameter: if the weight vector fits in the L2 cache, you can be extremely efficient.
-c [ --cache ] Whether or not to use a cache. For linear representations, this typically gives an order-of-magnitude speedup. The cache file contents depend on -b, and this dependence is checked automatically. If a valid cache is not found, the program creates one.
--cache_file arg The location of the cache file. By default it is data_file.cache.
-d [ --data ] arg The training or testing file. See below for the format. The -d flag is optional, because an unflagged argument is treated as the data file.
--daemon Listen for data at port 39523 instead of reading from a file.
--decay_learning_rate arg The learning rate is multiplied by this quantity after every pass over the data.
-f [ --final_regressor ] arg Which file to output the final regressor into.
-h [ --help ] Output the set of flags. Using no arguments has the same effect.
-i [ --initial_regressor ] arg Start by loading an initial regressor. The regressor file stores the -b, -s, and -q arguments used when it was produced, and these override any you give on the command line.
--initial_t arg (=1) An offset to the initial count. This only impacts learning if the learning rate decays with t.
--min_prediction arg (=0) By default, VW clips all predictions less than 0 to 0. You can choose a different boundary with this flag.
--max_prediction arg (=1) By default, VW clips all predictions greater than 1 to 1. You can choose a different boundary with this flag.
--multisource arg This is for cluster parallelism. You specify the number of incoming --predictto connections.
--port arg Specify a port for the daemon to listen on.
--power_t arg (=0) The exponent on 1/(initial_t + t) that controls how the learning rate decays with t. With the default of 0, the rate does not decay with t.
--passes arg The number of times the learning algorithm passes over the data. We found that decaying the learning rate by a factor of 1/√2 (≈ 0.7071068) between passes was effective, so this is the default. You can change the decay factor with --decay_learning_rate, or build your own multipass algorithm by chaining runs with --initial_regressor.
-p [ --predictions ] arg File to output predictions to. This can be used during either training or testing. Note that if order matters, you should run a single thread (--thread_bits 0, the default).
--predictto arg This is for cluster parallelism. You specify the "hostname:port" to send predictions to.
-q [ --quadratic ] arg Whether to create quadratic features, and which ones. The argument is two characters: the first characters of the two namespaces whose features are crossed (every feature in one paired with every feature in the other).
--quiet This turns off all the informative printouts.
-r [ --raw_predictions ] arg A file to output raw (unnormalized) predictions to. This is sometimes helpful if you are using the score for ordering rather than probabilistic prediction.
--sendto arg This specifies where to send examples to. If multiple --sendto's are used, this implements feature sharding, breaking up features by destination.
-t [ --testonly ] Ignore any available label information and don't train. You probably also want -p and a single thread (--thread_bits 0).
--thread_bits arg log₂ of the number of threads to use in the core. This option is typically useless, unless you are using -q.
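The -b (--bit_precision) entry says weights live in an array of 2^b floats indexed by a hash of the feature name, and that avoiding collisions takes about 2·log₂(number of features) bits. The Python sketch below illustrates both points. It uses zlib.crc32 only as a stand-in hash (vw uses its own hash function), and the helper names are hypothetical. For scale, the default -b 18 gives 2^18 = 262,144 slots, about 1 MB assuming 4-byte floats.

```python
import math
import zlib

def slot(feature: str, bits: int = 18) -> int:
    """Map a feature name to one of 2**bits weight slots.
    zlib.crc32 is a stand-in; vw uses its own hash function."""
    return zlib.crc32(feature.encode("utf-8")) & ((1 << bits) - 1)

def expected_colliding_pairs(num_features: int, bits: int) -> float:
    """Birthday-paradox estimate: with a uniform hash, each of the
    n*(n-1)/2 feature pairs shares a slot with probability 2**-bits."""
    return num_features * (num_features - 1) / 2 / (1 << bits)

def bits_to_avoid_collisions(num_features: int) -> int:
    """Smallest b with 2**b >= n**2, i.e. about 2*log2(n). This keeps the
    expected number of colliding pairs (and so the chance of any collision)
    below 1/2."""
    return math.ceil(2 * math.log2(num_features))

if __name__ == "__main__":
    print("slot for 'foo' with -b 18:", slot("foo"))
    for n in (1_000, 100_000):
        b = bits_to_avoid_collisions(n)
        print(f"{n} features: -b {b}, expected colliding pairs {expected_colliding_pairs(n, b):.3f}")
```

With 1,000 features, the default 18 bits already gives almost two expected colliding pairs, while 20 bits brings the estimate under one half.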
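The --min_prediction and --max_prediction entries describe clipping each prediction into a range that defaults to [0, 1]. A minimal Python sketch of that rule follows; clip_prediction is a hypothetical name, not part of vw.

```python
def clip_prediction(p: float, min_prediction: float = 0.0, max_prediction: float = 1.0) -> float:
    """Clamp p into [min_prediction, max_prediction]; the defaults mirror
    the --min_prediction (=0) and --max_prediction (=1) defaults."""
    return min(max(p, min_prediction), max_prediction)

if __name__ == "__main__":
    for p in (-0.3, 0.4, 1.7):
        print(p, "->", clip_prediction(p))
```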
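The --learning_rate, --decay_learning_rate, --initial_t, --power_t and --passes entries together describe a step-size schedule. The Python sketch below combines them as l · d^pass · (1/(initial_t + t))^power_t. That functional form is an assumption read off the flag descriptions, not taken from vw's source, and step_size is a hypothetical name; the defaults are the ones printed by vw --help.

```python
import math

def step_size(pass_index: int, t: int, learning_rate: float = 0.1,
              decay_learning_rate: float = 0.7071068,
              initial_t: float = 1.0, power_t: float = 0.0) -> float:
    """Assumed step size for example t (counted from 0) during pass
    pass_index (counted from 0)."""
    per_pass = decay_learning_rate ** pass_index       # multiplied in after every pass
    per_example = (1.0 / (initial_t + t)) ** power_t   # equals 1 when power_t == 0
    return learning_rate * per_pass * per_example

if __name__ == "__main__":
    for p in range(3):
        print("pass", p, "rate", step_size(p, 0))
```

With the defaults, the rate halves every two passes (since (1/√2)² = 1/2) and does not depend on t or initial_t at all, which matches the note that --initial_t only matters when the rate decays with t.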
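The -q (--quadratic) entry says the two-character argument picks namespaces by their first character and crosses their features. The Python sketch below shows that crossing on an already-parsed example. The "namespace^feature" and "a*b" names are illustrative only (vw works on hashed indices), the product of the two values is the usual convention for quadratic features, and the helper names are hypothetical.

```python
from itertools import product

def namespace_features(namespaces, first_char):
    """All (qualified_name, value) pairs from namespaces whose name starts with first_char."""
    return [(f"{ns}^{name}", value)
            for ns, feats in namespaces.items() if ns.startswith(first_char)
            for name, value in feats]

def quadratic_features(namespaces, pair):
    """Sketch of -q <pair>: cross every feature of the namespace(s) starting with
    pair[0] with every feature of the namespace(s) starting with pair[1]."""
    left = namespace_features(namespaces, pair[0])
    right = namespace_features(namespaces, pair[1])
    return [(f"{a}*{b}", va * vb) for (a, va), (b, vb) in product(left, right)]

if __name__ == "__main__":
    example = {"user": [("age25", 1.0), ("male", 1.0)], "item": [("shoes", 2.0)]}
    for feature in quadratic_features(example, "ui"):
        print(feature)
```

The number of quadratic features is the product of the two namespaces' sizes, which is why the -b setting and --thread_bits matter more once -q is in use.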

Data file format

Each line of the data file is one example, in the form <label> <weight> <tag>|<namespace> <feature> <feature> ... |<namespace> <feature> <feature> ...

The semantics: features with the same name in different namespaces are different features.

If you want to specify a value for a feature, add :<float> to the namespace (which applies it to every feature in the namespace) or to the feature. For example, "|txt:-1 foo bar baz" says that the features "foo", "bar", and "baz" each have value -1 (rather than the default of 1).

The <tag> is a string containing no special characters; it is echoed on output of any predictions.

If you don't specify a label, the learning algorithm doesn't try to learn (but it does test).

If you don't specify a weight, it defaults to 1.
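To make the line format concrete, here is a minimal Python parser for one example. It is a sketch under stated assumptions, not vw's parser: header tokens that parse as numbers fill the label and then the weight, a non-numeric header token is taken as the tag, the first token after each "|" is the namespace (as in the format above), and a namespace value multiplies each feature's own value. parse_example and parse_value are hypothetical names.

```python
def parse_value(token):
    """Split 'name:value' into (name, float(value)); a bare name gets the default value 1.0."""
    name, sep, value = token.partition(":")
    return name, (float(value) if sep else 1.0)

def parse_example(line):
    """Parse '<label> <weight> <tag>|<namespace> <feature> ... |<namespace> ...'."""
    header, *sections = line.rstrip("\n").split("|")
    numbers, tag = [], None
    for tok in header.split():
        try:
            numbers.append(float(tok))   # assumed: numeric tokens are label, then weight
        except ValueError:
            tag = tok                    # assumed: a non-numeric token is the tag
    label = numbers[0] if numbers else None          # no label: test only, no learning
    weight = numbers[1] if len(numbers) > 1 else 1.0  # weight defaults to 1
    features = {}
    for section in sections:
        tokens = section.split()
        if not tokens:
            continue
        namespace, ns_value = parse_value(tokens[0])
        feats = features.setdefault(namespace, [])
        feats.extend((name, ns_value * value) for name, value in map(parse_value, tokens[1:]))
    return {"label": label, "weight": weight, "tag": tag, "features": features}

if __name__ == "__main__":
    print(parse_example("1 2 doc7|txt:-1 foo bar baz|meta len:3"))
```

On the document's own example, "|txt:-1 foo bar baz" yields the three features of namespace "txt", each with value -1, no label, and the default weight of 1.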