Usage information for vw
The program "vw" implements all the algorithms; flags select which algorithm runs and how.
$ vw --help
VW options:
  -a [ --audit ]                        print weights of features
  -b [ --bit_precision ] arg (=18)      number of bits in the feature table
  -c [ --cache ]                        Use a cache. The default is
                                        <data>.cache
  --cache_file arg                      The location of a cache_file.
  -d [ --data ] arg                     Example Set
  --daemon                              read data from port 39523
  --decay_learning_rate arg (=0.7071068)
                                        Set Decay factor for learning_rate
                                        between passes
  -f [ --final_regressor ] arg          Final regressor
  -h [ --help ]                         Output Arguments
  -i [ --initial_regressor ] arg        Initial regressor
  --initial_t arg (=1)                  initial t value
  --min_prediction arg (=0)             Smallest prediction to output
  --max_prediction arg (=1)             Largest prediction to output
  --multisource arg                     multiple sources for daemon input
  --noop                                do no learning
  --port arg                            port to listen on
  --power_t arg (=0)                    t power value
  -l [ --learning_rate ] arg (=0.1)     Set Learning Rate
  --passes arg (=1)                     Number of Training Passes
  -p [ --predictions ] arg              File to output predictions to
  --predictto arg                       host to send predictions to
  -q [ --quadratic ] arg                Create and use quadratic features
  --quiet                               Don't output diagnostics
  -r [ --raw_predictions ] arg          File to output unnormalized
                                        predictions to
  --sendto arg                          send example to <hosts>
  -s [ --summer ] arg                   host to use as a summer
  -t [ --testonly ]                     Ignore label information and just test
  --thread_bits arg (=0)                log_2 threads
  --loss_function arg (=squared)        Specify the loss function to be used,
                                        uses squared loss by default. Currently
                                        available ones are: squared, hinge,
                                        logistic and quantile.
  --quantiles_tau arg (=0)              Parameter \tau associated with
                                        Quantiles loss. Unless mentioned this
                                        parameter would default to a value of
                                        0.0
  --unique_id arg (=0)                  unique id used for cluster parallel
Here's an explanation of the useful flags; an example invocation follows the list.
- -a [ --audit ] Use this to see the hashed index and learned weight of each feature.
- -b [ --bit_precision ] arg The internal representation of the learning algorithm is a large array of floats which is indexed by hashing the feature. This flag controls log_2 of the array size. If you want no collisions, the birthday paradox says you need b to be about 2*log_2(number of features). On very large datasets where we can't easily represent all the features, we found this mechanism for sparsity to be more effective than the sparsification technique used in version 1. Note that your speed may be highly dependent on this parameter: if the weight vector fits in the L2 cache, you can be extremely efficient. (A sketch of the hashing scheme appears after this list.)
- -c [ --cache ] Whether or not to use a cache. For linear representations, this typically results in an order of magnitude speedup. The cache file contents depend on -b, and this dependence is autochecked. If a valid cache is not found, the program starts creating one.
- --cache_file arg The location of the cache file. By default it is data_file.cache.
- -d [ --data ] arg The training or testing file. See below for the format. The "-d" flag isn't necessary, because an unflagged argument is the datafile by default.
- --daemon Listen for data at port 39523 instead of reading from a file.
- --decay_learning_rate arg The learning rate is multiplied by this quantity after every pass over the data.
- -f [ --final_regressor ] arg Which file to output the final regressor into.
- -h [ --help ] Output the set of flags. Using no arguments has the same effect.
- -i [ --initial_regressor ] arg Start by loading an initial regressor. The regressor file contains -b, -s, and -q flag arguments used when producing the regressor and will overrule any that you try to give.
- --initial_t arg (=1) An offset to the initial count. This only impacts learning if the learning rate decays with t.
- --min_prediction arg (=0) By default, VW clips all predictions less than 0 to 0. You can choose a different boundary with this flag.
- --max_prediction arg (=1) By default, VW clips all predictions greater than 1 to 1. You can choose a different boundary with this flag.
- --multisource arg This is for cluster parallelism. You specify the number of incoming --predictto connections.
- --port arg Specify a port for the daemon to listen on.
- --power_t arg (=0) The power on 1/(initial_t + t) that controls how the learning rate decays with t.
- --passes arg The number of times the learning algorithm passes over the data. We found that decaying the learning rate by a factor of 1/2^0.5 per pass was effective, so this is the default. You can change the decay rate via --decay_learning_rate or create your own multipass algorithm via --initial_regressor. (A sketch of how the rate flags combine appears after this list.)
- -p [ --predictions ] arg File to output predictions to. This can be used during either training or testing. Note that if order matters, you should set --thread_bits 0.
- --predictto arg This is for cluster parallelism. You specify the "hostname:port" to send predictions to.
- -q [ --quadratic ] arg Whether or not to create quadratic features, and which ones. The argument is two characters: the first characters of the two namespaces whose features are paired to form the quadratic features.
- --quiet This turns off all the informative printouts.
- -r [ --raw_predictions ] arg A file to output raw (unnormalized) predictions to. This is sometimes helpful if you are using the score for ordering rather than probabilistic prediction.
- --sendto arg This specifies where to send examples to. If multiple --sendto's are used, this implements feature sharding, breaking up features by destination.
- -t [ --testonly ] Ignore any available label information and don't train. You probably want to use -p and --thread_bits 0 as well.
- --thread_bits arg log2 of the number of threads to use in the core. This option is typically useless, unless you are using -q.
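For example, a typical train-then-test cycle looks like the following (file and model names are illustrative):

  vw -d train.dat -c -b 24 --passes 10 -q st -f model.reg
  vw -d test.dat -t -i model.reg -p predictions.txt

The first command trains for 10 passes over train.dat with a cache and a 2^24-entry weight table, crosses the namespaces whose names begin with 's' and 't', and writes the final regressor to model.reg. The second command loads that regressor, ignores any labels in test.dat, and writes predictions to predictions.txt.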
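The hashing behind -b can be pictured with a minimal sketch. VW's actual hash function is not reproduced here; the FNV-1a hash and the function name below are illustrative assumptions, but masking the hash down to b bits is the point:

  #include <cstdint>
  #include <string>

  // Illustrative only: map a feature name to a slot in a table of 2^b weights.
  // VW's real hash differs; this FNV-1a variant just demonstrates the idea.
  std::uint32_t feature_index(const std::string& feature, unsigned b) {
      std::uint32_t h = 2166136261u;     // FNV-1a offset basis
      for (unsigned char c : feature) {
          h ^= c;
          h *= 16777619u;                // FNV-1a prime
      }
      return h & ((1u << b) - 1);        // keep the low b bits as the index
  }

With the default b = 18 the table holds 2^18 = 262144 weights, and by the birthday-paradox argument above collisions become likely once the number of distinct features approaches 2^(b/2) = 512.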
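The four rate-related flags combine roughly as sketched below. This assumes the rate on example t of pass p is learning_rate * decay_learning_rate^p / (initial_t + t)^power_t, which is consistent with the flag descriptions above but is a sketch, not a transcription of the implementation:

  #include <cmath>

  // Sketch only: an assumed combination of the rate flags, not VW's exact code.
  double rate(double learning_rate,  // -l, default 0.1
              double decay,          // --decay_learning_rate, default 0.7071068
              double initial_t,      // --initial_t, default 1
              double power_t,        // --power_t, default 0
              unsigned pass, double t) {
      return learning_rate * std::pow(decay, pass)
                           / std::pow(initial_t + t, power_t);
  }

With the defaults (power_t = 0) the rate is constant within a pass and shrinks by a factor of 1/2^0.5 after each pass.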
Data file format
The training set is a line-by-line format of the form:

<label> <weight> <tag>|<namespace> <feature> <feature> ... |<namespace> <feature> <feature> ...
The semantics is that features with the same name in different namespaces are distinct features.
If you want to specify a value for a feature, you do so by adding :<float> to the namespace (applying to all features in the namespace) or to the feature itself. For example, "|txt:-1 foo bar baz" says that the features "foo", "bar", and "baz" each have value -1 (rather than the default of 1). The <tag> is a string containing no special characters, and it is echoed with any output predictions.
If you don't specify a label, the learning algorithm doesn't try to learn (but it does test).
If you don't specify a weight, it defaults to 1.
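For example, these two lines are valid input (all names and values are illustrative):

  1 1.0 house_17|size bedrooms:3 square_feet:2250 |txt colonial brick
  0 0.5 |txt:-1 foo bar baz

The first line is a label-1 example with weight 1.0 and tag "house_17"; its "size" features carry explicit values while its "txt" features default to 1. The second line has no tag, weight 0.5, and (as in the example above) gives every feature in the "txt" namespace the value -1.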