Previously, we discussed parallel machine learning a bit. As parallel ML is rather difficult, I’d like to describe my thinking at the moment, and ask for advice from the rest of the world. This is particularly relevant right now, as I’m attending a workshop tomorrow on parallel ML.
Parallelizing slow algorithms seems uncompelling. Parallelizing many algorithms also seems uncompelling, because the effort required to parallelize is substantial. This leaves the question: Which one fast algorithm is the best to parallelize? What is a substantially different second?
One compellingly fast simple algorithm is online gradient descent on a linear representation. This is the core of Leon’s sgd code and Vowpal Wabbit. Antoine Bordes showed a variant was competitive in the large scale learning challenge. It’s also a decades old primitive which has been reused in many algorithms, and continues to be reused. It also applies to online learning rather than just online optimization, implying the algorithm can be used in a host of situations where batch algorithms are awkward or unviable.
If we start with a fast learning algorithm as a baseline, there seem to be two directions we can go with parallel ML:
- (easier) Try to do more in the same amount of time. For example, Paul and Neil suggest using parallelism to support ensemble methods.
- (harder) Try to use parallelism to reduce the amount of time required to effectively learn on large amounts of data. For this approach, bandwidth and latency become uncomfortably critical implementation details. Due to these issues, it appears important to at least loosen the goal to competing with learning on large amounts of data. Alternatively, we could consider this as evidence some other learning primitive is desirable, although I’m not sure which one.