The notation g(n) = O(f(n)) means that in the limit as n approaches infinity there exists a constant C such that the g(n) is less than Cf(n).
In learning theory, there are many statements about learning algorithms of the form “under assumptions x,y, and z, the classifier learned has an error rate of at most O(f(m))“.
There is one very good reason to use O(): it helps you understand the big picture and neglect the minor details which are not important in the big picture. However, there are some important reasons not to do this as well.
- Unspeedup In algorithm analysis, the use of O() for time complexity is pervasive and well-justified. Determining the exact value of C is inherently computer architecture dependent. (The “C” for x86 processors might differ from the “C” on PowerPC processors.) Since many learning theorists come from a CS theory background, the O() notation is applied to generalization error. The O() abstraction breaks here—you can not generally change learning algorithm and decrease your error rate by some constant factor.
- You’re fired When you go and run a learning algorithm to acquire some predictor, the performance it achieves is a key quantity of significant interest. Using an algorithm with merely twice the error rate of a better algorithm can easily make the difference between “it works” and “it doesn’t”. This is not often true when programming, as evidenced by the large number of people who use computationally inefficient languages to solve problems.
We can’t say “never use O()“, because sometimes abstracting away details is the right thing to do. However, any use of O() should be analyzed for “reasonableness” more thoroughly than for computational complexity. Similarly, more interest should be allocated to improving constants in such analysis than is done in algorithms.