One way to distinguish learning algorithms is by their ability or inability to easily use an input variable as the predicted output; call a predictor with this ability idempotent. This is desirable for at least two reasons:
- Modularity: If we want to build complex learning systems by reusing subsystems, it's important to have compatible I/O.
- “Prior” knowledge: Machine learning is often applied in situations where we already have some knowledge of what the right solution is, often in the form of an existing system. In such situations, it's good to start with a learning algorithm that can be at least as good as any existing system.
For classification, most learning algorithms can do this: a decision tree, for example, can split on the feature and then output the corresponding class. The real differences show up when we attempt regression. Many of the algorithms we know and commonly use are not idempotent predictors.
- Logistic regressors cannot be idempotent, because every input feature is mapped through a nonlinearity before reaching the output.
- Linear regressors can be idempotent: they just set the weight on one input feature to 1 and the weights on all other features to 0 (see the sketch after this list).
- Regression trees are not idempotent, or at least not easily so. To reproduce an input feature as the prediction, the tree must split on that feature many times.
- Bayesian approaches may or may not be easily idempotent, depending on the structure of the prior.
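To make the contrast between the linear and logistic cases concrete, here is a minimal numpy sketch (the data and the weight vector are made up for illustration): a linear regressor with weight 1 on one feature and 0 on the rest reproduces that feature exactly, while the same weights pushed through a logistic nonlinearity cannot.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))          # five examples, three input features
target_feature = X[:, 1]             # suppose the "right answer" is feature 1

# Linear regressor: weight 1 on feature 1, 0 elsewhere -> exact reproduction.
w = np.array([0.0, 1.0, 0.0])
linear_pred = X @ w
print(np.allclose(linear_pred, target_feature))    # True

# Logistic regressor: the same weights are forced through a sigmoid, so the
# output is squashed into (0, 1) and cannot equal the raw feature.
logistic_pred = 1.0 / (1.0 + np.exp(-(X @ w)))
print(np.allclose(logistic_pred, target_feature))  # False
```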
It isn't clear how important the idempotent-capable property is. Successive-approximation approaches such as boosting can approximate it in a fairly automatic manner. It may be of substantial importance for large modular systems where efficiency matters.
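As a rough sketch of the boosting point (the threshold grid, shrinkage, and number of rounds below are arbitrary illustrative choices, not anything from the discussion above), a stagewise residual-fitting loop with regression stumps gets steadily closer to reproducing a single input feature, even though each individual stump is far from idempotent:

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0.0, 1.0, size=200))  # a single input feature
y = x.copy()                                  # target: reproduce the feature itself

def fit_stump(x, residual):
    """Choose the threshold split that minimizes squared error on the residual."""
    best = None
    for t in np.linspace(0.05, 0.95, 19):
        left, right = residual[x <= t], residual[x > t]
        if len(left) == 0 or len(right) == 0:
            continue
        err = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or err < best[0]:
            best = (err, t, left.mean(), right.mean())
    return best[1:]

# Stagewise residual fitting: each stump is a crude step function, but the
# shrunken sum of many stumps tracks y = x increasingly closely.
pred = np.zeros_like(y)
for _ in range(300):
    t, lv, rv = fit_stump(x, y - pred)
    pred += 0.1 * np.where(x <= t, lv, rv)

# Remaining error is small relative to the range of y, and shrinks further
# with more rounds and a finer threshold grid.
print(np.abs(pred - y).max())
```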