Multitask learning is the problem of jointly predicting multiple labels with one system. A basic question is whether or not multitask learning can be decomposed into one (or more) single-task prediction problems. It seems the answer is “yes”, in a fairly straightforward manner.
The basic idea is that a controlled input feature is equivalent to an extra output. Suppose we have some process generating examples (x,y1,y2) in S, where y1 and y2 are labels for two different tasks. Then we can reprocess the data into the form Sb(S) = {((x,i),yi): (x,y1,y2) in S, i in {1,2}} and learn a classifier c: X x {1,2} -> Y. Note that (x,i) is the (composite) input. At test time, given an input x, we can query c for the predicted values of y1 and y2 using (x,1) and (x,2).
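To make the construction concrete, here is a minimal sketch in Python. The names Sb and predict_multitask, and the assumptions that examples are plain (x, y1, y2) tuples and that c is a callable single-task classifier, are illustrative choices rather than anything prescribed above.

```python
# A sketch of the Sb reduction. Examples are assumed to be (x, y1, y2) tuples,
# and c is assumed to be any trained single-task classifier callable on
# composite inputs (x, i); both assumptions are illustrative.

def Sb(S):
    """Expand each multitask example (x, y1, y2) into two single-task examples."""
    out = []
    for x, y1, y2 in S:
        out.append(((x, 1), y1))  # composite input (x, 1) carries the task-1 label
        out.append(((x, 2), y2))  # composite input (x, 2) carries the task-2 label
    return out

def predict_multitask(c, x):
    """Query the single classifier c for both task predictions at test time."""
    return c((x, 1)), c((x, 2))
```

Any single-task learner can then be trained on Sb(S), with the task index i treated as just another feature of the composite input.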
A strong form of equivalence can be stated between these tasks. In particular, suppose we have a multitask learning algorithm ML which learns a multitask predictor m:X -> Y x Y. Then the following theorem can be proved:
For all ML, for all S, there exists an inverse reduction Sm such that ML(S) = ML(Sm(Sb(S))).
In other words, no information is lost in the transformation Sb, which means everything that was learnable previously remains learnable.
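A sketch of what such an inverse reduction might look like, continuing the illustrative code above and assuming each x appears in exactly one example and is hashable (the name Sm and these assumptions are not fixed by the statement of the theorem):

```python
# A hypothetical inverse reduction Sm: regroup the expanded single-task
# examples by x to recover the original multitask sample, illustrating that
# Sb discards no information. Assumes each x occurs once in S and is hashable.

def Sm(Sb_S):
    labels = {}
    for (x, i), y in Sb_S:
        labels.setdefault(x, {})[i] = y          # collect the task-i label for x
    return [(x, ys[1], ys[2]) for x, ys in labels.items()]
```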
This may not be the final answer to the question, because there may be some algorithm-dependent (mis)behavior associated with the controlled feature i. It may also be the case that single-task classification is computationally distinguishable from multitask classification. Certainly, computational concerns are one of the reasons specialized multitask classification algorithms exist.