Pat (the practitioner) I need to do multiclass classification and I only have a decision tree.
Theo (the theoretician) Use an error correcting output code (ECOC).
Pat Oh, that’s cool. But the created binary problems seem unintuitive. I’m not sure the decision tree can solve them.
Theo Oh? Is your problem a decision list?
Pat No, I don’t think so.
Theo Hmm. Are the classes well separated by axis aligned splits?
Pat Err, maybe. I’m not sure.
Theo Well, if they are, under the IID assumption I can tell you how many samples you need.
Pat IID? The data is definitely not IID.
Theo Oh dear.
Pat Can we get back to the choice of ECOC? I suspect we need to build it dynamically in response to which subsets of the labels are empirically separable from each other.
Theo Ok. What do you know about your problem?
Pat Not much. My friend just gave me the dataset.
Theo Then, no one can help you.
Pat (What a fuzzy thinker. Theo keeps jumping to assumptions that just aren’t true.)
Theo (What a fuzzy thinker. Pat’s problem is unsolvable without making extra assumptions.)
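For readers who haven't met the error correcting output code Theo suggests, here is a minimal sketch of the idea, and of why Pat finds the created binary problems unintuitive. The class names `A` to `D` and the 7-bit code matrix are assumptions made up for illustration, not anything from Pat's dataset, and the binary learner itself (Pat's decision tree) is left out: only the relabeling and the decoding are shown.

```python
# Minimal ECOC sketch. The classes and codewords are hypothetical.
# Each class gets a 7-bit codeword; each bit position defines one binary problem.
# Any two codewords below differ in 4 positions, so one wrong bit can be corrected.
CODEWORDS = {
    "A": (1, 1, 1, 1, 1, 1, 1),
    "B": (0, 0, 0, 0, 1, 1, 1),
    "C": (0, 0, 1, 1, 0, 0, 1),
    "D": (0, 1, 0, 1, 0, 1, 0),
}


def binary_labels(labels, column):
    """Relabel multiclass labels as 0/1 targets for one binary problem."""
    return [CODEWORDS[y][column] for y in labels]


def hamming(a, b):
    """Count the positions where two bit tuples disagree."""
    return sum(x != y for x, y in zip(a, b))


def decode(bits):
    """Return the class whose codeword is nearest to the predicted bits."""
    return min(CODEWORDS, key=lambda c: hamming(CODEWORDS[c], bits))


labels = ["A", "B", "C", "D"]
# Column 2 asks the binary learner to separate {A, C} from {B, D}.
print(binary_labels(labels, 2))       # [1, 0, 1, 0]
# B's codeword with the first bit predicted wrongly still decodes to B.
print(decode((1, 0, 0, 0, 1, 1, 1)))  # B
```

Nothing guarantees that a grouping like {A, C} versus {B, D} is easy for a decision tree, which is exactly Pat's worry.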
I’ve heard variants of this conversation several times. The fundamental difference in viewpoint is the following:
- Theo lives in a world where he chooses the problem to solve based upon the learning model (and assumptions) used.
- Pat lives in a world where the problem is imposed on him.
I’d love for these confusions to go away, but there is no magic wand. The best advice seems to be: listen carefully and avoid assuming too much in what you hear.
I would say that the core problem is that Pat and Theo have assumed (implicitly and optimistically) that the problem is soluble in a single step. They are anticipating the last iteration of the analytical process rather than addressing the first iteration.
Pat says (at the end of the conversation) that he knows very little about the data set. All analytical techniques are theoretically based on some assumptions. Different techniques are based on different assumptions. Different data sets violate different assumptions to different extents. Different techniques are differentially sensitive to assumption violations.
So, the first iteration of analysis should use techniques that make the fewest assumptions and are least sensitive to violations. You then proceed iteratively, to increase your understanding of the data and to home in on techniques that are better suited to the data you actually have. Giving opinions as to what you will do in the final iteration before you have started the first is fun as a test of clairvoyance, but has about the same status as a parlour game.
Pat shouldn’t have (seriously) asked for detailed guidance under circumstances where it can’t reasonably be given. Theo shouldn’t have allowed himself to be sucked into answering that question without putting a bunch of caveats around it.