Researchers are typically confronted with big problems that they have no idea how to solve. In trying to come up with a solution, a natural approach is to decompose the big problem into a set of subproblems whose solution yields a solution to the larger problem. This approach can go wrong in several ways.
- Decomposition failure. The solution to the decomposition does not in fact yield a solution to the overall problem.
- Artificial hardness. The subproblems created are sufficient if solved to solve the overall problem, but they are harder than necessary.
As you can see, computational complexity forms a relatively new (in research-history) razor by which to judge an approach sufficient but not necessary.
In my experience, the artificial hardness problem is very common. Many researchers abdicate the responsibility of choosing a problem to work on to other people. This process starts very naturally as a graduate student, when an incoming student might have relatively little idea about how to do research, so they naturally abdicate the problem choice to an advisor. As an inexperienced graduate student, it’s difficult to avoid this, because an advisor often really does know much better about what is easy, what is hard, and how to decompose a complex problem into solvable subproblems. Nevertheless, if your plan in life is to do substantial research, it’s essential even then to question research directions carefully.
In contrast to sufficient subgoals of a greater goal, there are also necessary subgoals. A necessary subgoal is one which must be solved to solve the greater goal. One of the reasons why the artificial hardness problem is so common is that the sufficient subgoals are commonly confused with necessary subgoals. The essential test for a necessary subgoal is whether or not a solution to the global problem can be used as a solution to the subgoal.
My personal greater goal is creating a master machine learning algorithm that can solve any reasonable learning problem where “reasonable” includes at least the set that humans can solve. Relative to this greater goal, many existing research programs do not appear necessary.
- The short form of my second comment on Marcus’s post is that I see the sufficiency but not the necessity of competing with all Turing machines.
- The necessity of several statistical modeling approaches appears unclear to me, because they often encounter severe computational problems. Perhaps this is an example of creating an artificially hard problem, as empirical experiences with Searn suggest.
What is necessary?
- Large data. It is clear that humans solving learning problems have access to large amounts of information which they employ to solve these problems. While we don’t stick children into a sensory deprivation tank to see how much it retards their ability to solve problems when grown, some experiments along these lines have been done with animals yielding obvious ability deficiency.
- Online learning. The ability to learn in an online environment with relatively little processing per bit of input is clearly a sufficient approach to solve many problems. We can also argue that it is a computational necessity, as retraining based upon all past information appears computationally infeasible, or at least deeply wasteful.
- Interactive learning. It’s clear that many animals use an interactive process to learn about the world. This form of learning is also necessary, because it provides the ability to answer unanticipated questions. We can further argue the necessity by pointing out that interactive proofs appear much more powerful in computational complexity theory than noninteractive proofs. For example, viewed from a learning perspective, much of science is about interactive learning.
- Compositional Design of a learning system. The necessity of compositional design in machine learning is not entirely clear. For example, we could imagine that it’s possible to design good learning systems using an evolutionary approach. Nevertheless, since our basic goal in research is a much more efficient and faster design, it seems that the decision to take a research-based approach implies that compositional design is necessary. Restated: we should be able to design the system to learn in components which compose to form an overall solution.
- Large contexts. It’s necessary that a learning algorithm be able to use a relatively large number of bits when making a decision. For example, people working on vision have cool examples where people manage to use many different cues to predict what an object is.
- Nonlinearity. People can clearly solve learning problems for which no linear representation (of input information) is capable of achieving good performance.
Some of these are criticizable as perhaps unnecessary, and I can easily imagine missing others. If you have other arguments for what is or is not necessary for this greater goal, please speak up.
There are two other categories of subgoal research we could consider. There are subgoals which are necessary and sufficient (in combination) to solve the greater goal. I don’t know of any such arguments for my greater goal.
The fourth category is subgoals which are neither necessary nor sufficient for a greater goal. In my experience such papers are quite common at conferences with types that include:
- Work on speeding up a slow algorithm leaving it slower than the state of the art,
- Otherwise improving an algorithm which is suboptimal while leaving it suboptimal.
The nitty-gritty of these questions come at review time. Which papers should be accepted? In general, decision making is pretty terrible because greater goals are rarely stated, perhaps as a form of strategic ambiguity. After all, the set of people attending a conference have multiple greater goals. Nevertheless, my personal ordering is:
- Necessary and sufficient research directions. An emptyset in my experience.
- Necessary research directions. This is typically a small fraction.
- Sufficient research directions. This is a greater fraction.
- Neither. This is often the majority.
- Wrong. Must be rejected.
So the question to periodically ask yourself as a researcher is: What is the greater goal? Is this subgoal necessary? Sufficient? Something else?
Very interesting post. I would like to point out that it is possible to observe this matter (necessary and sufficient research) in Vladimir Vapnik’s work.
Just to exemplify… He claims that inductive learning is sufficient but not necessary if your goal is classification. That’s why he introduced transductive learning.
Here’s an idea. A scientist should always know what his greater goal is. I think I have known mine for a while, and it’s a bit like a special case of yours: to make computers better at mathematics than humans are.
Except I should add to that goal the following: “mathematics” is a moving target because it changes with time. In fact, it would be quite acceptable to change mathematics somewhat if that made it easier for computers.
This is very good perspective on the subject. I think that computer science as a field is far too full of sufficient-but-not-necesary research directions (in a sense, if your problem is making efficient SAT solvers, most complexity theory trying to decide whether P is NP qualifies as sufficient-but-not-necessary research; same goes for the excessive focus on worst-case behavior).
Altough the “necessity test” for a given research direction is usually undecidable before pursuing the direction, unless it falls in the neither-sufficient-nor-necessary category.
The separation into necessary v.s. sufficient is quite interesting.
I think artificial hardness is a common source of trouble in computer vision. Object recognition is a great candidate for the canonical example of artificial hardness. Yes, it would be great to recognize objects from a single image and learn from 1 image as well. So far it was really hard.
I think we should be paying more attention to the “necessary” part.
I’m a bit of an outsider and neophyte to the world of ML, but something struck me…you said:
“My personal greater goal is creating a master machine learning algorithm that can solve any reasonable learning problem”
My (probably limited) understanding is that this is exactly the kind of thing that the various No-Free-Lunch theorems say we can’t have. It seems highly likely that I have (i) misunderstood (the implications of) the NFL theorems, or (ii) misunderstood your goal, and more specifically your usage of the word “solve”.
More generally, though, what you’re suggesting/proposing w.r.t. the division into necessity & sufficiency rings true in my field (cognitive science), as well. In studying human cognition (qua human problem solving), too many researchers go about the task asking “What would enable a human to solve this problem?”, then finding some answer, and loudly proclaiming said answer as an intrinsic property of humans. (this is particularly rampant in some language-related corners of the disicpline)
A critical thing to understand about NFL logic is that it applies to a distribution over learning problems which is very dissimilar from the learning problems typically encountered by (say) a human. This is why NFL does not forbid a human from solving many or most of the problems they encounter. It’s also why NFL does not forbid a machine from similar abilities.