Machined Learnings

Paul Mineiro has started Machined Learnings, where he’s seriously attempting to do ML research in public. I personally need to read through it in greater detail, as much of it is learning reduction related, trying to deal with the sorts of complex source problems that come up in practice.

Necessary and Sufficient Research

Researchers are typically confronted with big problems that they have no idea how to solve. In trying to come up with a solution, a natural approach is to decompose the big problem into a set of subproblems whose solution yields a solution to the larger problem. This approach can go wrong in several ways.

  1. Decomposition failure. Solving the subproblems does not in fact yield a solution to the overall problem.
  2. Artificial hardness. The subproblems, if solved, do suffice to solve the overall problem, but they are harder than necessary.

Computational complexity provides a relatively new (in the history of research) razor by which to judge an approach sufficient but not necessary.

In my experience, the artificial hardness problem is very common. Many researchers abdicate the responsibility of choosing a problem to work on to other people. This process starts very naturally as a graduate student, when an incoming student might have relatively little idea about how to do research, so they naturally abdicate the problem choice to an advisor. As an inexperienced graduate student, it’s difficult to avoid this, because an advisor often really does know much more about what is easy, what is hard, and how to decompose a complex problem into solvable subproblems. Nevertheless, if your plan in life is to do substantial research, it’s essential even then to question research directions carefully.

In contrast to sufficient subgoals of a greater goal, there are also necessary subgoals. A necessary subgoal is one which must be solved to solve the greater goal. One of the reasons why the artificial hardness problem is so common is that the sufficient subgoals are commonly confused with necessary subgoals. The essential test for a necessary subgoal is whether or not a solution to the global problem can be used as a solution to the subgoal.
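Stated compactly, here is a hedged formalization of this taxonomy; the symbols G (greater goal), S (subgoal), and the notation solve(X) are introduced here for illustration only and are not from the original text.

```latex
% Compact restatement of the subgoal taxonomy above (illustrative symbols).
% "solve(X)" abbreviates "a solution to problem X is in hand".
\documentclass{article}
\usepackage{amsmath}
\begin{document}
\begin{align*}
\text{$S$ is a sufficient subgoal of $G$:}\quad & \text{solve}(S) \Rightarrow \text{solve}(G)\\
\text{$S$ is a necessary subgoal of $G$:}\quad  & \text{solve}(G) \Rightarrow \text{solve}(S)
\end{align*}
The test in the text checks the second implication: can a solution to the
global problem $G$ be used directly as a solution to the subgoal $S$?
Artificial hardness is the case where the first implication holds, the
second does not, and $S$ is harder to solve than $G$ itself.
\end{document}
```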

My personal greater goal is creating a master machine learning algorithm that can solve any reasonable learning problem where “reasonable” includes at least the set that humans can solve. Relative to this greater goal, many existing research programs do not appear necessary.

  1. The short form of my second comment on Marcus’s post is that I see the sufficiency but not the necessity of competing with all Turing machines.
  2. The necessity of several statistical modeling approaches appears unclear to me, because they often encounter severe computational problems. Perhaps this is an example of creating an artificially hard problem, as empirical experiences with Searn suggest.

What is necessary?

  1. Large data. It is clear that humans solving learning problems have access to large amounts of information which they employ to solve these problems. While we don’t stick children into a sensory deprivation tank to see how much it retards their ability to solve problems when grown, some experiments along these lines have been done with animals, yielding obvious ability deficits.
  2. Online learning. The ability to learn in an online environment with relatively little processing per bit of input is clearly a sufficient approach to solving many problems (a minimal sketch appears after this list). We can also argue that it is a computational necessity, as retraining on all past information appears computationally infeasible, or at least deeply wasteful.
  3. Interactive learning. It’s clear that many animals use an interactive process to learn about the world. This form of learning is also necessary, because it provides the ability to answer unanticipated questions. We can further argue the necessity by pointing out that interactive proofs appear much more powerful in computational complexity theory than noninteractive proofs. For example, viewed from a learning perspective, much of science is about interactive learning.
  4. Compositional Design of a learning system. The necessity of compositional design in machine learning is not entirely clear. For example, we could imagine that it’s possible to design good learning systems using an evolutionary approach. Nevertheless, since our basic goal in research is a design process far more efficient and faster than evolution, it seems that the decision to take a research-based approach implies that compositional design is necessary. Restated: we should be able to design the system to learn in components which compose to form an overall solution.
  5. Large contexts. It’s necessary that a learning algorithm be able to use a relatively large number of bits when making a decision. For example, people working on vision have cool examples where humans manage to use many different cues to predict what an object is.
  6. Nonlinearity. People can clearly solve learning problems for which no linear representation (of input information) is capable of achieving good performance.
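As a concrete toy illustration of the online learning point above, here is a minimal sketch of a constant-work-per-example update. The sparse linear learner with squared loss, the function name, and the learning rate are illustrative choices of mine, not a description of any particular system or of the master algorithm discussed here.

```python
# Minimal sketch of the "online learning" desideratum: constant work per
# example and no retraining on all past data.

def online_update(weights, features, label, learning_rate=0.1):
    """Update `weights` in place using a single (features, label) example.

    `features` maps feature index -> value (sparse), so one update costs time
    proportional to the number of nonzero features: relatively little
    processing per bit of input.
    """
    prediction = sum(weights.get(i, 0.0) * v for i, v in features.items())
    error = prediction - label  # gradient of squared loss w.r.t. prediction
    for i, v in features.items():
        weights[i] = weights.get(i, 0.0) - learning_rate * error * v
    return prediction

# Stream over examples once, never storing or revisiting old data.
weights = {}
stream = [({0: 1.0, 3: 0.5}, 1.0), ({1: 1.0, 3: -0.5}, -1.0)]
for x, y in stream:
    online_update(weights, x, y)
```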

Some of these are criticizable as perhaps unnecessary, and I can easily imagine missing others. If you have other arguments for what is or is not necessary for this greater goal, please speak up.

There are two other categories of subgoal research we could consider. There are subgoals which are necessary and sufficient (in combination) to solve the greater goal. I don’t know of any such arguments for my greater goal.

The fourth category is subgoals which are neither necessary nor sufficient for a greater goal. In my experience, such papers are quite common at conferences; types include:

  1. Work on speeding up a slow algorithm, leaving it slower than the state of the art,
  2. Otherwise improving an algorithm which is suboptimal, while leaving it suboptimal.

The nitty-gritty of these questions comes at review time: which papers should be accepted? In general, decision making is pretty terrible because greater goals are rarely stated, perhaps as a form of strategic ambiguity. After all, the people attending a conference have multiple greater goals. Nevertheless, my personal ordering is:

  1. Necessary and sufficient research directions. An empty set in my experience.
  2. Necessary research directions. This is typically a small fraction.
  3. Sufficient research directions. This is a greater fraction.
  4. Neither. This is often the majority.
  5. Wrong. Must be rejected.

So the question to periodically ask yourself as a researcher is: What is the greater goal? Is this subgoal necessary? Sufficient? Something else?

Carbon in Computer Science Research

Al Gore’s film and gradually more assertive and thorough science have managed to mostly shift the debate on climate change from “Is it happening?” to “What should be done?” In that context, it’s worthwhile to think a bit about what can be done within computer science research.

There are two things we can think about:

  1. Doing Research. At a cartoon level, computer science research consists of some combination of commuting to and from work, writing programs, running them on computers, writing papers, and presenting them at conferences. A typical computer has a power usage on the order of 100 Watts, which works out to 2.4 kilowatt-hours/day. Looking up David MacKay’s reference on power usage per person, it becomes clear that this is a relatively minor part of the lifestyle, although it could become substantial if many more computers are required. Much larger costs are associated with commuting (which researchers share with many other people) and attending conferences. Since local commuting is common across many people, and there are known approaches (typically public transportation) for more efficient commuting, I expect researchers can piggyback on improvements in public transportation to reduce commuting costs. In fact, the situation for researchers may be better than for most, as the nature of the job may make commuting avoidable, at least on some days.

    Presenting at conferences is the remaining problem area, essentially due to travel by airplane to and from a conference. Travel by airplane has an energy cost similar to travel by car over the same distance, but we typically take airplanes for very long distances. Unlike cars, typical airplane usage requires stored energy in a dense form. For example, there are no serious proposals I’m aware of for battery-powered airplanes, because all existing rechargeable batteries have an energy density around 1/10th that of hydrocarbon fuel (which makes sense given that about 3/4 of the mass for a hydrocarbon fire is oxygen in the air). This suggests airplane transport may be particularly difficult to adapt towards low or zero carbon usage. The plausible approaches I know involve either using electricity (from where?) to inefficiently crack water for hydrogen, or the biofuel approach where hydrocarbons are made by plants, with neither of these approaches particularly far along in development. If these aren’t developed, it seems we should expect fewer conferences, more regional conferences, Europe, with its extensive fast train network, to be less impacted, and more serious effort towards distributed conferences. For the last, it’s easy to imagine, with existing technology, simultaneous regional conferences which are mutually videoconferenced, and we aren’t far from being able to handle a fully interactive videobroadcast amongst an indefinitely large number of participants. As a corollary of fewer conferences, other interactive mechanisms (for example research blogs) seem likely to grow.

  2. Research Topics. The keyword for research topics is efficiency, and it is not a trivial concern on a global scale. In computer science, a few algorithms (such as quicksort and hashing) have been developed which substantially and broadly improved real-world efficiency, but the real driver of efficiency so far has been hardware development, which has phenomenally improved efficiency for several decades.

    Many of the efficiency improvements are sure to remain hardware based, but software is becoming an essential component. One basic observation about efficient algorithms is that for problems admitting an efficient parallel solution (counting is a great example), the parallel algorithm is generally more energy efficient, because energy use is typically superlinear in clock speed (a rough scaling sketch appears after this list). As an extreme example, the human brain, which is deeply optimized by evolution for energy efficiency, typically runs at 100Hz or 100KHz.

    Although efficiency suggests parallel algorithms, this should not be done blindly. For example, in machine learning the evidence I’ve seen so far suggests that online learning (which is admittedly harder to parallelize) is substantially more efficient than batch style learning, implying that I expect online approaches to be more efficient than map-reduce based machine learning as is typically seen in the Mahout project.

    A substantial difficulty with parallel algorithms is the programming itself. In this regard, there is plenty of room for programming language work as well.
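As referenced above, here is a rough back-of-the-envelope sketch of why parallelism can save energy. It assumes (my assumption, not a claim from the post) the common CMOS rule of thumb that dynamic power grows roughly with the cube of clock frequency; all constants are illustrative.

```python
# Back-of-the-envelope sketch of "energy use is superlinear in clock speed".
# Assumes CMOS dynamic power P ~ C * V^2 * f with voltage scaling roughly
# with frequency, giving P proportional to f^3; the cubic exponent and the
# 100 W reference figure are illustrative assumptions, not measurements.

def energy_joules(ops, frequency_hz, power_at_1ghz_watts=100.0, exponent=3.0):
    """Energy to execute `ops` operations at a given clock frequency."""
    power_watts = power_at_1ghz_watts * (frequency_hz / 1e9) ** exponent
    seconds = ops / frequency_hz  # assume one operation per cycle
    return power_watts * seconds

ops = 1e12
serial = energy_joules(ops, frequency_hz=2e9)                # one fast core
k = 4
parallel = k * energy_joules(ops / k, frequency_hz=2e9 / k)  # k slower cores

print(f"serial:   {serial:,.0f} J")
print(f"parallel: {parallel:,.0f} J ({serial / parallel:.0f}x less energy)")

# For reference, the desktop figure above: 100 W * 24 h = 2.4 kWh per day.
```

Under these assumptions the four slower cores finish in the same wall-clock time while using 16x less energy, which is the sense in which parallel solutions win on efficiency.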

Multitask Poisoning

There are many ways that interesting research gets done. For example it’s common at a conference for someone to discuss a problem with a partial solution, and for someone else to know how to solve a piece of it, resulting in a paper. In some sense, these are the easiest results we can achieve, so we should ask: Can all research be this easy?

The answer is certainly no for fields where research inherently requires experimentation to discover how the real world works. However, mathematics, along with the parts of physics, computer science, statistics, etc… which are effectively mathematics, doesn’t require experimentation. In effect, a paper can simply be a pure expression of thinking. Can all mathematical-style research be this easy?

What’s going on here is research-by-communication. Someone knows something, someone knows something else, and as soon as someone knows both things, a problem is solved. The interesting thing about research-by-communication is that it is becoming radically easier with improved technology. It’s already possible to communicate with nearly anyone anywhere in the world via {voice, video, text, shared program/website/blog}. As these mechanisms become cheaper in cost and ease of use, we’ll surely see their adoption into many activities. If all mathematical-style research can be sped up by communication, that’s great news for research.

However, I don’t believe it. The essential difficulty is that doing good research often requires the simultaneous understanding of several different things: the problem, all the broken approaches to solving it, why they break, and some hint about where to look for a solution. Absorbing and simultaneously keeping in mind this information requires a substantial amount of effort.

The extent of effort required is perhaps most easily understood in collaboration with yourself. Often, a problem is not immediately solved the first time it is thought of; instead, a researcher must attack it again and again, until either giving up or finding a solution. A basic parameter in attacking a problem is: how hard is it to resume where you left off?

For many of the problems I’ve worked on, the answer is pretty hard. After weeks of not working on a problem, it might take several hours to refresh my memory with all the simultaneous concerns that go into a solution. Even if I worked on such a problem yesterday, it might take a half hour or an hour to reach a state where I’m prepared to make progress.

Given the difficulty of research, I (and many other people) often struggle in dealing with interruptions. Modern technology has made communication very easy, implying a stream of potential interruptions throughout the day, some of which are undeniably fruitful. And yet, an interruption means the overhead of getting back to thinking must be paid yet again.

Trading off properly between the value of avoiding the overhead and the value of dealing with interruptions is a constant struggle which typically did not exist before modern communication technology made interruptions prevalent. I think it is common to give in to the interrupts, and effectively cease to be able to do research other than research-by-communication. I’m not ready to do that, so I’ve found it essential to arrange large unused and uninterrupted periods of time for the purpose of doing research.

Although the solution is simple, I’ve often found implementing it challenging in several ways. There is the standard problem that people you deal with don’t understand the overhead of switching tasks in research. However, there is also a more insidious problem. Coping with the modern world requires that at least some portion of our time be devoted to interrupts, which are almost always easier to deal with than research. Dealing with these interrupts therefore can create a bad habit where you seek interrupts to achieve the (short term, easy) gratification of dealing with them. Thus, multitasking creates an internal expectation of multitasking, which makes multitasking preferred, eliminating the ability to do careful research. This is what I think of as multitask poisoning—it’s not just the time lost to swapping some context back in, but also the bad habits of self-interruption that it creates.

I don’t have a good answer to this problem, other than continuing the struggle to preserve substantial contiguous chunks of time for thinking. An alternative, sometimes-applicable solution is to reduce the overhead of starting to solve a problem by decomposing the problem into subproblems. Where possible, this is of course valuable, but mathematical research is often almost uniquely undecomposable, because the nature of the problem is that its solution is unknown and hence cannot be decomposed in advance. Restated, the real problem is finding a valid decomposition.

I self-interrupted at least 3 times in making this post.

Effective Research Funding

With a worldwide recession on, my impression is that the carnage in research has not been as severe as might be feared, at least in the United States. I know of two notable negative impacts:

  1. It’s quite difficult to get a job this year, as many companies and universities simply aren’t hiring. This is particularly tough on graduating students.
  2. Perhaps 10% of IBM research was fired.

In contrast, around the time of the dot com bust, AT&T Research and Lucent had one or several 50%-size rounds of firings, wiping out much of the remainder of Bell Labs and triggering a notable diaspora for the respected machine learning group there. As the recession progresses, we may easily see more firings as companies in particular reach a point where they can no longer support research.

There are a couple positives to the recession as well.

  1. Both the implosion of Wall Street (which had siphoned off smart people) and the general difficulty of getting a job straight out of an undergraduate education suggest that the quality of admitted PhD students may increase. In half a decade, when they start graduating, we might see some new and very creative ideas.
  2. The latest stimulus bill includes substantial additional research funding. This is particularly welcome news for those at universities, because it will compensate for other cutbacks which may be necessary there as endowments or state funding fall. It’s also particularly good for young researchers at universities who just got a position or will get one this year, as the derivative of research funding particularly impacts them.

There are two effects going on: Does a recession cause us to refocus on other possibly better ideas? Or does it cause us to focus on short term survival? The first effect helps research while the second effect does not. By far, most of the money invested by governments to fight the recession has gone towards survival, but a small fraction in the US is going towards other possibly better ideas, with a portion of that going towards research.

We could hope for a larger fraction of the money heading towards new ideas rather than rescuing old ones, but there is a basic issue: the apparatus for the creation and use of new ideas in the US is simply too small, and it may not be able to effectively use more funding. In order to justify further funding for research, we may need to be more creative than simply saying “give us more”.

However, this is easy. Throughout much of the 1900s, Bell Labs created many inventions which are fundamental to modern society, including the transistor, C(++), Unix, the laser, information theory, etc… In my view the vital ingredients for success are:

  1. Access to cutting edge problems. Even extraordinarily intelligent researchers can simply end up working on the wrong problem. Without direct access to and knowledge of such problems, researchers can end up inventing their own, which occasionally works out well but more often does not.
  2. Free time. This is both obvious and yet a common failure mode. Researchers at universities have many more demands on their time, including teaching, fundraising, mentoring, and running a university. Similarly, researchers at corporations can be sucked into patching an existing system rather than thinking about the best way to really solve a problem.
  3. Concentration. Two researchers working together can often manage much more than one apart, as each can bring relevant expertise and viewpoints necessary to solve a problem. This remains true up to the point where communication becomes a substantial overhead, which in my experience happens at around 5 people, though we might imagine technology helping to push that number up.

Bell Labs managed to satisfy all three of these desiderata. Some research universities manage to achieve at least access and concentration to some extent, but hidden difficulties exist. For example, professors often don’t work with other professors, because they are both too busy with students and they must make a case for tenure based on work which is unambiguously their own. I’m not extremely familiar with existing national labs, but I believe they often fail at the first desideratum, access to cutting edge problems; at least, research at national labs has had relatively little impact on newer fields such as computer science.

So, my suggestion would be funding research in modes which satisfy all three desiderata. The natural and easy way to do this is by the government partially subsidizing basic research at those corporations which have decided to fund basic research. In computer science at least, this includes Microsoft, IBM, Yahoo, Google, and what’s left of Bell Labs at AT&T and Alcatel. While this is precisely the conclusion you might expect from someone doing research at one of these places, it’s also what you would expect of someone intensely interested in research who sought out the best environment for research. In economic terms, these companies have, for reasons of their own, decided to provide a public good. As long as we are interested, as a nation or as a civilization, in subsidizing this public good, it is desirable to do this as efficiently as possible.

Some people might think that basic research done at a university is inherently more desirable than the same in industry. I don’t see any reason for this. For example, it seems that patentable research is about as likely to be patented at a university as elsewhere, and hence equally restricted for public use over the duration of a patent. Other people might think that basic research only really happens at universities or national labs, but that simply doesn’t agree with history.

Given this, it’s odd that the rules for NSF funding, which is the premier source of funding for basic science in the US, generally require university participation on proposals. This restriction naturally makes it easier for researchers at universities to acquire grant money than researchers not at universities. I don’t understand why this restriction is desirable from the viewpoint of a government wanting to effectively subsidize research.