(Dis)similarities between academia and open source programmers

Martin Pool and I recently discussed the similarities and differences between academia and open source programming.

Similarities:

  1. Cost profile Research and programming share approximately the same cost profile: a large upfront effort is required to produce something useful, and then “anyone” can use it. (The “anyone” is not quite right for either group, because only sufficiently technical people can actually use the result.)
  2. Wealth profile A “wealthy” academic or open source programmer is someone who has contributed a lot to other people in research or programs. Much of academia is a “gift culture”: whoever gives the most is most respected.
  3. Problems Both academia and open source programming suffer from similar problems.
    1. Whether (and which) an open source program is used is perhaps too often personality-driven rather than driven by capability or usefulness. Similar phenomena can happen in academia with respect to directions of research.
    2. Funding is often a problem for both groups. Academics often invest many hours in writing grants, while open source programmers often are simply not paid.
  4. Both groups of people work in a mixed competitive/collaborative environment.
  5. Both groups use conferences as a significant mechanism of communication.

Given the similarities, it is not too surprising that there is significant cooperation between academia and open source programming, and it is relatively common to cross over from one to the other.

The differences are perhaps more interesting to examine because they may point out where one group can learn from the other.

  1. A few open source projects have achieved significantly larger scales than academia in terms of coordinating many people over a long time. Big project examples include Linux, Apache, and Mozilla. Groups of this scale in academia are typically things like “the ICML community” or “people working on Bayesian learning”, which are significantly less tightly coupled than any of the above projects. This suggests it may be possible to achieve significantly larger close collaborations in academia.
  2. Academia has managed to secure significantly more funding than open source programmers. Funding typically comes from a mixture of student tuition and government grants. Part of the reason for better funding in academia is that it has been around longer and so has been able to accomplish more. Perhaps governments will start funding open source programming more seriously if it produces an equivalent (with respect to societal impact) of the atom bomb.
  3. Academia has a relatively standard career path: grade school education, undergraduate education, graduate education, then apply for a job as a professor at a university. In contrast, the closest thing to a career path for open source programmers is something like “do a bunch of open source projects and become so wildly successful that some company hires you to do the same thing”. This is a difficult path, but perhaps it is slowly becoming easier, and there is still much room for improvement.
  4. Open source programmers take significantly more advantage of modern tools for communication. As an example, Martin mentioned that perhaps half the people working on Ubuntu have blogs. In academia, blogs are still a rarity.
  5. Open source programmers have considerably more freedom of location. Academic research is almost always tied to a particular university or lab, while many people who work on open source projects can choose to live essentially anywhere with reasonable internet access.

Do you believe in induction?

Foster Provost gave a talk at the ICML metalearning workshop on “metalearning” and the “no free lunch theorem” which seems worth summarizing.

As a review: the no free lunch theorem is the most complicated way we know of to say that a bias is required in order to learn. The simplest way to see this is in a nonprobabilistic setting. If you are given examples of the form (x,y) and you wish to predict y from x, then any prediction mechanism errs half the time in expectation over all sequences of examples. The proof is very simple: on every example a predictor must make some prediction, and by symmetry over the set of sequences it will be wrong half the time and right half the time. The basic idea of this proof has been applied to many other settings.
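For concreteness, here is a tiny brute-force check of the symmetry argument (my own illustrative sketch, not part of the talk): enumerate every binary label sequence for a handful of inputs and measure the average per-example error of a few predictors, including one that conditions on all previously seen examples. Every predictor comes out at exactly 1/2.

```python
import itertools

def online_error(predict, xs):
    """Average per-example error of a sequential predictor over ALL binary
    label sequences for the inputs xs. The predictor sees every earlier
    (x, y) pair before predicting the next label; by symmetry it is still
    wrong exactly half the time in expectation."""
    n = len(xs)
    total = 0.0
    for labels in itertools.product([0, 1], repeat=n):  # every possible "world"
        history = []
        errors = 0
        for x, y in zip(xs, labels):
            errors += predict(history, x) != y
            history.append((x, y))
        total += errors / n
    return total / 2 ** n

xs = list(range(6))
print(online_error(lambda hist, x: 0, xs))                           # 0.5 (always predict 0)
print(online_error(lambda hist, x: hist[-1][1] if hist else 0, xs))  # 0.5 (repeat last label)
print(online_error(lambda hist, x: x % 2, xs))                       # 0.5 (predict parity of x)
```

No matter how clever the rule, for any history prefix the next label is 0 in half the worlds and 1 in the other half, so the average error cannot move off 1/2.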

The simplistic interpretation of this theorem which many people jump to is “machine learning is dead”, since there can be no single learning algorithm which solves all learning problems. This is the wrong way to think about it. In the real world, we do not care about the expectation over all possible sequences, but perhaps instead about some (weighted) expectation over the set of problems we actually encounter. It is entirely possible that we can form a prediction algorithm with good performance over this set of problems.

This is one of the fundamental reasons why experiments are done in machine learning. If we want to characterize the set of problems we actually encounter, we must do so empirically. Although we must work with the world to understand what a good general-purpose learning algorithm is, quantifying how good the algorithm is may be difficult. In particular, performing well on the last 100 encountered learning problems may say nothing about performing well on the next encountered learning problem.

This is where induction comes in. It has been noted by Hume that there is no mathematical proof that the sun will rise tomorrow which does not rely on unverifiable assumptions about the world. Nevertheless, the belief in sunrise tomorrow is essentially universal. A good general purpose learning algorithm is similar to ‘sunrise’: we can’t prove that we will succeed on the next learning problem encountered, but nevertheless we might believe it for inductive reasons. And we might be right.

Apprenticeship Reinforcement Learning for Control

Pieter Abbeel presented a paper with Andrew Ng at ICML on Exploration and Apprenticeship Learning in Reinforcement Learning. The basic idea of this algorithm is:

  1. Collect data from a human controlling a machine.
  2. Build a transition model based upon the experience.
  3. Build a policy which optimizes the transition model.
  4. Evaluate the policy. If it works well, halt; otherwise add the experience to the pool and go to (2).

The paper proves that this technique converges to a policy with expected performance near human expected performance, assuming the world satisfies certain assumptions (MDP or linear dynamics).
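As a heavily simplified sketch of the loop above: the function below is not the authors' implementation, and fit_transition_model, plan_policy, and run_policy are hypothetical placeholders for whatever model fitting, planner, and rollout mechanism are actually available.

```python
def apprenticeship_learning(human_trajectories, performance_threshold,
                            fit_transition_model, plan_policy, run_policy,
                            max_iterations=50):
    # Step 1: start the experience pool with the human demonstrations.
    experience = list(human_trajectories)

    policy = None
    for _ in range(max_iterations):
        # Step 2: fit an approximate transition model to all experience so far.
        model = fit_transition_model(experience)

        # Step 3: compute a policy that optimizes reward under that model.
        policy = plan_policy(model)

        # Step 4: evaluate the policy on the real system.
        trajectory, performance = run_policy(policy)
        if performance >= performance_threshold:
            return policy  # good enough: near expert-level performance

        # Otherwise, add the new real-world experience and repeat.
        experience.append(trajectory)

    return policy
```

The structural point the sketch tries to capture is step (4): the experience pool only grows when the planned policy underperforms on the real system, which is what keeps the learned model tied to reality.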

This general idea of apprenticeship learning (i.e. incorporating data from an expert) seems very compelling because (a) humans often learn this way and (b) much harder problems can be solved. For (a), the notion of teaching is about transferring knowledge from an expert to novices, often via demonstration. To see (b), note that we can create intricate reinforcement learning problems where a particular sequence of actions must be taken to achieve a goal. A novice might be able to memorize this sequence given just one demonstration even though it would require experience exponential in the length of the sequence to discover the key sequence accidentally.

Andrew Ng’s group has exploited this to make this very fun picture.
(Yeah, that’s a helicopter flying upside down, under computer control.)

As far as this particular paper, one question occurs to me. There is a general principle of learning which says we should avoid “double approximation”, such as occurs in step (3) where we build an approximate policy on an approximate model. Is there a way to fuse steps (2) and (3) to achieve faster or better learning?

Not goal metrics

One of the confusing things about research is that progress is very hard to measure. One of the consequences of being in a hard-to-measure environment is that the wrong things are often measured.

  1. Lines of Code The classical example of this phenomenon is the old lines-of-code-produced metric for programming. It is easy to imagine systems for producing many lines of code with very little work that accomplish very little.
  2. Paper count In academia, a “paper count” is an analog of “lines of code”, and it suffers from the same failure modes. The obvious failure mode here is that we end up with a large number of uninteresting papers since people end up spending a lot of time optimizing this metric.
  3. Complexity Another metric is “complexity” (in the eye of a reviewer) of a paper. There is a common temptation to make a method appear more complex than it is in order for reviewers to judge it worthy of publication. The failure mode here is unclean thinking. Simple effective methods are often overlooked in favor of complex, relatively ineffective methods. This is simply wrong for any field. (Discussion at Lance‘s blog.)
  4. Acceptance Rate “Acceptance rate” is the number of papers accepted/number of papers submitted. A low acceptance rate is often considered desirable for a conference. But:
    1. It’s easy to skew an acceptance rate by adding (or inviting) many weak or bogus papers.
    2. It’s very difficult to judge what, exactly, is good work in the long term. Consequently, a low acceptance rate can retard progress by simply raising the bar too high for what turns out to be a good idea when it is more fully developed. (Consider the limit where only one paper is accepted per year…)
    3. Accept/reject decisions can become more “political” and less about judging the merits of a paper/idea. With a low acceptance ratio, a strong objection by any one of several reviewers might torpedo a paper. The consequence of this is that papers become noncontroversial with a tendency towards incremental improvements.
    4. A low acceptance rate tends to spawn a multiplicity of conferences in one area. There is a strong multiplicity of learning-related conferences.

    (see also How to increase the acceptance ratios at top conferences?)

  5. Citation count Counting citations is somewhat better than counting papers because it is some evidence that an idea is actually useful. This has been particularly aided by automated citation counting systems like scholar.google.com and http://citeseer.ist.psu.edu/. However, there are difficulties: citation counts can be optimized using self-citation and “societies of mutual admiration” (groups of people who agree, implicitly or explicitly, to cite each other). Citations are also sometimes negative, of the form “here we fix bad idea X”.
  6. See also the Academic Mechanism Design post for other ideas.

These metrics do have some meaning. A programmer who writes no lines of code isn’t very good. An academic who produces no papers isn’t very good. A conference that doesn’t aid information filtration isn’t helpful. Hard problems often require complex solutions. Important papers are often cited.

Nevertheless, optimizing these metrics is not beneficial for a field of research. In thinking about this, we must clearly differentiate 1) what is good for a field of research (solving important problems) and 2) what is good for individual researchers (getting jobs). The essential point here is that there is a disparity.

Any individual in academia cannot avoid being judged by these metrics. Attempts by an individual or a small group of individuals to ignore these metrics are unlikely to change the system (and likely to result in the individual or small group being judged badly).

I don’t believe there is an easy fix to this problem. The best we can hope for is incremental progress which takes the form of the leadership in the academic community introducing new, saner metrics. This is a difficult thing, particularly because any academic leader must have succeeded in the old system. Nevertheless, it must happen if academic-style research is to flourish.

In the spirit of being constructive, I’ll make one proposal which may address the “complexity” problem: judge the importance of a piece of work independent of the method. For a conference paper this might be done by changing the review process to have one “technical reviewer” and several “importance reviewers” rather than 3 or 4 reviewers. The importance reviewer’s job is easier than under the current standard: they must simply understand the problem being solved and rate how important it is. The technical reviewer’s job is harder than under the current standard: they must verify that all claims of a solution to the problem are met. Overall, the amount of work by reviewers would stay constant, and perhaps we would avoid the preference for complex solutions.

Six Months

This is the 6 month point in the “run a research blog” experiment, so it seems like a good point to take stock and assess.

One fundamental question is: “Is it worth it?” The idea of running a research blog will never become widely popular and useful unless it actually aids research. On the negative side, composing ideas for a post and maintaining a blog takes a significant amount of time. On the positive side, the process might yield better research because there is an opportunity for better, faster feedback implying better, faster thinking.

My answer at the moment is a provisional “yes”. Running the blog has been incidentally helpful in several ways:

  1. It is sometimes educational. example
  2. More often, the process of composing thoughts well enough to post simply aids thinking. This has resulted in a couple of solutions to problems of interest (and perhaps more over time). If you really want to solve a problem, letting the world know is helpful. This isn’t necessarily because the world will help you solve it, but it’s helpful nevertheless.
  3. In addition, posts by others have helped frame thinking about “What are important problems people care about?”, and why. In the end, working on the right problem is invaluable.