The Coming Patent Apocalypse

Many people in computer science believe that patents are problematic. The truth is even worse—the patent system in the US is fundamentally broken in ways that will require much more significant reform than is being considered now.

The myth of the patent is the following: Patents are a mechanism for inventors to be compensated according to the value of their inventions while making the invention available to all. This myth sounds pretty desirable, but the reality is a strange distortion slowly leading towards collapse.

There are many problems associated with patents, but I would like to focus on just two of them:

  1. Patent Trolls The way that patents have generally worked over the last several decades is that they were a tool of large companies. Large companies would amass a large number of patents and then cross-license each other’s patents—in effect saying “we agree to owe each other nothing”. Smaller companies would sometimes lose in this game, essentially because they didn’t have enough patents to convince the larger companies that cross-licensing was a good idea. However, they didn’t necessarily lose, because small companies are also doing fewer things which makes their patent violation profile smaller.

    The patent trolls arrived recently. They are a new breed of company which does nothing but produce patents and collect money from them. The thing which distinguishes patent troll companies is that they have no patent violation profile. A company with a large number of patents can not credibly threaten to sue them unless they cross-license their patents, because they don’t do anything which violates a patent.

    The best example (and proof that this method works) is NTP, which extracted $612.5M from RIM. Although this is the big case with lots of publicity, the process of extracting money goes on constantly all around your in backroom negotiations with the companies that actually do things. In effect, patent trolls impose an invisible tax on companies that do things by companies that don’t. Restated in another way, patent trolls are akin to exploiting tax loopholes—except they exploit the law to make money rather than simply to avoid losing it. Smaller companies are particularly prone to lose, because they simply can not afford the extreme legal fees associated with fighting even a winning battle, but even large companies are also vulnerable to a patent troll.

    The other side of this argument is that patent trolls are simply performing a useful business function: employing researchers to come up with ideas or (at least) putting a floor on the value of ideas which they buy up through patents. Unfortunately, this is simply not true in my experience, due to the next problem.

  2. Combinatorial Ideas Patents are too easy. In fact, the process of coming up with patentable ideas is simply a matter of combinatorial application of existing ideas. This is a simple game that any reasonably intelligent person can play: you take idea 1 and idea 2, glue them together in any reasonable way, and get a patent.

    There are several reasons why the combinatorial application of existing ideas has become standard for patents.

    1. One of these is regulatory capture. It should surprise no one that the patent office, which gets paid for every patent application, has found a way to increase the number of patent applications.
    2. Another reason has to do with the way that patent law developed. Initially, patents were for processes doing things, rather than ideas. The scope of patents has steadily extended over time but the basic idea of patenting a process has been preserved. The fundamental problem is that processes can be created by the combinatorial application of ideas.

    The ease of patents is fundamentally valuable to patent troll type companies because they can acquire a large number of patents on processes which other companies accidentally violate.

The patent apocalypse happens when we project forward these two trends. Patents become ever easier to acquire and patent troll companies become ever more prevalent. In the end, every company which does something uses some obvious process that violates someone’s patent, and they have to pay at rates the patent owner chooses. There is no inherent bound on the number of patent troll type companies which can exist—they can multiply unchecked and drain money from every other company which does things until the system collapses.

I would like to make some positive suggestions here about how to reform the patent system, but it’s a hard mechanism design problem. Some obvious points are:

  1. Patents should have higher standards than research papers rather than substantially lower standards.
  2. The patent office should not make money from patents (this is not equivalent to saying that the patent applications should not be charged).
  3. The decision in whether or not to grant a patent should weigh the cost of granting. Many private property afficionados think “patents are great, because they compensate inventors”, but there is a real cost to society in granting obvious patents. You block people from doing things in the straightforward way or create hidden liabilities.
  4. Patents should be far more readable if the “available for all” part of the myth is to be upheld. Published academic papers are substantially more readable (which isn’t all that high of a bar).

Patent troll companies have found a clever way to exhibit the flaws in the current patent system. Substantial patent reform to eliminate this style of company would benefit just about everyone, except for these companies.

Videolectures.net

Davor has been working to setup videolectures.net which is the new site for the many lectures mentioned here. (Tragically, they seem to only be available in windows media format.) I went through my own projects and added a few links to the videos. The day when every result is a set of {paper, slides, video} isn’t quite here yet, but it’s within sight. (For many papers, of course, code is a 4th component.)

What to do with an unreasonable conditional accept

Last year about this time, we received a conditional accept for the searn paper, which asked us to reference a paper that was not reasonable to cite because there was strictly more relevant work by the same authors that we already cited. We wrote a response explaining this, and didn’t cite it in the final draft, giving the SPC an excuse to reject the paper, leading to unhappiness for all.

Later, Sanjoy Dasgupta suggested that an alternative was to talk to the PC chair instead, as soon as you see that a conditional accept is unreasonable. William Cohen and I spoke about this by email, the relevant bit of which is:

If an SPC asks for a revision that is inappropriate, the correct
action is to contact the chairs as soon as the decision is made,
clearly explaining what the problem is, so we can decide whether or
not to over-rule the SPC. As you say, this is extra work for us
chairs, but that’s part of the job, and we’re willing to do that sort
of work to improve the overall quality of the reviewing process and
the conference. In short, Sanjoy was right.

At the time, I operated under the belief that the PC chair’s job was simply too heavy to bother with something like this, but that was wrong. William invited me to post this, and I hope we all learn a little bit from it. Obviously, this should only be used if there is a real flaw in the conditions for a conditional accept paper.

Contextual Scaling

Machine learning has a new kind of “scaling to larger problems” to worry about: scaling with the amount of contextual information. The standard development path for a machine learning application in practice seems to be the following:

  1. Marginal. In the beginning, there was “majority vote”. At this stage, it isn’t necessary to understand that you have a prediction problem. People just realize that one answer is right sometimes and another answer other times. In machine learning terms, this corresponds to making a prediction without side information.
  2. First context. A clever person realizes that some bit of information x1 could be helpful. If x1 is discrete, they condition on it and make a predictor h(x1), typically by counting. If they are clever, then they also do some smoothing. If x1 is some real valued parameter, it’s very common to make a threshold cutoff. Often, these tasks are simply done by hand.
  3. Second. Another clever person (or perhaps the same one) realizes that some other bit of information x2 could be helpful. As long as the space of (x1, x2) remains small and discrete, they continue to form a predictor by counting. When (x1, x2) are real valued, the space remains visualizable, and so a hand crafted decision boundary works fine.
  4. The previous step repeats for information x3,…,x100. It’s no longer possible to visualize the data but a human can still function as a learning algorithm by carefully tweaking parameters and testing with the right software support to learn h(x1,…,x100). Graphical models can sometimes help scale up counting based approaches. Overfitting becomes a very serious issue. The “human learning algorithm” approach starts breaking down, because it becomes hard to integrate new information sources in the context of all others.
  5. Automation. People realize “we must automate this process of including new information to keep up”, and a learning algorithm is adopted. The precise choice is dependent on the characteristics of the learning problem (How many examples are there at training time? Is this online or batch? How fast must it be at test time?) and the familiarity of the people involved. This can be a real breakthrough—automation can greatly ease the inclusion of new information, and sometimes it can even improve results given the limited original information.

Understanding the process of contextual scaling seems particularly helpful for teaching about machine learning. It’s often the case that the switch to the last step could and should have happened before the the 100th bit of information was integrated.

We can also judge learning algorithms according to their ease of contextual scaling. In order from “least” to “most”, we might have:

  1. Counting based approaches. Number of examples required is generally exponential in the number of features.
  2. Counting based approaches with smoothing. Still exponential, but with saner defaults.
  3. Counting based approaches with smoothing and some prior language (graphical models, bayes nets, etc…). Number of examples required is no longer exponential, but can still be intractably large. Prior specification from a human is required.
  4. Prior based systems (Many Bayesian Learning algorithms). No particular number of examples required, but sane prior specification from a human may be required.
  5. Similarity based systems (nearest neighbor, kernel based algorithms). A similarity measure is a weaker form of prior information which can be substantially easier to specify.
  6. Highly automated approaches. “Just throw the new information as a feature and let the learning algorithms sort it out”.

At each step in this order, less effort is required to integrate new information.

Designing a learning algorithm which can be useful in many different contextual scales is fundamentally challenging. Obviously, when specific prior information is available, we want to incorporate it. Equally obviously, when specific prior information is not available, we want to be able to take advantage of new information which happens to be easily useful. When we have so much information that counting could work, a learning algorithm should behave similar to counting (with smoothing).