Machine Learning (Theory)


Yes , I am applying

Tags: General jl@ 6:43 pm

Every year about now hundreds of applicants apply for a research/teaching job with the timing governed by the university recruitment schedule. This time, it’s my turn—the hat’s in the ring, I am a contender, etc… What I have heard is that this year is good in both directions—both an increased supply and an increased demand for machine learning expertise.

I consider this post a bit of an abuse as it is neither about general research nor machine learning. Please forgive me this once.

My hope is that I will learn about new places interested in funding basic research—it’s easy to imagine that I have overlooked possibilities.

I am not dogmatic about where I end up in any particular way. Several earlier posts detail what I think of as a good research environment, so I will avoid a repeat. A few more details seem important:

  1. Application. There is often a tension between basic research and immediate application. This tension is not as strong as might be expected in my case. As evidence, many of my coauthors of the last few years are trying to solve particular learning problems and I strongly care about whether and where a learning theory is useful in practice.
  2. Duration. I would like my next move to be of indefinite duration.

Feel free to email me ( if there is a possibility you think I should consider.


Machine Learning Thoughts

Tags: General jl@ 12:04 am

I added a link to Olivier Bousquet’s machine learning thoughts blog. Several of the posts may be of interest.


Is the Google way the way for machine learning?

Tags: General jl@ 5:36 pm

Urs Hoelzle from Google gave an invited presentation at NIPS. In the presentation, he strongly advocates interacting with data in a particular scalable manner which is something like the following:

  1. Make a cluster of machines.
  2. Build a unified filesystem. (Google uses GFS, but NFS or other approaches work reasonably well for smaller clusters.)
  3. Interact with data via MapReduce.

Creating a cluster of machines is, by this point, relatively straightforward.

Unified filesystems are a little bit tricky—GFS is capable by design of essentially unlimited speed throughput to disk. NFS can bottleneck because all of the data has to move through one machine. Nevertheless, this may not be a limiting factor for smaller clusters.

MapReduce is a programming paradigm. Essentially, it is a combination of a data element transform (map) and an agreggator/selector (reduce). These operations are highly parallelizable and the claim is that they support the forms of data interaction which are necessary.
Apparently, the Nutch project has an open source implementation of mapreduce (but this is clearly the most nonstandard element).

Shifting towards this paradigm has several effects:

  1. It makes “big data” applications more viable.
  2. It makes some learning algorithms more viable than others. One way to think about this is in terms of statistical query learning algorithms. The (generalized) notion of statistical query algorithms is algorithms that rely upon only the results of expections of a (relatively small) number of functions. Any such algorithm can be implemented via mapreduce. The “naive bayes” algorithm and most decision tree algorithms can be easily phrased as statistical query algorithms. Support vector machines can (technically) be phrased as statistical query algorithms, but the number of queries scales with the number of datapoints. Gradient descent algorithms can also be phrased as statistical query algorithms. Learning algorithms which work on one example at a time are not generally statistical query algorithms.

    Another way to think about this is in terms of the complexity of the computation. Roughly speaking, as the amount of data scales, only O(n) or (perhaps) O(n log(n)) algorithms are tractable. This strongly favors online learning algorithms. Decision trees and naive bayes are (again) relatively reasonable. Support vector machines (or gaussian processes) encounter difficulties related to scaling.

There is a reasonable argument that the “low hanging fruit” of machine learning research is in the big data with enabling tools paradigm. This is because (a) the amount of data available has been growing far faster than the amount of computation and (b) we just haven’t had the tools to scale here, until recently.

I expect Urs is right: we should look in this direction.


The Webscience Future

Tags: General jl@ 10:47 am

The internet has significantly effected the way we do research but it’s capabilities have not yet been fully realized.

First, let’s acknowledge some known effects.

  1. Self-publishing By default, all researchers in machine learning (and more generally computer science and physics) place their papers online for anyone to download. The exact mechanism differs—physicists tend to use a central repository (Arxiv) while computer scientists tend to place the papers on their webpage. Arxiv has been slowly growing in subject breadth so it now sometimes used by computer scientists.
  2. Collaboration Email has enabled working remotely with coauthors. This has allowed collaborationis which would not otherwise have been possible and generally speeds research.

Now, let’s look at attempts to go further.

  1. Blogs (like this one) allow public discussion about topics which are not easily categorized as “a new idea in machine learning” (like this topic).
  2. Organization of some subfield of research. This includes Satinder Singh’s Reinforcement Learning pages, and, more generally books that have been placed online such as this one.
  3. Discussion Groups The kernel machines discussions provide a good example of some common format allowing discussion.
  4. Class notes have been placed online such as Avrim’s learning theory lecture notes.
  5. Wikipedia has an article on Machine Learning. The article gives a reasonable quick overview and is surely read by a very large number of people.
  6. Online Proceedings are now being used by several conferences such as NIPS.

Now, let’s consider some futures.

  1. Wikifuture Wikipedia becomes better to the point where it is a completely comprehensive listing of current research in machine learning. At some point, we-the-community realize this and begin to emphasize (and credit) information placed in wikipedia. This process reinforces itself to the point where “if it’s not in wikipedia, it doesn’t exist”.

    This future is significantly more probable than most people understand. As evidence compare the machine learning page three years ago (yep, it didn’t exist), two years ago, one year ago, and today. That progression strongly suggests that wikipedia:machine learning will continue to grow into a significant information resource.

    There are fundamental obstacles to the success of the wikipedia future.

    1. credit Wikipedia has only very weak mechanisms for crediting editors. A list of the changes done by one user account is about as much credit as is available. This is not enough to make career-deciding questions on. We could hope for a stronger link between identity and editor along with tools to track the value of particular edits (Think of counting hyperlinks as an analogue for counting citations).
    2. controversy Wikipedia has grown up in a nonmanipulative environment. When it was little known, the incentive to fabricate entries was not great. Now that it is becoming well known that incentive is growing. Character assasination by false article exists. In science, the thing to worry about is misplaced ideas of the importance of your topic of research since it is very difficult to be sufficiently interested in a research topic and simultaneously view it objectively. Research is about creating new ideas, and the location of these ideas in some general organization is in dispute by default.
  2. Evolutionary Progression Consider the following sequence of steps.
    1. Conference Organization We realize that having a list of online papers isn’t nearly as useful as having an organized list of online papers so the conferences which already have online proceedings create an explorable topic hierarchy.
    2. Time Organization We realize that the organization at one particular year’s conference is sketchy—research is a multiyear endeavor. Consequently, we start adding to last years topic hierarchy rather than creating a new one from scratch each year.
    3. Transformation We realize that it is better if papers are done in the language of the web. For example, it’s very handy to be able to hyperlink inside of a paper. A good solution to the math on the web problem would greatly help here.
    4. Consolidation We realize that there is a lot of redundancy in two papers on the same or a similar topic. They share an introduction, motivation, and (often) definitions. By joining the shared pieces, the contents of both papers can be made clearer.

    Each of these individual steps clearly yields something better. At the end of these steps, creating a paper is simply the process of creating a webpage or altering an existing webpage. We can imagine doing all of this while keeping the peer-review mechanisms of science intact, so the resulting process is simply better in all ways. It’s easier to author because for most papers much of the “filler” introduction/motivation/definition can be reused from previous papers. It’s easier to review, because reviewers can consider the result in context. Much of the difficulty of reviewing is simply due to the author and reviewer not being “on the same page” in how they understand things. An organized topic hierarchy greatly aids this.

  3. The unknown It is difficult to anticipate everything. What other futures might exist?

Which future comes about is dependent on many things—the decisions of community leaders, enabling ‘math-on-the-web’ technologies, etc…, so it is difficult to predict which future and when it will come about. Nevertheless, a potential exists and there are several paths leading towards reaching that potential.


The Design of an Optimal Research Environment

Tags: General jl@ 10:04 am

How do you create an optimal environment for research? Here are some essential ingredients that I see.

  1. Stability. University-based research is relatively good at this. On any particular day, researchers face choices in what they will work on. A very common tradeoff is between:
    1. easy small
    2. difficult big

    For researchers without stability, the ‘easy small’ option wins. This is often “ok”—a series of incremental improvements on the state of the art can add up to something very beneficial. However, it misses one of the big potentials of research: finding entirely new and better ways of doing things.

    Stability comes in many forms. The prototypical example is tenure at a university—a tenured professor is almost imposssible to fire which means that the professor has the freedom to consider far horizon activities. An iron-clad guarantee of a paycheck is not necessary—industrial research labs have succeeded well with research positions of indefinite duration. Atnt research was a great example of this before they lost their monopoly and had to fire the researchers.

    An inadequate amount of stability is probably given by fixed duration appointments. The appropriate timescale to consider here is one year because there is a strong annual cycle for application. A one year appointment is probably too short while a 10 year appointment may be effectively the same as arbitrary duration.

    One significant part of stability is financial stability of the parent organization. The nature of research implies that what makes ‘funding sense’ depends on the size of the set of people who can benefit from it. Since ‘governments’ are the largest organizations around, they are good candidates, with the caveat that political will has not been known as greatly stable.

  2. Free time. University-based research is relatively terrible about this while industrial labs vary widely.

    For professors at a university, teaching a subject well requires very significant time and energy. Running the university requires very significant time and energy. Writing grant proposals for funding requires very significant time and energy (success in grant writing is typically a necessity to achieve tenure at US research universities). Advising students well requires significant time and energy. Some of these activities are partially synergistic with research, but it seems unarguable that the multiplicity of time demands can greatly slow research.

    In industrial labs (think of “IBM research” as an example), there is an tendency to shift research towards short term rewards. This is partly becaue the research labs have not proved very capable of using the big successes and partly because even large businesses have ups and downs. During a ‘down’ time, it is very tempting to use the reserve of very capable people in a research lab for short duration projects.

    One thing to understand about research is that it is not the case that a person with half as much free time can produce half as much work. The problems encountered might have sufficient intricacy that they require a day or more of overhead to begin making progress.

  3. Problem exposure The set of research problems which people could work on is much larger than the set of research problems which are plausibly useful to the rest of the world (*). The way narrow down on the relevant problems is to give researchers a chance to encounter them.

    Univeristy-based research can have problems here, because no really effective mechanism for doing this exists by default. In industrial labs, researchers often act as “superconsultants” helping to solve problems. This method is effective but perhaps relatively crude because as often instantiated, it conflicts with “free time”.

    (*) There is always someone who says “but at the time number theory was researched, nobody had any practical use, and now it is very useful for crypto”. This is true, but there are several important caveats:

    1. The “hit rate” (in terms of big impact on everyones lives) for unmotivated research is much lower. I don’t have a good measurement of how much lower, but “significantly” seems both plausibly and anecdotally correct.
    2. Number theory was not entirely unmotivated at the time. Afterall, it was clear that numbers were a very important abstraction and so a better understanding of numbers was plausibly useful.
    3. Would development of factoring algorithms and their hardness understanding have been much delayed without number theory? And would that delay have been worth, in exchange, an earlier development of calculus? These are at least hard questions to answer.

    There is a role for research of the “let’s do it because I am curious nature”, especially because interested people do good research. The size of that role must, necessarily, be relatively small.

  4. Benefit exposure Almost all researchers are not well-exposed to the benefits of research. University-based research is may do this best via giving the researchers the ability to form their own companies with partial ownership. However, the overhead associated with “form your own company” is very substantial creating a formidable barrier (some good news, is that the barrier may be decreasing). This fails for many research advances which are obviously useful yet have no reasonable mechanism of capturing it’s profit. In the context of machine learning, spam filtering is of obvious and significant common use, yet a researcher with a better spam filter can not easily create a company around the method. There are several reasons for this.
    1. Machine learning is a victim of it’s common success. It’s hard to develop a learning algorithm which is substantially better than others. This means that anyone wanting to implement spam filtering can do so. Patents are useless here—you can’t patent an entire field (and even if you could it wouldn’t work).
    2. Spam filtering is not easily modularizable. Most people want their email clients to filter spam, but they don’t want to switch email clients. Combined with the first observation, this makes it clear that a “spam filtering company” is going to run into trouble.

    The benefit structure in industrial research labs is (perhaps) often even worse. The primary mechanism is via bonuses, where 10% of base salary might be large. Obviously, keeping the job is of much greater importance. A system where the base salary is bumped by a small (variable) amount each year is common. The implication of this is that rewards are delayed (reducing incentives) compared to a bonus and age discrimination becomes a problem.

    A relative bright spot here is google. Much of what they work on is not research-as-we-know-it, but some of it is, and there seem to be significant mechanisms for rewarding success, even after the IPO.

  5. Devolution of power There is a traditional notion of how things are done which says “everyone must have a manager to tell them what to do”. This works badly in research for several reasons.
    1. People are simply not very good at doing research on a topic which does not interest them. The problem is hard enough that doing it in a disinterested or even not-fully-interested way greatly reduces the chance of success.
    2. Another difficulty here is judgement. People make mistakes, including even managers. If a manager has a good idea about what is an interesting topic of research, then the power to say “work on this” is mildly beneficial over simple persuasion. On the other hand, a mistake is greatly harmful.

    The alternative to a command-and-control notion of research is a persuasion-based system. In a persuasion-based system, people are free to either work with someone or not, as they judge. In a persuasion based system, no compelling idea means that researchers try different problems while, which seems appropriate. And, when something promising comes along, more people can be added as demand and persuasiveness allows. University style research partially achieves this (a notable complaint of students is that they do not have enough freedom). In industrial labs, this varies widely.

    It is worth noting that devolution of power has occurred in all of the following surprisingly effective organizations/endeavors:

    1. Linux kernel development (and more generally open source software). Only a decade ago, most economists would have considered the present success of open source software development as implausible. Anyone who can code can work on the linux kernel if they want.
    2. Wikipedia Traditionaly encyclopedias complain that wikipedia articles can be false since anyone can submit, but this argument vastly underestimates the value of having a comprehensive and uptodate information resource.
    3. Google made it so anyone could advertise in a simple and easy way. This simultaneously lowered the barrier to advertising and sanitized the process. We sometimes forget that google came long after other internet search engines so their success was far from assured.
  6. Student exposure By “student” here, I mean “research student”. Research students are of course much more common on university campuses, although there are sometimes government funded research labs such as MPI in Germany and NICTA in Australia.

    Any reasonable system involving people working together must take into account how new people are brought into the system. The appropriate point seems to be where training in research starts (typically, ‘phd students’). Aside from issues of a self-sustaining system, students can add something significant to the process of research. The effort of explaining a new topic of research often aids research via simplification and consideration in new contexts. Students of course can also contribute substantially to the basic mission of research because they inherently bring in new ideas.

  7. Concentration There are several kinds of concentration, and each of them is important. The first kind is the ‘quality’ kind: there are better and worse researchers. Unfortunately, this is extraordinarily difficult to judge, except after the fact. Another kind is concentration of interests. Having 5 people each interested in the subject working on their own is often substantially less useful than having the 5 working together. The internet has greatly aided concentration because some significant benefit can be derived even from people who are not physically near to each other. Universities have traditionally had difficulty with this because the demands of teaching undergraduates imply a necessity for breadth of subjects. This implies that it is common for universities to not have two people working on a similar subject.

It is interesting that no place manages to capture them all. If all of these elements were to come together, the result might be very impressive.

« Newer PostsOlder Posts »

Powered by WordPress