CS – Machine Learning (Theory)

9/28/2011

Somebody’s Eating Your Lunch

Since we last discussed the other online learning, Stanford has very visibly started pushing mass teaching in AI, Machine Learning, and Databases. In retrospect, it’s not too surprising that the next step up in serious online teaching experiments is occurring at the computer science department of a university embedded in the land of startups. Numbers on the order of 100,000 are quite significant, similar in scale to the number of computer science undergraduate students per year in the US. Although these populations surely differ, the fact that they could overlap is worth considering for the future.

It’s too soon to say how successful these classes will be and there are many easy criticisms to make:

Registration != Learning … but if only 1/10th complete these classes, the scale of teaching still surpasses the scale of any traditional process.
1st year excitement != nth year routine … but if only 1/10th take future classes, the scale of teaching still surpasses the scale of any traditional process.
Hello, cheating … but teaching is much harder than testing in general, and we already have recognized systems for mass testing.
Online misses out … sure, but for students not enrolled in a high quality university program, this is simply not a relevant comparison. There are also benefits to being online, as your time might be better focused. Anecdotally, at Caltech, they let us take two classes at the same time, which I did a few times. Typically, I had a better grade in the class that I skipped, as the instructor had to go through things rather slowly.
Where’s the beef? The hard nosed will want to know how to make money, which is always a concern. But, a decent expectation is that if you first figure out how to create value, you’ll find some way to make money. And, if you first wait until it’s clear how to make money, you won’t make any.

My belief is that this project will pan out, with allowances for the inevitable adjustments that you learn to make from experience. I think the critics miss an understanding of what’s possible and what motivates people.

The prospect of teaching 1 student means you might review some notes. The prospect of teaching ~10 students means you prepare some slides. The prospect of teaching ~100 students means you polish your slides well, trying to anticipate questions, and hopefully drawing on experience from previous presentations. I’ve never directly taught ~1000 students, but at that scale you must try very hard to make the presentation perfect, including serious testing with dry runs. 10⁵ students must make getting out of bed in the morning quite easy.

Stanford has a significant first-mover advantage amongst top research universities, but it’s easy to imagine a few other (but not many) universities operating at a similar scale. Those that have the foresight to start a serious online teaching program soon will have a chance of being among the few. For other research universities, we can expect boutique traditional classes to continue for some time. These boutique classes may have some significant social value, because it’s easy to imagine that the few megaclasses miss important things in developing research areas. And for everyone working at teaching universities, someone is eating your lunch.

(Cross posted at CACM.)

4/28/2010

CI Fellows program renewed

Lev Reyzin points out the CI Fellows program is renewed. CI Fellows are essentially NSF funded computer science postdocs for universities and industry research labs. I’ve been lucky and happy to have Lev visit me for a year under last year’s program, so I strongly recommend participating if it suits you.

As with last year, the application timeline is very short, with everything due by May 23.

8/3/2009

Carbon in Computer Science Research

Al Gore’s film and gradually more assertive and thorough science have managed to mostly shift the debate on climate change from “Is it happening?” to “What should be done?” In that context, it’s worthwhile to think a bit about what can be done within computer science research.

There are two things we can think about:

Doing Research At a cartoon level, computer science research consists of some combination of commuting to and from work, writing programs, running them on computers, writing papers, and presenting them at conferences. A typical computer has a power usage on the order of 100 Watts, which works out to 2.4 kilowatt-hours/day. Looking up David MacKay’s reference on power usage per person, it becomes clear that this is a relatively minor part of the lifestyle, although it could become substantial if many more computers are required. Much larger costs are associated with commuting and attending conferences. Since local commuting is common across many people, and there are known approaches (typically public transportation) for more efficient commuting, I expect researchers can piggyback on improvements in public transportation to reduce commuting costs. In fact, the situation for researchers may be better in general, as the nature of the job may make commuting avoidable, at least on some days.
Presenting at conferences is the remaining problem area, essentially due to travel by airplane to and from a conference. Travel by airplane has an energy cost similar to travel by car over the same distance, but we typically take airplanes for very long distances. Unlike cars, typical airplane usage requires stored energy in a dense form. For example, there are no serious proposals I’m aware of for battery-powered airplanes, because all existing rechargeable batteries have an energy density around 1/10th that of hydrocarbon fuel (which makes sense given that about 3/4 of the mass for a hydrocarbon fire is oxygen in the air). This suggests airplane transport may be particularly difficult to adapt towards low or zero carbon usage. The plausible approaches I know involve either using electricity (from where?) to inefficiently crack water for hydrogen, or the biofuel approach where hydrocarbons are made by plants, with neither of these approaches particularly far along in development. If these aren’t developed, it seems we should expect fewer conferences, more regional conferences, Europe with its extensive fast train network to be less impacted, and more serious effort towards distributed conferences. For the last, it’s easy to imagine with existing technology having simultaneous regional conferences which are mutually videoconferenced, and we aren’t far from being able to handle a fully interactive videobroadcast amongst an indefinitely large number of participants. As a corollary of fewer conferences, other interactive mechanisms (for example research blogs) seem likely to grow.
Research Topics The keyword for research topics is efficiency, and it is not a trivial concern on a global scale. In computer science, there have been a few algorithms (such as quicksort and hashing) developed which substantially and broadly improved real-world efficiency, but the real driver of efficiency so far is hardware development, which has phenomenally improved efficiency for several decades.
Many of the efficiency improvements are sure to remain hardware based, but software is becoming an essential component. One basic observation about efficient algorithms is that for problems admitting an efficient parallel solution (counting is a great example), the parallel algorithm is generally more efficient, because energy use is typically superlinear in clock speed. As an extreme example, the human brain, which is deeply optimized by evolution for energy efficiency, typically runs at 100Hz or 100KHz.
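The superlinear-energy argument can be made concrete with a back-of-the-envelope sketch. Everything here is an assumption for illustration: per-core power is modeled as `c * f**alpha` with `alpha = 3`, the work (think counting) is taken to split perfectly across cores with no communication cost, and the function name and parameters are hypothetical.

```python
def energy(work_cycles, cores, deadline_s, alpha=3.0, c=1.0):
    """Energy needed to finish `work_cycles` on `cores` cores by `deadline_s`.

    Assumptions (not from any measurement): the work divides evenly across
    cores, and each core draws power c * f**alpha at clock frequency f.
    """
    # Clock speed each core must run at to meet the deadline (cycles/second).
    freq = work_cycles / (cores * deadline_s)
    power_per_core = c * freq ** alpha
    # Energy = total power * time.
    return cores * power_per_core * deadline_s


# Same job, same deadline: one fast core versus four slow ones.
serial = energy(1e9, cores=1, deadline_s=1.0)
parallel = energy(1e9, cores=4, deadline_s=1.0)
ratio = parallel / serial  # about 1/16 when alpha = 3
```

Under this model the ratio is `(1 / cores) ** (alpha - 1)`, so the advantage vanishes when power is merely linear in clock speed (`alpha = 1`) and grows as the exponent grows. Real hardware has static power and coordination overhead that this sketch ignores.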

Although efficiency suggests parallel algorithms, this should not be done blindly. For example, in machine learning the evidence I’ve seen so far suggests that online learning (which is admittedly harder to parallelize) is substantially more efficient than batch style learning, so I expect online approaches to be more efficient than map-reduce based machine learning as is typically seen in the Mahout project.
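A toy illustration of the online versus batch distinction, not evidence for the claim: it fits a one-dimensional, noise-free linear model on synthetic data and gives both methods a single pass over the examples. The online learner updates after every example, while the batch learner averages the gradient over the whole pass and updates once. The data, the learning rate, and all function names are assumptions made up for this sketch; nothing here uses Mahout or any library beyond the Python standard library.

```python
import random


def make_data(n, true_w=2.0, seed=0):
    """Synthetic noise-free data y = true_w * x (an assumption for the demo)."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return [(x, true_w * x) for x in xs]


def online_sgd(data, lr=0.5):
    """One pass of online gradient descent on squared loss: n updates."""
    w = 0.0
    for x, y in data:
        w += lr * (y - w * x) * x
    return w


def batch_gd(data, passes=1, lr=0.5):
    """Batch gradient descent on squared loss: one update per full pass."""
    w = 0.0
    for _ in range(passes):
        grad = sum((y - w * x) * x for x, y in data) / len(data)
        w += lr * grad
    return w


data = make_data(1000)
w_online = online_sgd(data)         # 1000 example-gradients, 1000 updates
w_batch = batch_gd(data, passes=1)  # 1000 example-gradients, 1 update
```

For the same number of per-example gradient computations, `w_online` ends up far closer to the true weight of 2.0 than `w_batch` does. Batch methods can close the gap with more passes or better step sizes, and each of those passes parallelizes easily, which is exactly the tradeoff in question.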

A substantial difficulty with parallel algorithms is the programming itself. In this regard, there is plenty of room for programming language work as well.

5/19/2009 6/8/2009

CI Fellows

Lev Reyzin points out the CI Fellows Project. Essentially, NSF is funding 60 postdocs in computer science for graduates from a wide array of US places to a wide array of US places. This is particularly welcome given a tough year for new hires. I expect some fraction of these postdocs will be in ML. The time frame is quite short, so those interested should look it over immediately.