Taking the next step – Machine Learning (Theory)

At the last ICML, Tom Dietterich asked me to look into systems for commenting on papers. I’ve been slow getting to this, but it’s relevant now.

The essential observation is that we now have many tools for online collaboration, but they are not yet much used in academic research. If we can find the right way to use them, then perhaps great things might happen, with extra kudos to the first conference that manages to really create an online community. Various conferences have been poking at this. For example, UAI has set up a wiki, COLT has started using Joomla with some dynamic content, and AAAI has been setting up a “student blog”. Similarly, Dinoj Surendran set up a twiki for the Chicago Machine Learning Summer School, which was quite useful for coordinating events and other things.

I believe the most important thing is a willingness to experiment. A good place to start seems to be enhancing existing conference websites. For example, the ICML 2007 papers page is basically only useful via grep. A much more human-readable version of the page would organize the papers by topic. If the page were wiki-editable, this would happen almost automatically. Adding the ability for people to comment on the papers might make the website more useful beyond the time of the conference itself.
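
As a rough illustration of the "organize the papers by topic" idea, here is a minimal Python sketch. It assumes each paper already carries a topic label, which the actual papers page does not provide; the titles, topics, and function names below are hypothetical placeholders.

```python
from collections import defaultdict


def group_by_topic(papers):
    """Group (title, topic) pairs into {topic: [titles]}, both sorted alphabetically."""
    grouped = defaultdict(list)
    for title, topic in papers:
        grouped[topic].append(title)
    return {topic: sorted(titles) for topic, titles in sorted(grouped.items())}


def render_listing(grouped):
    """Render the grouped papers as a plain-text listing, one topic per block."""
    lines = []
    for topic, titles in grouped.items():
        lines.append(topic)
        for title in titles:
            lines.append("  - " + title)
    return "\n".join(lines)


# Hypothetical entries; a real page would need topic labels from authors or editors.
papers = [
    ("Paper B", "Kernels"),
    ("Paper A", "Reinforcement Learning"),
    ("Paper C", "Kernels"),
]

print(render_listing(group_by_topic(papers)))
```

On a wiki-editable page, readers would supply and correct the topic labels themselves; the script only shows what the resulting listing might look like.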

There are several aspects of an experiment which seem intuitively important to me. I found the wikipatterns site a helpful distillation of many of these intuitions. Here are various concerns I have:

Mandate: An official mandate is a must-have. Any such enhancement needs to be an official part of the website, or the hesitation to participate will probably be too much.
Permissive Comments: Allowing anyone to comment on a website is somewhat scary to academics, because we are used to peer-reviewing papers before publishing. Nevertheless, it seems important not to filter comments strongly, because:
1. The added (human) work of filtering is burdensome.
2. The delay introduced acts as a barrier to participation.
The policy I’ve followed on hunch.net is allowing comments from anyone exhibiting evidence of intelligence, i.e. filtering essentially only robots. This worked as well as I hoped, and not as badly as I feared.
Spam: Spam is a serious issue for dynamic websites, because it adds substantially to the maintenance load. There are basically two tacks to take here:
1. Issue a userid/passwd to every conference registrant (and maybe others that request it), then just allow comments from them.
2. Allow comments from anyone, but use automated filters. I’ve been using Akismet, but recaptcha is also cool.
I favor the second approach, because it’s more permissive, and it makes participation easier. However, it may increase the maintenance workload.
Someone: Someone to shepherd the experiment is needed. I’m personally overloaded with other things at the moment (witness the slow post rate), so I don’t have significant time to devote. Nevertheless, I’m sure there are many people in the community with as good a familiarity with the internet and web applications as myself.
Software Choice: I don’t have strong preferences for the precise choice of software, but some guidelines seem good.
1. Open Source: I have a strong preference for open source solutions, of which there appear to be several reasonable choices. The reason is that open source applications leave you free (or at least freer) to switch and change things, which seems essential when experimenting.
2. Large User Base: When going with an open source solution, something with a large user base is likely to have fewer rough edges.
I have some preference for systems using flat files for data storage rather than a database because they are easier to maintain or (if necessary) operate on. This is partly due to a bad experience I had with the twiki setup for MLSS: basically, an attempt to transfer data to an upgraded MySQL failed because of schema issues I failed to resolve.

I’m sure there are many with more experience using wiki and comment systems—perhaps they can comment on exact software choices. Wikimatrix seems to provide frighteningly detailed comparisons of different wiki software.

7 Replies to “Taking the next step”

I don’t fully understand the proposal. Are you proposing a system for public comment on paper submissions, i.e. before acceptance, or after acceptance? The latter seems useful but uncontroversial. The former sounds like an excellent idea to me, but also seems to involve thornier issues than you raise, e.g. how to ensure that ideas from rejected submissions aren’t stolen, and how to incentivize people to comment.

This looks like a great outline — I think the “Someone” could be an intern or graduate student perhaps? I’m sure that sponsorship should be pretty easy to find.

I’m definitely thinking about after acceptance. Trying to change the way papers are reviewed is too big a leap. Let’s first learn how to use the tools.

Sounds like a good idea. I would like to see that implemented at least as a test.

misha b

I’ve been using CiteULike recently and noticed that it allows its users to set up groups (I’ve started one called Statistical Machine Learning to try it out). Within these groups, members can post messages, start forum threads and write in a group blog. Access and membership permissions are very configurable, and because it is a social bibliography site, all the bibliographic infrastructure is there for free.

Piggy-backing on this site by creating a group and starting forum topics for each paper might be a way to trial a system like the one you propose. As far as I can tell it would meet most of your criteria. Even if it falls short, the process of using it has a low overhead and may refine your requirements for the next iteration. A “someone” will still be required to manage the group and memberships.

An alternative tool might be wikidot which I’ve also been using for several months. It offers free (public and private) wiki hosting with built in support for LaTeX mark-up, discussion forums and access privileges. Some simple web scripts could be written to create a wiki page per article (in JMLR, say) and users could collaborate on a summary of the important results in the paper and links to other papers while keeping the discussion in the per-page discussion forums that are offered by default.
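
A minimal sketch of what such a script might look like, in Python. It only builds a page name and a plain-text stub per article; it does not talk to wikidot or any other wiki, and the article record, page layout, and function names are hypothetical choices made for illustration.

```python
import re


def slugify(title):
    """Turn a paper title into a lowercase, hyphen-separated page name."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def page_stub(title, authors):
    """Build a plain-text starting page for one article."""
    return "\n".join([
        "Title: " + title,
        "Authors: " + ", ".join(authors),
        "",
        "Summary of important results:",
        "(to be written collaboratively)",
        "",
        "Links to related papers:",
        "(none yet)",
    ])


def build_pages(articles):
    """Map each article to a {page_name: page_text} entry."""
    return {slugify(a["title"]): page_stub(a["title"], a["authors"]) for a in articles}


# Hypothetical article record; real metadata would come from the journal's listing.
articles = [
    {"title": "A Hypothetical Paper: Part 1", "authors": ["A. Author", "B. Author"]},
]

for name, text in build_pages(articles).items():
    print(name)
    print(text)
```

Uploading the generated stubs would depend on whatever import mechanism the chosen wiki offers, which is left out here.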

If, later, you do not wish to rely on wikidot’s hosting or you need machine learning specific features you could grab the source for wikidot and host it yourself. Once again, this appears to be (in the worst case) a cheap way to refine requirements or (in the best case) a solution.

I’d be happy to dedicate (a small) part of my time to helping out with getting this set up.

Pingback: science 2.0 (si, aunque no les guste a muchos) « Descubriendo a MonaLisa

Pingback: ICML Discussion Site < Inductio Ex Machina
