Best Practices for Collaboration – Machine Learning (Theory)

Many people, especially students, haven’t had an opportunity to collaborate with other researchers. Collaboration, especially with remote people, can be tricky. Here are some observations about what has worked for me on collaborations involving a few people.

1. Travel and Discuss. Almost all collaborations start with in-person discussion, which means travel is often necessary. We can hope that in the future we’ll have better systems for starting collaborations remotely (such as blogs), but we aren’t quite there yet.
2. Enable your collaborator. A collaboration can fall apart because one collaborator disables another. This sounds stupid (and it is), but it is far easier to do than you might think.
   1. Avoid Duplication. Discovering that you and a collaborator have been editing the same thing, and now need to waste time reconciling the changes, is annoying. The best way to avoid this is to be explicit about who has write permission to what. Most of the time, a single write lock is held for the entire document, just to be sure.
   2. Don’t keep the write lock unnecessarily. Some people are perfectionists, so they have a real problem giving up the write lock on a draft until it is perfect. This prevents the other collaborators from doing anything. Releasing the write lock, at least when you sleep, is a good idea.
   3. Send all necessary materials. Some people try to save space or bandwidth by not passing ‘.bib’ files or other auxiliary components. Forcing your collaborator to deal with the missing-subdocument problem is disabling. Space and bandwidth are cheap, while your collaborator’s time is precious. (In most cases, “sending” can be pass-by-reference rather than attach-to-message.)
   4. Version Control. This doesn’t mean “use version control software”, although that’s fine. Instead, it means: give a version number to each draft passed back and forth. Then you can talk about “draft 3” rather than “the draft that was passed last Tuesday”. Coupled with “send all necessary materials”, this means you naturally back up previous work.
3. Be Generous. It’s common for people to feel insecure about what they have done or how much “credit” they should get.
   1. Coauthor standing. When deciding who should have a chance to be a coauthor, the rule should be “anyone who has helped produce a result conditioned on previous work”. “Helped produce” is often interpreted too narrowly: a theoretician should be generous about crediting experimental results, and vice versa. Potential coauthors may decline (senior ones often do). Control over who is a coauthor is best (and most naturally) exercised through the choice of whom you talk to.
   2. Author ordering. Author ordering is the wrong thing to worry about, so don’t. The CS theory community has a substantial advantage here because it defaults to alphabetical-by-author ordering, which everyone understands.
   3. Who presents. A good default for conference presentations is “student presents” (or a suitable generalization). This gives young people a real chance to get involved and learn how things are done. Senior collaborators already have plenty of other opportunities to present research, such as workshops or invited talks.
4. Communicate by default. Not cc’ing a collaborator is a bad idea. Even if you have a very specific question for one collaborator and not another, it’s a good idea to cc everyone. In the worst case, this costs the other collaborator a few seconds of annoyance. In the best case, the exchange answers questions nobody had asked yet. It also prevents the “conversation shifts into subjects interesting to everyone, but oops! you weren’t cc’ed” problem.

These practices are imperfectly followed even by me, but they are a good ideal to strive for.

7 Replies to “Best Practices for Collaboration”

Very nice suggestions! Thanks!

2.2, 2.3 and 2.4 are all solved if you DO use a versioning system, though! I really cannot recommend using Subversion (or similar) enough. At first it might seem complicated, but it’ll pay you back 1000 times! Recently I had to work on reports where collaboration meant emailing Word documents titled blah-v0.1.doc etc. back and forth, and it’s a nightmare.

Like any machine learning model, this assumes all the researchers (read: collaborators?) are drawn i.i.d. from a distribution that is independent of all the “human” vices, which more often than not is not the case… any suggestions for this case? :)

seems you are a nice person to coauthor with

I think the most important thing is to acknowledge the issues mentioned in #3. Not that you should always start a collaboration by discussing things like that, but it doesn’t hurt either. It’s also OK if the first author is the one with the biggest contribution, but there still seems to be A LOT of placing the supervisor (strange as it might seem, even a translator!) before the person who made the contribution, as well as not giving the student a chance to present.

Why I emphasized #3 the most: if there is a good match between personalities, all the other issues get resolved in time.

I second the recommendation that Subversion, or a similar system, be used for version control. It’s worth forcing your coauthor to install it (TortoiseSVN for Windows is just fine; under Linux/KDE I recommend kdesvn). They’ll be grateful to you.

For those computer scientists who are not real computer scientists and do not know how to install a subversion server, it would be cool to have an easy-to-use web interface somewhere. Does one exist already?

I also use Subversion for all my papers (except in cases when coauthors don’t have accounts on the same server), and used to use CVS for that. I can’t imagine emailing around all the tex files when different people are all editing the paper, adding figures, etc.

Having long phone meetings is another difficult trick. A few things I’ve learned: buy a wireless headset. When you have many people in the call in one room, it’s easy for the person on the other end to get behind or to miss what the people in the same room are saying, so it’s good to be mindful. In some cases, it’s better for the person in the other room to let the conversation go on without them.
