Apparently, the company Spock is setting up a $50k entity resolution challenge. $50k is much less than the $1M grand prize of the Netflix challenge, but the two are effectively the same until someone reaches a 10% improvement on Netflix, since only the $50k prize is available below that threshold. It’s also nice that the Spock challenge has a short duration. The (visible) test set is of size 25k and the training set is of size 75k.
$1M Netflix prediction contest
Netflix is running a contest to improve recommender prediction systems. A 10% improvement over their current system yields a $1M prize. Failing that, the best smaller improvement yields a smaller $50K prize. This contest looks quite real, and the $50K prize money is almost certainly achievable with a bit of thought. The contest also comes with a dataset which is apparently 2 orders of magnitude larger than any other public recommendation system dataset.
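The prize rule above is simple arithmetic on relative improvement, which can be sketched as follows. This is only an illustration: it assumes error is measured as root mean squared error (the contest's metric, though not stated above), and the function names and the wording of the tiers are hypothetical.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between two equal-length rating lists."""
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


def relative_improvement(baseline_error, new_error):
    """Fraction by which new_error improves on baseline_error."""
    return (baseline_error - new_error) / baseline_error


def prize_tier(baseline_error, new_error, threshold=0.10):
    """Map an error pair to the prize tier described in the post.

    The tier labels are illustrative, not official contest wording.
    """
    gain = relative_improvement(baseline_error, new_error)
    if gain >= threshold:
        return "grand prize"
    if gain > 0:
        return "smaller prize, if best so far"
    return "no prize"


# Hypothetical error values, chosen only to exercise each branch.
print(prize_tier(1.0, 0.85))  # well past the 10% threshold
print(prize_tier(1.0, 0.95))  # an improvement, but short of 10%
print(prize_tier(1.0, 1.05))  # worse than the baseline
```

Note that a comparison exactly at the threshold is sensitive to floating-point rounding, so a real scoring script would need to fix the precision at which errors are reported.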
Branch Prediction Competition
Alan Fern points out the second branch prediction challenge (due September 29), which is a follow-up to the first branch prediction competition. Branch prediction is one of the fundamental learning problems of the computer age: without it our computers might run an order of magnitude slower. This is a tough problem since there are sharp constraints on time and space complexity in an online environment. For machine learning, the “idealistic track” may fit well. Essentially, it removes these constraints to gain a weak upper bound on what might be done.
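To make the online setting concrete, here is a minimal sketch of a classic baseline, a table of 2-bit saturating counters indexed by the branch address. It is not taken from the competition framework; the class name, table size, and toy trace are assumptions made for illustration.

```python
class TwoBitPredictor:
    """Table of 2-bit saturating counters, indexed by low bits of the address.

    Counter values 0-1 predict "not taken"; values 2-3 predict "taken".
    """

    def __init__(self, table_bits=10):
        self.mask = (1 << table_bits) - 1
        # Start every counter at 1 (weakly not taken).
        self.counters = [1] * (1 << table_bits)

    def predict(self, pc):
        return self.counters[pc & self.mask] >= 2

    def update(self, pc, taken):
        i = pc & self.mask
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


def accuracy(predictor, trace):
    """Online evaluation: predict each branch first, then learn its outcome."""
    correct = 0
    for pc, taken in trace:
        if predictor.predict(pc) == taken:
            correct += 1
        predictor.update(pc, taken)
    return correct / len(trace)


# Toy trace: one loop branch, taken 9 times and then not taken, repeated.
trace = [(0x400, i % 10 != 9) for i in range(100)]
print(accuracy(TwoBitPredictor(), trace))
```

The whole predictor state is a fixed-size table and each prediction is a single lookup, which is the kind of time and space budget the realistic setting imposes. An unconstrained track lets a learner keep far more history per branch.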
Pittsburgh Mind Reading Competition
Francisco Pereira points out a fun Prediction Competition. Francisco says:
DARPA is sponsoring a competition to analyze data from an unusual functional Magnetic Resonance Imaging experiment. Subjects watch videos inside the scanner while fMRI data are acquired. Unbeknownst to these subjects, the videos have been seen by a panel of other subjects who labeled each instant with labels in categories such as representation (are there tools, body parts, motion, sound), location, presence of actors, emotional content, etc.
The challenge is to predict all of these different labels on an instant-by-instant basis from the fMRI data. A few reasons why this is particularly interesting:
- This is beyond the current state of the art, but not inconceivably hard.
- This is a new type of experiment design that current analysis methods cannot deal with.
- This is an opportunity to work with a heavily examined and preprocessed neuroimaging dataset.
- DARPA is offering prizes!