A couple of security researchers claim to have cracked the Netflix dataset. The claims of success appear somewhat overstated to me, but the method of attack is valid and could plausibly be substantially improved so as to reveal the movie preferences of a small fraction of Netflix users.
The basic idea is to use a heuristic similarity function between ratings in a public database (from IMDB) and an anonymized database (Netflix) to link ratings in the private database to public identities (in IMDB). They claim to have linked two of a few dozen IMDB users to anonymized Netflix users.
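A data publisher can run this kind of linkage test against a candidate release before publishing it. The sketch below is an added example, not the researchers' code: the rarity weighting, the one-star tolerance, the `min_gap` threshold and all sample records are assumptions constructed for illustration.

```python
import math
from collections import Counter

# Constructed sample data: an "anonymized" release and two public profiles.
ANON = {
    "u1": {"Titanic": 5, "Matrix": 4, "Obscure Doc A": 2, "Obscure Doc B": 5},
    "u2": {"Titanic": 4, "Matrix": 5, "Shrek": 3},
    "u3": {"Titanic": 5, "Matrix": 4, "Shrek": 4, "Obscure Doc C": 1},
    "u4": {"Titanic": 3, "Shrek": 5},
}
PUBLIC = {
    "alice": {"Obscure Doc A": 2, "Obscure Doc B": 4, "Titanic": 5},
    "bob": {"Titanic": 4, "Matrix": 4},
}

def movie_weights(anon):
    """Rarely rated movies carry more identifying weight (assumed heuristic)."""
    counts = Counter(movie for rec in anon.values() for movie in rec)
    return {movie: 1.0 / math.log(1 + n) for movie, n in counts.items()}

def score(public, anon_rec, weights, tol=1):
    """Sum the weights of movies both records rate within `tol` stars."""
    return sum(weights[m] for m, r in public.items()
               if m in anon_rec and abs(anon_rec[m] - r) <= tol)

def link(public, anon, min_gap=1.0):
    """Return (record_id or None, gap). A link is reported only when the
    best candidate beats the runner-up by at least `min_gap`."""
    w = movie_weights(anon)
    ranked = sorted(((score(public, rec, w), rid) for rid, rec in anon.items()),
                    reverse=True)
    (best, rid), (second, _) = ranked[0], ranked[1]
    gap = best - second
    return (rid if gap >= min_gap else None), round(gap, 3)

def audit(public_profiles, anon):
    """Map each public profile to its link result for a release review."""
    return {name: link(p, anon) for name, p in public_profiles.items()}

if __name__ == "__main__":
    for name, (rid, gap) in audit(PUBLIC, ANON).items():
        print(f"{name}: linked to {rid} (gap {gap})")
```

Profiles made up only of widely rated movies produce ties and no link, while a few rare titles are enough to single out one record, which is the property a release review should measure.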
The claims seem a bit inflated to me, because (a) knowing the IMDB identity isn’t equivalent to knowing the person and (b) the claims of statistical significance are with respect to a model of the world they created (rather than one they created).
Overall, this is another example showing that complete privacy is hard. It may be worth remembering that there are some substantial benefits from the Netflix challenge as well—we (as a society) have learned something about how to do collaborative filtering which is useful beyond just recommending movies.