The Spock challenge for named entity recognition was won by Berno Stein, Sven Eissen, Tino Rub, Hagen Tonnies, Christof Braeutigam, and Martin Potthast.
For those interested, I’ve set up a page describing our approach.
That’s great—I couldn’t find it looking around the Spock site.
I find it interesting that, in contrast to the Netflix challenge, where the winner combined many different approaches, your approach is more coherent. Would averaging over the approaches of other contestants yield even better results?
Well, that’s of course a good question. The results of other approaches are also clusterings, and there are some interesting ways to combine clusterings. Indeed, we tried a modification of this algorithm by Fred and Jain, which essentially counts, for each pair of items i, j, how often they appear together in a cluster (over all clusterings). The outcome is a matrix that can be interpreted as the adjacency matrix of a weighted graph. This graph can in turn be clustered to obtain the final ensemble clustering. This combination of “evidences” proved to be powerful: it delivered good results when combining the clusterings of the individual models. However, combining the models via logistic regression (before clustering) had some advantages in our case: the weights for the individual models could be estimated from the 25,000 training documents, leading to better F-measure values. Nevertheless, it would be worth trying to combine results.
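To make the pair-counting idea concrete, here is a minimal Python sketch. It is an illustration only, not the code we used: the function names `co_association` and `consensus_clusters`, the toy label lists, and the 0.5 threshold are all assumptions, and the final step (connected components of the thresholded graph) is a deliberately simple stand-in for whatever graph clustering one would really apply.

```python
from itertools import combinations


def co_association(clusterings, n_items):
    """Return, for each pair (i, j), the fraction of clusterings
    in which items i and j share a cluster."""
    counts = [[0.0] * n_items for _ in range(n_items)]
    for labels in clusterings:
        for i, j in combinations(range(n_items), 2):
            if labels[i] == labels[j]:
                counts[i][j] += 1
                counts[j][i] += 1
    m = len(clusterings)
    return [[c / m for c in row] for row in counts]


def consensus_clusters(weights, threshold=0.5):
    """Treat the matrix as a weighted graph, keep edges whose weight
    exceeds the threshold, and return the connected components.
    (Assumed simplification of the final clustering step.)"""
    n = len(weights)
    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if weights[i][j] > threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Three hypothetical clusterings of five items (cluster labels per item).
clusterings = [
    [0, 0, 1, 1, 2],
    [0, 0, 0, 1, 1],
    [1, 1, 2, 2, 2],
]
weights = co_association(clusterings, 5)
ensemble = consensus_clusters(weights)
print(ensemble)
```

In this toy example, items 0 and 1 are grouped together by every clustering, while items 2, 3 and 4 are linked by pairs that co-occur in two of the three clusterings, so the thresholded graph falls into two groups.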