Conference on Digital Experimentation

I just attended CODE. The people interested in digital experimentation have very diverse backgrounds, encompassing theory, machine learning, social science, economics, and industry, so this seems like a good subject for a new conference. I hope it continues.

I found several talks interesting.

  • Eytan Bakshy talked about PlanOut, which is a language/platform for flexibly specifying experiments (a small sketch of what a PlanOut experiment looks like follows this list).
  • Ron Kohavi talked about EXP which is a heavily used A/B testing platform.
  • Susan Athey talked about long term vs. short term metrics, a problem which seems important to address, constantly encountered, and not yet systematically solved.
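
To give a sense of what specifying an experiment in PlanOut looks like, here is a minimal sketch using the open-source Python implementation. The experiment and parameter names are invented for illustration, and the exact API details should be treated as assumptions rather than a definitive reference.

```python
# Sketch of a PlanOut-style experiment (pip install planout).
# BannerExperiment and its parameters are hypothetical examples.
from planout.experiment import SimpleExperiment
from planout.ops.random import UniformChoice, BernoulliTrial

class BannerExperiment(SimpleExperiment):
    def assign(self, params, userid):
        # Assignment is deterministic per unit, so the same user
        # always receives the same treatment.
        params.banner_color = UniformChoice(choices=['blue', 'green'], unit=userid)
        params.show_survey = BernoulliTrial(p=0.1, unit=userid)

exp = BannerExperiment(userid=42)
print(exp.get('banner_color'), exp.get('show_survey'))
```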

There was a panel about the ongoing Facebook experimentation controversy. The issue here is complex. My understanding is that Facebook users have some expected ownership of the content they create, and hence aren’t comfortable with the content being used in unexpected ways. On the other hand, experimentation is so necessary to the functioning of all large modern internet sites that banning it or slowing down the process by a factor of a million (as some advocated) would badly degrade the future of these sites in practice.

My belief is that what’s lacking is education and trust. W.r.t. education, people need to understand that experimentation is unavoidable when trying to optimize an enormously complex system, as there is simply no other way to systematically make the 1000 right decisions needed for basic things like choosing the best homepage, search result, etc. W.r.t. trust, companies are not particularly good at creating trust in general, but finding the right mechanism for doing so seems critical. I would point to Vanguard as a company that managed to successfully create trust by design.

Fall Machine Learning Events

Many Machine Learning related events are coming up this fall.

  1. September 9, abstracts for the New York Machine Learning Symposium are due. Send a 2 page pdf, if interested, and note that we:
    1. widened submissions to anybody rather than only students.
    2. set aside a larger fraction of time for contributed submissions.
  2. September 15, there is a machine learning meetup, where I’ll be discussing terascale learning at AOL.
  3. September 16, there is a CS&Econ day at New York Academy of Sciences. This is not ML focused, but it’s easy to imagine interest.
  4. September 23 and later, NIPS workshop submissions start coming due. As usual, there are too many good ones, so I won’t be able to attend all those that interest me. I do hope some workshop organizers consider ICML this coming summer, as we are expanding to a 2 day workshop format. Here are a few that interest me:
    1. Big Learning is about dealing with lots of data. Abstracts are due September 30.
    2. The Bayes Bandits workshop. Abstracts are due September 23.
    3. The Personalized Medicine workshop.
    4. The Learning Semantics workshop. Abstracts are due September 26.
    5. The ML Relations workshop. Abstracts are due September 30.
    6. The Hierarchical Learning workshop. Challenge submissions are due October 17, and abstracts are due October 21.
    7. The Computational Tradeoffs workshop. Abstracts are due October 17.
    8. The Model Selection workshop. Abstracts are due September 24.
  5. October 16-17 is the Singularity Summit in New York. This is for the AIists and only peripherally about ML.
  6. October 16-21 is a Predictive Analytics World in New York. As machine learning goes industrial, we see industrial-style conferences rapidly developing.
  7. October 21, there is the New York ML Symposium. In addition to the scheduled program, Chris Wiggins is looking into setting up a session for startups and those interested in them to get to know each other, as was done last year.
  8. December 16-17, the NIPS workshops take place in Granada, Spain.

Centmail comments

Centmail is a scheme that gives charity donations a secondary value, as stamps for email. In the discussions on New Scientist, Slashdot, and elsewhere, some of the comments make the academic review process appear thoughtful :). Some prominent fallacies are:

  1. Costing money fallacy. Some commenters appear to believe the system charges money per email. Instead, the basic idea is that users get an extra benefit from donations to a charity and participation is strictly voluntary. The solution to this fallacy is simply reading the details.
  2. Single solution fallacy. Some commenters seem to think this is proposed as a complete solution to spam, and that since not everyone will opt to participate, it won’t work. But a complete solution is neither necessary nor even possible given the flag-day problem: there is no way to switch every email sender and receiver to a new protocol at once. Deployed machine learning systems for fighting spam are great at taking advantage of a partial solution. The solution to this fallacy is learning about machine learning; in the current state of affairs, informed comment about spam fighting without knowing machine learning is difficult to imagine.
  3. Broken crypto fallacy. Some commenters seem to think that stamps can be reused arbitrarily on emails. This ignores the existence of strong hashes, which can bind a stamp to a particular message (a minimal sketch follows this list). The solution to this fallacy is simply checking the details and possibly learning about cryptographic hashes.
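
To make the hash point concrete, here is a minimal sketch of how a stamp could be cryptographically bound to a single message so that copying it onto other emails fails verification. This illustrates the general technique only; it is not Centmail’s actual protocol, and the token format, digest construction, and StampVerifier class are hypothetical. Note also that the resulting validity bit is exactly the kind of partial signal a machine-learned spam filter can consume as one feature among many.

```python
import hashlib

# Hypothetical sketch, not Centmail's actual protocol: a stamp is a one-time
# token issued when a donation is made, and it is bound to one message by a
# cryptographic hash, so transplanting it onto a different email fails.

def stamp_digest(stamp_token: str, message: bytes) -> str:
    """Digest binding a stamp token to one particular message."""
    return hashlib.sha256(stamp_token.encode() + hashlib.sha256(message).digest()).hexdigest()

class StampVerifier:
    def __init__(self, issued_tokens):
        self.unused = set(issued_tokens)  # tokens issued and not yet spent

    def verify(self, stamp_token: str, digest: str, message: bytes) -> bool:
        # Valid only if the token was issued, never spent, and the digest
        # matches this exact message.
        if stamp_token not in self.unused:
            return False
        if digest != stamp_digest(stamp_token, message):
            return False
        self.unused.remove(stamp_token)  # spend the stamp: reuse now fails
        return True

# Usage: replayed or transplanted stamps fail verification.
verifier = StampVerifier({"token-1", "token-2"})
msg = b"Hello, this is a legitimate email."
d1 = stamp_digest("token-1", msg)
assert verifier.verify("token-1", d1, msg)           # first use succeeds
assert not verifier.verify("token-1", d1, msg)       # replaying a spent stamp fails
d2 = stamp_digest("token-2", msg)
assert not verifier.verify("token-2", d2, b"spam")   # stamp moved to another message fails
```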

Dan Reeves made a very detailed FAQ trying to address all the failure modes seen in comments, and there is a bit more discussion at messy matters.

My personal opinion is that Centmail is an interesting idea that might work, avoids the failure modes of many other ideas, hasn’t failed yet, and hence is worth trying. It’s a better approach than my earlier thoughts on economic solutions to spam.