<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	>

<channel>
	<title>Machine Learning (Theory)</title>
	<atom:link href="http://hunch.net/wp-rss2.php" rel="self" type="application/rss+xml" />
	<link>http://hunch.net</link>
	<description>Machine learning and learning theory research</description>
	<pubDate>Sat, 27 Dec 2008 19:07:50 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.6.2</generator>
	<language>en</language>
			<item>
		<title>Adversarial Academia</title>
		<link>http://hunch.net/?p=499</link>
		<comments>http://hunch.net/?p=499#comments</comments>
		<pubDate>Sat, 27 Dec 2008 19:07:50 +0000</pubDate>
		<dc:creator>jl</dc:creator>
		
		<category><![CDATA[Conferences]]></category>

		<category><![CDATA[Funding]]></category>

		<category><![CDATA[Machine Learning]]></category>

		<category><![CDATA[Research]]></category>

		<category><![CDATA[Reviewing
]]></category>

		<guid isPermaLink="false">http://hunch.net/?p=499</guid>
		<description><![CDATA[One viewpoint on academia is that it is inherently adversarial: there are finite research dollars, positions, and students to work with, implying a zero-sum game between different participants.  This is not a viewpoint that I want to promote, as I consider it flawed.  However, I know several people who believe strongly in this viewpoint, [...]]]></description>
			<content:encoded><![CDATA[<p>One viewpoint on academia is that it is inherently adversarial: there are finite research dollars, positions, and students to work with, implying a zero-sum game between different participants.  This is not a viewpoint that I want to promote, as I consider it flawed.  However, I know several people who believe strongly in this viewpoint, and I have found it to have substantial explanatory power.</p>
<p>For example:</p>
<ol>
<li>It explains why your paper was rejected based on poor logic.  The reviewer wasn&#8217;t concerned with research quality, but rather with rejecting a competitor.</li>
<li>It explains why professors rarely work together.  The goal of a non-tenured professor (at least) is to get tenure, and a case for tenure comes from a portfolio of work that is indisputably yours.</li>
<li>It explains why new research programs are not quickly adopted.  Adopting a competitor&#8217;s program is impossible if your career is based on the competitor being wrong.</li>
</ol>
<p>Different academic groups subscribe to the adversarial viewpoint to different degrees.  In my experience, <a href="http://nips.cc/">NIPS</a> is the worst.  It is bad enough that the probability of a paper being accepted at NIPS is monotonically <i>decreasing</i> in its quality.  This is more than just my personal experience over a number of years, as it&#8217;s corroborated by others who have told me the same.  ICML (run by <a href="http://www.machinelearning.org/">IMLS</a>) used to have less of a problem, but as it has become more like NIPS over time, it has inherited this problem.  <a href="http://learningtheory.org/">COLT</a> has not suffered from this problem as much in my experience, although it had other problems related to its focus being defined too narrowly.  I do not have enough experience with UAI or KDD to comment there.</p>
<p>There are substantial flaws in the adversarial viewpoint.</p>
<ol>
<li>The adversarial viewpoint makes you stupid.  When viewed adversarially, any idea has crippling disadvantages and no advantages.  Contorting your viewpoint enough to make this true damages your ability to conduct research.  In short, it promotes poor mental hygiene.</li>
<li>Many activities become impossible.  Doing research is in general extremely hard, so there are many instances where working with other people can allow you to do things which are otherwise impossible.</li>
<li>The previous two disadvantages apply even more strongly for a community&#8212;good ideas are more likely to be missed, change comes slowly, and often with steps backward.</li>
<li>At its most basic level, the assumption that research is zero-sum is flawed, because research is not conducted in a closed system.  If society at large discovers that research is valuable, then the budget increases.</li>
</ol>
<p>Despite these disadvantages, there is a substantial advantage as well: you can materially protect and aid your career by rejecting papers, preventing grants, and generally discriminating against key people doing interesting but competitive work.</p>
<p>The adversarial viewpoint has validity in proportion to the number of people subscribing to it.  For those of us who would like to deemphasize the adversarial viewpoint, what&#8217;s unclear is: how?</p>
<p>One concrete step is to use <a href="http://arxiv.org/">Arxiv</a>.  For a long time, physicists have adopted an Arxiv-first philosophy, which I&#8217;ve come to respect.  Arxiv functions as a universal timestamp which decreases the power of an adversarial reviewer.  Essentially, you avoid giving away the power to muddy the track of invention.  I&#8217;m expecting to use Arxiv for essentially all my past-but-unpublished and future papers.</p>
<p>It is plausible that limiting the scope of bidding, as <a href="http://www.cs.umass.edu/~mccallum/">Andrew McCallum</a> suggested at the last ICML, and as is effectively implemented at this ICML, will help.  The system of review at journals might also help for the same reason.  In my experience as an author, if an anonymous reviewer wants to kill a paper they usually succeed. Most area chairs or program chairs are more interested in avoiding conflict with the reviewer (who they picked and may consider a friend) than reading the paper to determine the illogic of the review (which is a difficult task that simply cannot be done for all papers).  NIPS experimented with a reputation system for reviewers last year, but I&#8217;m unclear on how well it worked, as an author&#8217;s score for a review and a reviewer&#8217;s score for the paper may be deeply correlated, revealing little additional information.</p>
<p>Public discussion of research can help with this, because very poor logic simply doesn&#8217;t stand up under public scrutiny. While I hope to nudge people in this direction, it&#8217;s clear that most people aren&#8217;t yet comfortable with public discussion.</p>
]]></content:encoded>
			<wfw:commentRss>http://hunch.net/?feed=rss2&amp;p=499</wfw:commentRss>
		</item>
		<item>
		<title>Use of Learning Theory</title>
		<link>http://hunch.net/?p=496</link>
		<comments>http://hunch.net/?p=496#comments</comments>
		<pubDate>Tue, 23 Dec 2008 17:55:45 +0000</pubDate>
		<dc:creator>jl</dc:creator>
		
		<category><![CDATA[Machine Learning]]></category>

		<category><![CDATA[Theory]]></category>

		<guid isPermaLink="false">http://hunch.net/?p=496</guid>
		<description><![CDATA[I&#8217;ve had serious conversations with several people who believe that the theory in machine learning is &#8220;only useful for getting papers published&#8221;.  That&#8217;s a compelling statement, as I&#8217;ve seen many papers where the algorithm clearly came first, and the theoretical justification for it came second, purely as a perceived means to improve the chance [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve had serious conversations with several people who believe that the theory in machine learning is &#8220;only useful for getting papers published&#8221;.  That&#8217;s a compelling statement, as I&#8217;ve seen many papers where the algorithm clearly came first, and the theoretical justification for it came second, purely as a perceived means to improve the chance of publication. </p>
<p>Naturally, I disagree and believe that learning theory has much more substantial applications.  </p>
<p>Even in core learning algorithm design, I&#8217;ve found learning theory to be useful, although its application is more subtle than many realize.  The most straightforward applications can fail because (as one might expect) worst case bounds tend to be loose in practice (*).  In my experience, considering learning theory when designing an algorithm has two important effects in practice:</p>
<ol>
<li>It can help make your algorithm behave correctly at a crude level of analysis, leaving finer details to tuning or common sense.  The best example I have of this is <a href="http://waldron.stanford.edu/~isomap/">Isomap</a>, where the algorithm was informed by the <a href="http://waldron.stanford.edu/~isomap/BdSLT.pdf">analysis</a>, yielding substantial improvements in sample complexity over earlier algorithmic ideas.</li>
<li>An algorithm with learning theory considered in its design can be more automatic. I&#8217;ve gained more respect for <a href="http://www.jmlr.org/papers/volume5/rifkin04a/rifkin04a.pdf">Rifkin&#8217;s claim</a> that the one-against-all reduction, when tuned well, can often perform as well as other approaches.  The &#8220;when tuned well&#8221; caveat is substantial, however, because learning algorithms may be applied by nonexperts or by other algorithms which are computationally constrained.  A reasonable and worthwhile hope for other methods of addressing multiclass problems is that they are more automatic and computationally faster.  The subtle issue here is: how do you measure &#8220;more automatic&#8221;?</li>
</ol>
<p>In my experience, learning theory is most useful in its crudest forms.  A good example comes in the architecting problem: how do you go about solving a learning problem?  I mean this in the broadest sense imaginable:</p>
<ol>
<li>Is it a learning problem or not?  Many problems are most easily solved via other means such as engineering, because that&#8217;s easier, because there is a severe data gathering problem, or because there is so much data that memorization works fine.  Learning theory such as statistical bounds and online learning with experts helps substantially here because it provides guidelines about what is and is not possible to learn.</li>
<li>What type of learning problem is it?  Is it a problem where exploration is required or not?  Is it a structured learning problem?  A multitask learning problem? A cost sensitive learning problem?  Are you interested in the median or the mean?  Is active learning usable or not?  Online or not?  Answering these questions correctly can easily make the difference between a successful application and a failed one.  Answering these questions is partly definition checking, and since the answer is often &#8220;all of the above&#8221;, figuring out which aspect of the problem to address first or next is helpful.
</li>
<li>What is the right learning algorithm to use?  Here the relative capacity of a learning algorithm and its computational efficiency are most important.  If you have few features and many examples, a nonlinear algorithm with more representational capacity is a good idea.  If you have many features and little data, linear representations or even exponentiated gradient style algorithms are important.  If you have very large amounts of data, the most scalable algorithms (so far) use a linear representation.  If you have little data and few features, a Bayesian approach may be your only option.  Learning theory can help in all of the above by quantifying &#8220;many&#8221;, &#8220;little&#8221;, &#8220;most&#8221;, and &#8220;few&#8221;.  How do you deal with the overfitting problem?  One thing I realized recently is that overfitting can be a concern even with very large natural datasets, because some examples are naturally more important than others.
</li>
</ol>
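<p>To make &#8220;many&#8221; and &#8220;few&#8221; slightly more concrete, here is a minimal sketch (in Python, chosen only for illustration) of two textbook sample-size bounds for a finite hypothesis class: the realizable (Occam) bound, and the agnostic bound from Hoeffding&#8217;s inequality plus a union bound.  The function names and the example setting are hypothetical, and since worst case bounds like these tend to be loose in practice, the outputs are best read as crude guidelines rather than predictions.</p>

```python
import math


def realizable_sample_size(num_hypotheses, epsilon, delta):
    """Examples sufficient so that, with probability at least 1 - delta, every
    hypothesis consistent with the data has true error at most epsilon.
    Assumes the finite class contains a zero-error hypothesis."""
    return math.ceil((math.log(num_hypotheses) + math.log(1 / delta)) / epsilon)


def agnostic_sample_size(num_hypotheses, epsilon, delta):
    """Examples sufficient so that, with probability at least 1 - delta, every
    hypothesis in the finite class has empirical error within epsilon of its
    true error (Hoeffding's inequality plus a union bound; no realizability)."""
    return math.ceil((math.log(num_hypotheses) + math.log(2 / delta)) / (2 * epsilon ** 2))


# Hypothetical setting: 2**20 hypotheses, 5% error tolerance, 95% confidence.
print(realizable_sample_size(2 ** 20, 0.05, 0.05))  # 338
print(agnostic_sample_size(2 ** 20, 0.05, 0.05))    # 3511
```

<p>The gap between the two (a 1/epsilon versus a 1/epsilon<sup>2</sup> dependence) is the kind of crude, qualitative guidance the list above refers to: it says roughly how much harder learning becomes once you stop assuming some hypothesis is perfect.</p>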
<p>As might be clear, I think of learning theory as somewhat broader than might be traditional.  Some of this is simply education.  Many people have only been exposed to one piece of learning theory, often <a href="http://en.wikipedia.org/wiki/Vapnik-Chervonenkis_theory">VC theory</a> or its cousins.  After seeing this, they come to think of it as all of learning theory.  VC theory is a good theory, but it is not complete, and other elements of learning theory seem at least as important and useful.  Another aspect is publishability.  Simply sampling from the learning theory in existing papers does not necessarily give a good distribution of subjects for teaching, because the goal of impressing reviewers does not necessarily coincide with the clean, simple analysis that is teachable.</p>
<p>(*) There is significant investigation into improving the tightness of bounds to the point of usefulness, and maybe it will pay off.</p>
]]></content:encoded>
			<wfw:commentRss>http://hunch.net/?feed=rss2&amp;p=496</wfw:commentRss>
		</item>
		<item>
		<title>Summer Conferences</title>
		<link>http://hunch.net/?p=485</link>
		<comments>http://hunch.net/?p=485#comments</comments>
		<pubDate>Sat, 13 Dec 2008 00:35:36 +0000</pubDate>
		<dc:creator>jl</dc:creator>
		
		<category><![CDATA[Conferences]]></category>

		<category><![CDATA[Machine Learning]]></category>

		<guid isPermaLink="false">http://hunch.net/?p=485</guid>
		<description><![CDATA[Here&#8217;s a handy table for the summer conferences.

Conference | Deadline | Reviewer Targeting | Double Blind | Author Feedback | Location | Date
ICML (wrong ICML) | January 26 | Yes | Yes | Yes | Montreal, Canada | June 14-17
COLT | February 13 | No | No | Yes | Montreal | June 19-21
UAI | March 13 | No | Yes | No | Montreal | June 19-21
KDD | February 2/6 | No | No | No | Paris, France | June 28-July 1

Reviewer targeting is new this year.  The idea is that many poor decisions happen because the papers go to reviewers who are unqualified, and the hope is that allowing authors to [...]]]></description>
			<content:encoded><![CDATA[<p>Here&#8217;s a handy table for the summer conferences.</p>
<table border=1>
<tr>
<td>Conference</td>
<td>Deadline</td>
<td>Reviewer Targeting</td>
<td>Double Blind</td>
<td>Author Feedback</td>
<td>Location</td>
<td>Date</td>
</tr>
<tr>
<td><a href="http://www.cs.mcgill.ca/~icml2009/index.html">ICML</a> (<a href="http://www.icml2009.com/">wrong ICML</a>)</td>
<td>January 26</td>
<td>Yes</td>
<td>Yes</td>
<td>Yes</td>
<td>Montreal, Canada</td>
<td>June 14-17</td>
</tr>
<tr>
<td><a href="http://www.cs.mcgill.ca/~colt2009/">COLT</a></td>
<td>February 13</td>
<td>No</td>
<td>No</td>
<td>Yes</td>
<td>Montreal</td>
<td>June 19-21</td>
</tr>
<tr>
<td><a href="http://www.cs.mcgill.ca/~uai2009/">UAI</a></td>
<td>March 13</td>
<td>No</td>
<td>Yes</td>
<td>No</td>
<td>Montreal</td>
<td>June 19-21</td>
</tr>
<tr>
<td><a href="http://www.sigkdd.org/kdd2009/">KDD</a></td>
<td>February 2/6</td>
<td>No</td>
<td>No</td>
<td>No</td>
<td>Paris, France</td>
<td>June 28-July 1</td>
</tr>
</table>
<p>Reviewer targeting is new this year.  The idea is that many poor decisions happen because the papers go to reviewers who are unqualified, and the hope is that allowing authors to point out who is qualified results in better decisions.  In my experience, this is a reasonable idea to test.</p>
<p>UAI and COLT are also experimenting this year, with double blind reviewing and author feedback respectively.  Of the two, I believe author feedback is more important, as I&#8217;ve seen it make a difference.  However, I still consider double blind reviewing a net win, as it&#8217;s a substantial public commitment to fairness.</p>
]]></content:encoded>
			<wfw:commentRss>http://hunch.net/?feed=rss2&amp;p=485</wfw:commentRss>
		</item>
		<item>
		<title>A NIPS paper</title>
		<link>http://hunch.net/?p=482</link>
		<comments>http://hunch.net/?p=482#comments</comments>
		<pubDate>Mon, 08 Dec 2008 01:46:22 +0000</pubDate>
		<dc:creator>jl</dc:creator>
		
		<category><![CDATA[Bayesian]]></category>

		<category><![CDATA[Empirical]]></category>

		<category><![CDATA[Machine Learning]]></category>

		<category><![CDATA[Papers]]></category>

		<guid isPermaLink="false">http://hunch.net/?p=482</guid>
		<description><![CDATA[I&#8217;m skipping NIPS this year in favor of Ada, but I wanted to point out this paper by Andriy Mnih and Geoff Hinton.  The basic claim of the paper is that by carefully but automatically constructing a binary tree over words, it&#8217;s possible to predict words well with huge computational resource savings over unstructured [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m skipping NIPS this year in favor of <a href="http://hunch.net/~ada">Ada</a>, but I wanted to point out <a href="http://www.cs.toronto.edu/~amnih/papers/hlbl_draft.pdf">this paper</a> by <a href="http://www.cs.toronto.edu/~amnih/">Andriy Mnih</a> and <a href="http://www.cs.toronto.edu/~hinton/">Geoff Hinton</a>.  The basic claim of the paper is that by carefully but automatically constructing a binary tree over words, it&#8217;s possible to predict words well with huge computational resource savings over unstructured approaches.</p>
<p>I&#8217;m interested in this beyond the application to word prediction because it is relevant to the general normalization problem: if you want to predict the probability of one of a large number of events, you often must compute a predicted score for every event and then normalize, which is computationally expensive.  The problem comes up in many places using probabilistic models; I&#8217;ve run into it myself with high-dimensional regression.</p>
<p>There are a couple of workarounds for this computational bug:</p>
<ol>
<li>Approximate. There are many ways.  Often the approximations are uncontrolled (i.e. can be arbitrarily bad), and hence finicky in application.</li>
<li>Avoid.  You don&#8217;t really want a probability; you want the most probable choice, which can be found more directly.  <a href="http://www.cs.nyu.edu/~yann/research/ebm/">Energy based model</a> update rules are an example of that approach, and there are many other direct methods from supervised learning.  This is great when it applies, but sometimes a probability is actually needed.</li>
</ol>
<p>This paper points out that a third approach can be viable empirically: use a self-normalizing structure.  It seems highly likely that this is true in other applications as well.</p>
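<p>A minimal sketch of why a tree can be self-normalizing: if each internal node of a binary tree makes a left/right decision with a sigmoid probability, a leaf&#8217;s probability is the product of the decisions along its path, and the leaf probabilities sum to 1 without ever computing a normalizer over all leaves.  This is a generic Python illustration with random, hypothetical parameters (a complete tree, one vector per internal node, a context vector <code>h</code>); it is not the log-bilinear model or the automatic tree construction from the Mnih and Hinton paper.</p>

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def leaf_probability(leaf, depth, node_vecs, h):
    """P(leaf | h) in a complete binary tree with 2**depth leaves.

    The bits of `leaf` (most significant first) are the left/right decisions
    from the root. Internal nodes use heap numbering (root 1, children 2n and
    2n + 1), and the probability of going right at node n is
    sigmoid(dot(node_vecs[n], h)). Cost: depth dot products, not 2**depth.
    """
    prob, node = 1.0, 1
    for level in reversed(range(depth)):
        go_right = (leaf >> level) % 2
        p_right = sigmoid(sum(a * b for a, b in zip(node_vecs[node], h)))
        prob *= p_right if go_right else 1.0 - p_right
        node = 2 * node + go_right
    return prob


rng = random.Random(0)
DEPTH, DIM = 3, 4  # hypothetical: 8 "words", 4-dimensional context vector
# One vector per internal node 1 .. 2**DEPTH - 1 (index 0 is unused).
node_vecs = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(2 ** DEPTH)]
h = [rng.gauss(0, 1) for _ in range(DIM)]

probs = [leaf_probability(w, DEPTH, node_vecs, h) for w in range(2 ** DEPTH)]
print(sum(probs))  # equal to 1 up to floating-point rounding
```

<p>Normalization holds because at every node the two branch probabilities sum to 1; predicting one event then costs work proportional to the tree depth, roughly the logarithm of the number of events, rather than scoring every event.</p>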
]]></content:encoded>
			<wfw:commentRss>http://hunch.net/?feed=rss2&amp;p=482</wfw:commentRss>
		</item>
		<item>
		<title>A Bumper Crop of Machine Learning Graduates</title>
		<link>http://hunch.net/?p=476</link>
		<comments>http://hunch.net/?p=476#comments</comments>
		<pubDate>Sat, 29 Nov 2008 01:26:52 +0000</pubDate>
		<dc:creator>jl</dc:creator>
		
		<category><![CDATA[Machine Learning]]></category>

		<guid isPermaLink="false">http://hunch.net/?p=476</guid>
		<description><![CDATA[My impression is that this is a particularly strong year for machine learning graduates.  Here&#8217;s my short list of the strong graduates I know, in reverse alphabetical order (for perversity&#8217;s sake) by last name:

Jenn Wortmann. When Jenn visited us for the summer, she had one, two, three, four papers.  That is typical&#8212;she&#8217;s smart, capable, and [...]]]></description>
			<content:encoded><![CDATA[<p>My impression is that this is a particularly strong year for machine learning graduates.  Here&#8217;s my short list of the strong graduates I know, in reverse alphabetical order (for perversity&#8217;s sake) by last name:</p>
<ol>
<li><a href="http://www.seas.upenn.edu/~wortmanj/">Jenn Wortmann</a>. When Jenn visited us for the summer, she had <a href="http://www.seas.upenn.edu/~wortmanj/papers/scavenging.pdf">one</a>, <a href="http://www.seas.upenn.edu/~wortmanj/papers/wagering.pdf">two</a>, <a href="http://www.seas.upenn.edu/~wortmanj/papers/lmsrcomplexity.pdf">three</a>, <a href="http://www.seas.upenn.edu/~wortmanj/papers/explore.pdf">four</a> papers.  That is typical&#8212;she&#8217;s smart, capable, and follows up many directions of research.  I believe approximately all of her many papers are on different subjects.</li>
<li><a href="http://www.cs.toronto.edu/~rsalakhu/">Ruslan Salakhutdinov</a>. He has a <a href="http://www.sciencemag.org/cgi/content/short/313/5786/504">Science paper on bijective dimensionality reduction</a>, has mastered and improved on deep belief nets (which seem like an important flavor of nonlinear learning), and in my experience is very fast, capable, and creative at problem solving.</li>
<li><a href="http://www.cs.nyu.edu/~ranzato/">Marc&#8217;Aurelio Ranzato</a>.  I haven&#8217;t spoken with Marc very much, but he had a great visit at Yahoo! this summer, and has an impressive portfolio of applications and improvements on convolutional neural networks and other deep learning algorithms.</li>
<li><a href="http://www.research.rutgers.edu/~lihong/">Lihong Li</a>.  Lihong developed the <a href="http://www.research.rutgers.edu/~lihong/pub/Li08Knows.pdf">KWIK (&#8220;Knows what it Knows&#8221;) learning framework</a> for analyzing and creating uncertainty-aware learning algorithms. New mathematical models of learning are rare, and the topic is of substantial interest, so this is pretty cool.  He&#8217;s also worked on a wide variety of other subjects and in my experience is broadly capable.</li>
<li><a href="http://www.cs.cmu.edu/~shanneke/">Steve Hanneke</a>. When the chapter on active learning is written in a machine learning textbook, I expect the <a href="http://www.cs.cmu.edu/~shanneke/docs/2007/hanneke-agnostic-active.pdf">disagreement coefficient</a> to be in it.  Steve&#8217;s work is strongly distinguished from his adviser&#8217;s, so he is clearly capable of independent research.</li>
</ol>
<p>There are a couple of others, such as <a href="http://www.cs.ucsd.edu/~djhsu/">Daniel</a> and <a href="http://www.cs.berkeley.edu/~jake/">Jake</a>, whose graduation plans I&#8217;m unsure of, although they have already done good work.  In addition, I&#8217;m sure there are several others that I don&#8217;t know, so feel free to mention them in comments.</p>
<p>It&#8217;s traditional to imagine that one candidate is best overall for hiring purposes, but I have substantial difficulty with that: the field of ML is simply too broad.  Instead, if you are interested in hiring, each should be considered in your context.</p>
]]></content:encoded>
			<wfw:commentRss>http://hunch.net/?feed=rss2&amp;p=476</wfw:commentRss>
		</item>
		<item>
		<title>Efficient Reinforcement Learning in MDPs</title>
		<link>http://hunch.net/?p=472</link>
		<comments>http://hunch.net/?p=472#comments</comments>
		<pubDate>Wed, 26 Nov 2008 13:29:47 +0000</pubDate>
		<dc:creator>jl</dc:creator>
		
		<category><![CDATA[Reinforcement]]></category>

		<category><![CDATA[Theory]]></category>

		<guid isPermaLink="false">http://hunch.net/?p=472</guid>
		<description><![CDATA[Claude Sammut is attempting to put together an Encyclopedia of Machine Learning.  I volunteered to write one article on Efficient RL in MDPs, which I would like to invite comment on.  Is something critical missing?
]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.cse.unsw.edu.au/~claude/">Claude Sammut</a> is attempting to put together an <a href="http://www.amazon.co.uk/Encyclopedia-of-Machine-Learning/dp/0387307680">Encyclopedia of Machine Learning</a>.  I volunteered to write one article on <a href="images/Efficient_Reinforcement_Learning.pdf">Efficient RL in MDPs</a>, which I would like to invite comment on.  Is something critical missing?</p>
]]></content:encoded>
			<wfw:commentRss>http://hunch.net/?feed=rss2&amp;p=472</wfw:commentRss>
		</item>
		<item>
		<title>Observations on Linearity for Reductions to Regression</title>
		<link>http://hunch.net/?p=468</link>
		<comments>http://hunch.net/?p=468#comments</comments>
		<pubDate>Mon, 17 Nov 2008 00:54:59 +0000</pubDate>
		<dc:creator>jl</dc:creator>
		
		<category><![CDATA[Machine Learning]]></category>

		<category><![CDATA[Reductions]]></category>

		<guid isPermaLink="false">http://hunch.net/?p=468</guid>
		<description><![CDATA[Dean Foster and Daniel Hsu had a couple observations about reductions to regression that I wanted to share.  This will make the most sense for people familiar with error correcting output codes (see the tutorial, page 11).
Many people are comfortable using linear regression in a one-against-all style, where you try to predict the probability [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://gosset.wharton.upenn.edu/~foster/index.pl">Dean Foster</a> and <a href="http://www.cse.ucsd.edu/~djhsu/">Daniel Hsu</a> had a couple observations about reductions to regression that I wanted to share.  This will make the most sense for people familiar with error correcting output codes (see the <a href="http://hunch.net/~jl/projects/reductions/tutorial/paper/chapter.pdf">tutorial, page 11</a>).</p>
<p>Many people are comfortable using linear regression in a one-against-all style, where you try to predict the probability of choice <i>i</i> vs other classes, yet they are not comfortable with more complex error correcting codes because they fear that they create harder problems.  This fear turns out to be mathematically incoherent under a linear representation: comfort in the linear case should imply comfort with more complex codes.</p>
<p>In particular, if there exist weight vectors <i>w<sub>i</sub></i> such that <i>P(i|x) = &lt;w<sub>i</sub>,x&gt;</i>, then for any invertible error correcting output code <i>C</i>, there exist weight vectors <i>w<sub>c</sub></i> which decode to perfectly predict the probability of each class.  The proof is simple and constructive: the weight vectors <i>w<sub>c</sub></i> can be constructed according to the linear superposition of the <i>w<sub>i</sub></i> implied by the code, and invertibility implies that a correct encoding yields a correct decoding.</p>
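<p>A small numerical check of this construction, sketched in Python under hypothetical assumptions: three classes, inputs of the form x = (1, t) with t in [0, 1], hand-picked one-against-all weights that give valid probabilities on that range, and a 3-bit code in which bit b predicts the probability that the class lies in the subset marked by row b of <i>C</i>.  Exact fractions are used so that &#8220;perfectly predict&#8221; can be checked with equality rather than a tolerance.</p>

```python
from fractions import Fraction as F


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def solve(A, b):
    """Solve A p = b exactly by Gauss-Jordan elimination (A square and invertible)."""
    n = len(A)
    M = [[F(v) for v in row] + [F(y)] for row, y in zip(A, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


# Hypothetical one-against-all weights w_i with P(i|x) = dot(w_i, x) for x = (1, t).
W = [[F(1, 2), F(-3, 10)],
     [F(3, 10), F(1, 10)],
     [F(1, 5), F(1, 5)]]

# Hypothetical invertible code: bit b predicts P(class in subset b | x).
C = [[1, 1, 0],
     [0, 1, 1],
     [1, 0, 1]]

# Code weights are the linear superposition of the w_i implied by each row of C.
W_code = [[sum(C[b][i] * W[i][j] for i in range(3)) for j in range(2)] for b in range(3)]


def decode(x):
    q = [dot(w, x) for w in W_code]  # linear predictions of the coded targets
    return solve(C, q)               # invert the code to recover P(i|x)


x = [F(1), F(1, 2)]
print(decode(x))  # [Fraction(7, 20), Fraction(7, 20), Fraction(3, 10)]
```

<p>The decoded values match the one-against-all probabilities exactly, which is the sense in which comfort with the linear one-against-all case carries over to this code.</p>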
<p>This observation extends to all-pairs like codes which compare subsets of choices to subsets of choices using &#8220;don&#8217;t cares&#8221;.</p>
<p>Using this observation, Daniel created a very short proof of the PECOC regret transform theorem (<a href="http://hunch.net/images/pecoc.pdf">here</a>, and Daniel&#8217;s <a href="http://www.cse.ucsd.edu/~djhsu/notes/pecoc.pdf">updated version</a>).</p>
<p>One further observation is that under ridge regression (a special case of linear regression), for any code, there exists a setting of parameters such that you might as well use one-against-all instead, because you get the same answer numerically.  The implication is that the advantages of codes more complex than one-against-all are confined to other prediction methods.</p>
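<p>One way to see this, sketched in Python under stated assumptions: ridge regression with a single regularization constant &#955; shared by all regressors returns weights (X<sup>T</sup>X + &#955;I)<sup>-1</sup>X<sup>T</sup>Y, which are linear in the targets Y.  Training on coded targets Y C<sup>T</sup> therefore yields exactly the one-against-all weights multiplied by C<sup>T</sup>, so decoding an invertible code gives back the one-against-all predictions.  The data, the code matrix, and &#955; = 1 below are hypothetical, and exact fractions are used so equality can be checked directly.</p>

```python
from fractions import Fraction as F


def transpose(A):
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


def solve_many(A, B):
    """Solve A Z = B exactly by Gauss-Jordan elimination (A square and invertible)."""
    n = len(A)
    M = [[F(v) for v in A[i]] + [F(v) for v in B[i]] for i in range(n)]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0:
                f = M[r][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [row[n:] for row in M]


def ridge(X, Y, lam):
    """Ridge weights (X^T X + lam*I)^-1 X^T Y, one column per regression target."""
    Xt = transpose(X)
    G = matmul(Xt, X)
    d = len(G)
    G = [[G[i][j] + (lam if i == j else 0) for j in range(d)] for i in range(d)]
    return solve_many(G, matmul(Xt, Y))


# Hypothetical data: features (1, t), labels in {0, 1, 2} as one-hot targets.
X = [[1, 0], [1, 1], [1, 2], [1, 3]]
labels = [0, 1, 2, 1]
Y = [[1 if k == y else 0 for k in range(3)] for y in labels]
C = [[1, 1, 0], [0, 1, 1], [1, 0, 1]]  # hypothetical code; coded targets are Y C^T
Ct = transpose(C)
lam = F(1)

W_oaa = ridge(X, Y, lam)               # one-against-all regressors
W_code = ridge(X, matmul(Y, Ct), lam)  # regressors trained on the coded targets
print(W_code == matmul(W_oaa, Ct))     # True: the coded fit is the OAA fit times C^T
```

<p>If the regressors for different code bits were given different regularization constants, this identity would generally fail, which is one reason the statement above depends on the parameter setting.</p>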
]]></content:encoded>
			<wfw:commentRss>http://hunch.net/?feed=rss2&amp;p=468</wfw:commentRss>
		</item>
		<item>
		<title>COLT CFP</title>
		<link>http://hunch.net/?p=465</link>
		<comments>http://hunch.net/?p=465#comments</comments>
		<pubDate>Tue, 11 Nov 2008 23:13:05 +0000</pubDate>
		<dc:creator>jl</dc:creator>
		
		<category><![CDATA[Announcements]]></category>

		<category><![CDATA[Conferences]]></category>

		<guid isPermaLink="false">http://hunch.net/?p=465</guid>
		<description><![CDATA[Adam Klivans points out the COLT call for papers.  The important points are: 

Due Feb 13.
Montreal, June 18-21.
This year, there is author feedback.

]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.cs.utexas.edu/~klivans/">Adam Klivans</a> points out the <a href="http://www.learningtheory.org/index.php?option=com_content&#038;view=article&#038;id=12:colt-2009-call-for-papers&#038;catid=20:general&#038;Itemid=8">COLT call for papers</a>.  The important points are: </p>
<ol>
<li>Due Feb 13.</li>
<li>Montreal, June 18-21.</li>
<li>This year, there is author feedback.</li>
</ol>
]]></content:encoded>
			<wfw:commentRss>http://hunch.net/?feed=rss2&amp;p=465</wfw:commentRss>
		</item>
		<item>
		<title>ICML Reviewing Criteria</title>
		<link>http://hunch.net/?p=461</link>
		<comments>http://hunch.net/?p=461#comments</comments>
		<pubDate>Tue, 11 Nov 2008 01:13:23 +0000</pubDate>
		<dc:creator>jl</dc:creator>
		
		<category><![CDATA[Conferences]]></category>

		<guid isPermaLink="false">http://hunch.net/?p=461</guid>
		<description><![CDATA[Michael Littman and Leon Bottou have decided to use a franchise program chair approach to reviewing at ICML this year.  I&#8217;ll be one of the area chairs, so I wanted to mention a few things if you are thinking about naming me.

I take reviewing seriously.  That means papers to be reviewed are read, [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.cs.rutgers.edu/~mlittman/">Michael Littman</a> and <a href="http://leon.bottou.org/">Leon Bottou</a> have decided to use a franchise program chair approach to <a href="http://www.wikicfp.com/cfp/servlet/event.showcfp?eventid=3826&#038;copyownerid=1513">reviewing at ICML</a> this year.  I&#8217;ll be one of the area chairs, so I wanted to mention a few things if you are thinking about naming me.</p>
<ol>
<li>I take reviewing seriously.  That means papers to be reviewed are read, the implications are considered, and decisions are only made after that.  I do my best to be fair, and there are zero subjects that I consider categorical rejects.  I don&#8217;t consider several <a href="http://hunch.net/?p=441">arguments for rejection-not-on-the-merits reasonable</a>.</li>
<li>I am generally interested in papers that (a) analyze new models of machine learning, (b) provide new algorithms, and (c) show that they work empirically on plausibly real problems.  If a paper has the trifecta, I&#8217;m particularly interested. With 2 out of 3, I might be interested.  I often find papers with only one element harder to accept, including papers with just (a). </li>
<li>I&#8217;m a bit tough.  I rarely jump up and down about a paper, because I believe that great progress is rarely made.  I&#8217;m not very interested in new algorithms with the same theorems as older algorithms.  I&#8217;m also cautious about new analysis for older algorithms, since I like to see analysis driving algorithm rather than vice-versa.  I prioritize a proof-of-possibility over a quantitative improvement.  I consider quantitative improvements of small constant factors in sample complexity significant.  For computational complexity, I generally want to see at least an order of magnitude improvement.  I generally disregard any experiments on toy data, because I&#8217;ve found that toy data and real data can too easily differ in their behavior.</li>
<li>My personal interests are pretty well covered by <a href="http://hunch.net/~jl/">existing papers</a>, but this is perhaps not too important a criterion compared to the above, as I easily believe other subjects are interesting.</li>
</ol>
]]></content:encoded>
			<wfw:commentRss>http://hunch.net/?feed=rss2&amp;p=461</wfw:commentRss>
		</item>
		<item>
		<title>A Healthy COLT</title>
		<link>http://hunch.net/?p=457</link>
		<comments>http://hunch.net/?p=457#comments</comments>
		<pubDate>Sun, 09 Nov 2008 16:49:29 +0000</pubDate>
		<dc:creator>jl</dc:creator>
		
		<category><![CDATA[Conferences]]></category>

		<category><![CDATA[Machine Learning]]></category>

		<guid isPermaLink="false">http://hunch.net/?p=457</guid>
		<description><![CDATA[A while ago, we discussed the health of COLT.  COLT 2008 substantially addressed my concerns.  The papers were diverse and several were interesting.  Attendance was up, which is particularly notable in Europe.  In my opinion, the colocation with UAI and ICML was the best colocation since 1998.
And, perhaps best of all, [...]]]></description>
			<content:encoded><![CDATA[<p>A <a href="http://hunch.net/?p=95">while ago</a>, we discussed the health of <a href="http://learningtheory.org/">COLT</a>.  <a href="http://colt2008.cs.helsinki.fi/">COLT 2008</a> substantially addressed my concerns.  The papers were diverse and several were interesting.  Attendance was up, which is particularly notable in Europe.  In my opinion, the colocation with UAI and ICML was the best colocation since 1998.</p>
<p>And, perhaps best of all, registration ended up being free for all students due to various grants from the <a href="http://www.aka.fi/en-gb/A/">Academy of Finland</a>, <a href="http://google.com">Google</a>, <a href="http://ibm.com">IBM</a>, and <a href="http://yahoo.com">Yahoo</a>.</p>
<p>A basic question is: what went right?  There seem to be several answers.</p>
<ol>
<li>Cost-wise, COLT had sufficient grants to alleviate the high cost of the Euro, and holding it at a university substantially reduced the cost compared to a hotel.</li>
<li>Organization-wise, the Finns were great with hordes of volunteers helping set everything up.  Having too many volunteers is a good failure mode.</li>
<li>Organization-wise, it was clear that all 3 program chairs were cooperating in designing the program.</li>
<li>Facilities-wise, proximity in time and space made the colocation much more real than many others have been in the past.</li>
<li>Program-wise, COLT notably had two younger program chairs, <a href="http://stat.rutgers.edu/~tzhang/">Tong</a> and <a href="http://www.cs.columbia.edu/~rocco/">Rocco</a>, which seemed to work well.</li>
</ol>
]]></content:encoded>
			<wfw:commentRss>http://hunch.net/?feed=rss2&amp;p=457</wfw:commentRss>
		</item>
	</channel>
</rss>
