From luca@eecs.berkeley.edu Tue Feb 22 01:31:26 2005
Return-Path: <luca@eecs.berkeley.edu>
Received: from localhost (localhost.localdomain [127.0.0.1]) by
	wall.hunch.net (8.13.1/8.13.1) with ESMTP id j1M7VQoH026775 for
	<jl@localhost>; Tue, 22 Feb 2005 01:31:26 -0600
Received: from orange.he.net [66.220.13.66] by localhost with POP3
	(fetchmail-6.2.5) for jl@localhost (single-drop); Tue, 22 Feb 2005 01:31:26
	-0600 (CST)
Received: from relay0.EECS.Berkeley.EDU ([169.229.60.163]) by tti-c.org for
	<jl@tti-c.org>; Mon, 21 Feb 2005 23:33:29 -0800
Received: from gateway2.EECS (gateway2.EECS.Berkeley.EDU [169.229.60.39])
	by relay0.EECS.Berkeley.EDU (8.13.3/8.12.10) with ESMTP id j1M7XR5G010484;
	Mon, 21 Feb 2005 23:33:27 -0800 (PST)
Received: from [172.16.1.33] (adsl-68-120-138-22.dsl.snfc21.pacbell.net
	[68.120.138.22]) by gateway2.EECS.Berkeley.EDU (iPlanet Messaging Server
	5.2 Patch 2 (built Jul 14 2004)) with ESMTPSA id
	<0ICA00LA0YBQUD@gateway2.EECS.Berkeley.EDU>; Mon, 21 Feb 2005 23:33:27
	-0800 (PST)
Date: Mon, 21 Feb 2005 23:36:38 -0800
From: Luca Trevisan <luca@eecs.berkeley.edu>
Subject: CCC'05 Comments - paper 19
To: beygel@cs.rochester.edu, jl@tti-c.org, zadrozny@us.ibm.com, varsha@cs.uchicago.edu, hayest@cs.uchicago.edu
Message-id: <421AE106.5090701@eecs.berkeley.edu>
MIME-version: 1.0
Content-type: text/plain; charset=ISO-8859-1; format=flowed
X-Accept-Language: en-us, en
User-Agent: Mozilla Thunderbird 1.0 (Windows/20041206)
X-Spam-Status: No, hits=0.1 required=5.0
	tests=SPAM_PHRASE_02_03,USER_AGENT,X_ACCEPT_LANG version=2.43
X-Spam-Level: 
X-Evolution-Source: mbox:/var/spool/mail/jl
Content-Transfer-Encoding: 8bit

Dear author:

Here are comments from the Program Committee and/or its
subreferees on your CCC'05 submission:


19 	Error Limiting Reductions Between Classification Tasks

Best Regards
Luca Trevisan

-------------------------------------------------------

Each reviewer had some problems with the exposition.

Reviewer #1

I think the paper is rather poorly written.

Here are some examples of things that seemed badly explained or odd-looking to me.

In Definition 1, K "is the information given at training time" (?)
In Definition 2, the feature space X is "arbitrary" (?)
What is "I" in the table on top of page 4?
I do, however, think that I understand Definition 1 + Definition 2,
though I don't see why it is split in two (i.e., why they have
"tasks" as well as "problems").

Theorem 1 is incomprehensible at this point, using terminology
not yet explained.

In 3.2 the notion of "subproblem" appears, which has not been defined.
Subproblem of what?

The sentence where I stop understanding anything is "If the reduction
makes several sequential calls, we need to define D' for each
invocation. Suppose that first invocation produces h'. We replace
this invocation with the oracle that always returns h', and use
the above definition to find D' for the next invocation."
But how should/can h' influence D'? I don't get it. This discussion
is used in the crucial Definition 4, which I hence couldn't understand.
But on a syntactic level, some further confusing things in this
definition are:
* The reductions are from "tasks" to "tasks", but still mention X, which
is part of the definition of a "problem", not a "task".
* What are h_D and h'_D'? What is max(h'_D')?

Reviewer #2


This paper defines a certain notion of reduction between
classification tasks. This notion models several known
reductions. The authors also present a new reduction to
binary classification that seems superior to existing
reductions.

I found it very difficult to follow the definitions in Section 3,
especially Section 3.2.
Perhaps what is missing is some kind of example to make all these
definitions concrete. The first example, Section 4.1, is also not so
helpful since you do not indicate how it is related to the
definitions.

Some minor comments:
-- Section 3, the sentence "A task is (implicitly)..." is confusing
-- Section 4.1, shouldn't the title be "... to Binary classification"?
-- Section 6, "discuss dicuss"
-- How does your 'weighted all-pairs' reduction compare with [9]?
-- Where do you define the error rate of a classifier?
(and where is a classifier defined?)
Also, you seem to require the classifier to work for any input in
(X \times K)^*.
Don't you need the quality of the hypothesis to depend on
the input? Where does that enter in your arguments/definitions?
For example, if the input to the learning algorithm is empty,
is it still supposed to output a good hypothesis?
-- Definition 4, the reduction might yield completely
different distributions when the input S is a one-element
sequence and when it is longer than one.
Doesn't this cause any problem with Def 4?
-- Def 4, bottom, what do you mean by "For all X \times K ?"

Reviewer #3


The paper gives a formalization of "learning task", and of
"reduction" between learning tasks. Various known results can be cast in 
this setting. One new result appears in section 5, a new reduction
giving an algorithm that appears to perform better than known ones.

Overall, I thought that the definitional part of the paper was too
general, to the point of being vacuous: a "learning task", for example,
is a set of possible inputs, a set of possible outputs, and a scoring
function. This is equivalent, e.g., to the standard definition of an
optimization problem, and does not seem to capture the "essence" of
what a learning task is. (One can fit concrete examples into it, of
course, but then one could have defined a learning task to be a set,
and surely it would have been possible to represent concrete problems
as sets.) The definition of reduction (Def 4) is rambling, and barely
comprehensible. The original technical part of the paper (Section 5,
mostly) is, in contrast, too concrete, to the point of including
experimental results.
