One of the most confusing aspects of learning theory is the vast array of differing assumptions. Some critical thought about which of these assumptions are reasonable for real-world problems may be useful.
Before we even start thinking about assumptions, it’s important to realize that the word has multiple meanings. The meaning used here is “assumption = axiom” (i.e. something you cannot verify).
| Assumption | Reasonable? | Which analysis? | Example/notes |
|---|---|---|---|
| Independent and Identically Distributed (IID) data | Sometimes | PAC, ERM, prediction bounds, statistics | The KDD Cup 2004 physics dataset is plausibly IID. There are a number of situations which are “almost IID” in the sense that IID analysis yields correct intuitions. Unreasonable in adversarial situations (stock market, war, etc.). |
| Independently distributed data | More often than IID, but still only sometimes | online->batch conversion | Dropping “identically” can be helpful when a cyclic process generates the data. |
| Finite exchangeability (FEX) | Sometimes reasonable | As for IID | There are a good number of situations where there is a population we wish to classify, we pay someone to classify a random subset, and then we try to learn. |
| Input space uniform on a sphere | No | PAC, active learning | I’ve never observed this in practice. |
| Functional form: “or” of variables, decision list, “and” of variables | Sometimes reasonable | PAC analysis | There are often at least OK functions of this form that make good predictions. |
| No noise | Rarely reasonable | PAC, ERM | Most learning problems appear to be of the form where the correct prediction given the inputs is fundamentally ambiguous. |
| Functional form: monotonic in variables | Often | PAC-style | Many natural problems seem to have behavior monotonic in their input variables. |
| Functional form: xor | Occasionally | PAC | I was surprised to observe this. |
| Fast mixing | Sometimes | RL | Interactive processes often fail to mix, ever, because entropy always increases. |
| Known optimal state distribution | Sometimes | RL | Sometimes humans know what is going on, and sometimes not. |
| Small approximation error everywhere | Rarely | RL | Approximate policy iteration is known to sometimes behave oddly. |
If anyone particularly agrees or disagrees with the reasonableness of these assumptions, I’m quite interested.
John,
I’m not sure why you believe these assumptions are unverifiable. In particular we can conceivably come up with good tests for at least empirically verifying several of the assumptions you list. If you do not have an opportunity to see if an assumption is valid, how do you assess if it is “reasonable”?
In this sense we can at least empirically see if the assumption of identical distribution of samples is valid, or if the samples are uniformly distributed on a sphere (possibly after normalization to unit length). Why should we not do so, in order to evaluate if an assumption is reasonable on our data?
I understand that we have to be careful not to “double count” this verification when designing learning systems: verifying an intuition on the data and then using an algorithm based on that verification (without accounting for having already seen the data) can bias the inference process. But this is something to be attempted rigorously, rather than a problem that makes us give up the option to verify assumptions.
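One concrete version of the empirical check described above, as a sketch: a two-sample Kolmogorov–Smirnov statistic comparing the first and second halves of a sample can flag gross violations of “identically distributed.” The data, sample sizes, and critical value here are illustrative assumptions, not anyone’s actual protocol.

```python
import math
import random

def ks_two_sample(xs, ys):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap
    between the two empirical CDFs."""
    xs, ys = sorted(xs), sorted(ys)
    d = 0.0
    for t in xs + ys:
        fx = sum(x <= t for x in xs) / len(xs)
        fy = sum(y <= t for y in ys) / len(ys)
        d = max(d, abs(fx - fy))
    return d

random.seed(0)
data = [random.gauss(0, 1) for _ in range(1000)]

# If the stream is identically distributed, the two halves should look alike.
d = ks_two_sample(data[:500], data[500:])

# Asymptotic 5% critical value for two samples of size n each: 1.36*sqrt(2/n).
crit = 1.36 * math.sqrt(2 / 500)
print(d, d < crit)
```

Of course, a small KS statistic is only evidence against gross drift between the two halves; it cannot certify the assumption, which is exactly the point of the exchange below.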
You can judge distributional assumptions if you impose model assumptions, and vice versa.
There are cases when you have reason to think that you can judge assumptions based on other assumptions. First assume that your Universe is ruled by a classification tree. If you observe the learning curve and see that the model no longer changes when you add new data, you know that you have enough data not to have to bother with the intricacies of FEX. If, however, the trees you obtain with 90% of the data and with 100% of the data are quite different, then you really should use FEX.
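The stability check described above can be sketched in a toy setting. A one-split decision stump stands in for the tree learner, and the dataset and threshold are made up for illustration:

```python
import random

def train_stump(data):
    """Fit a one-level decision stump (threshold on x, sign of output)
    by brute force, minimizing 0/1 training error."""
    best, best_correct = None, 0
    for t in sorted({x for x, _ in data}):
        for sign in (1, -1):
            correct = sum((sign if x > t else -sign) == y for x, y in data)
            if correct > best_correct:
                best, best_correct = (t, sign), correct
    return best

random.seed(1)
# Synthetic labels: y = +1 exactly when x > 0.6 (an assumed "true" rule).
data = [(x, 1 if x > 0.6 else -1) for x in (random.random() for _ in range(200))]

full = train_stump(data)        # model from 100% of the data
sub = train_stump(data[:180])   # model from 90% of the data

# If the two learned models are close, we have some (assumption-laden)
# evidence that more data would not change the answer.
print(full, sub)
```

Note the circularity the commenter points out: the stability of the learned model only tells you something under the prior assumption that a model of this form rules the data.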
Balaji, I’d propose a test. Specify any particular assumption above and any test which verifies it. I’ll produce a data generator which does not satisfy the assumption, yet the test passes with high probability.
(Depending on the test, some of these statements are more worrisome than others.)
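To make this challenge concrete, here is a hypothetical sketch of such a generator (all names and parameters are illustrative, not the author’s construction). The samples are genuinely not identically distributed, but the difference is confined to an event of probability `eps`, so any fixed test calibrated on IID N(0,1) data passes with nearly unchanged probability:

```python
import random

def adversarial_generator(n, eps=1e-6, seed=0):
    """Odd-indexed draws come from a mixture placing mass eps on a huge
    outlier, so the samples are NOT identically distributed.  But a size-n
    sample contains no outlier with probability (1 - eps)**(n // 2), so for
    small eps any fixed test behaves almost exactly as on IID N(0,1) data."""
    rng = random.Random(seed)
    sample = []
    for i in range(n):
        if i % 2 == 1 and rng.random() < eps:
            sample.append(1e6)  # the distributions genuinely differ here
        else:
            sample.append(rng.gauss(0, 1))
    return sample

data = adversarial_generator(10_000)
```

Since the two distributions differ by at most `eps` in total variation, a test that passes on truly IID data with probability p passes here with probability at least p minus (n/2)·eps, which is why no finite-sample test can verify the assumption.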
Assumptions have little to do with objective predictions and everything to do with what we cause to happen. According to quantum mechanics we are creating reality moment by moment precisely by what we “expect” to see. In that sense all assumptions are causes of reality, which should make any prediction that fits our assumption fairly accurate as long as we do not doubt it.
“According to quantum mechanics we are creating reality moment by moment precisely by what we ‘expect’ to see.” Is this true?
Are you saying that by thought and expectation alone we are creating an outcome?
Suppose I collect data for running 20 metres, 20 times, with each measurement in seconds. I assume my data are independent (IID) and approximately normal. How do I know if my assumptions are reasonable for my data? Is there any way to check? My assignment asks me to include details of any assumption checks I carry out.
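For the timing question above, two simple diagnostics can be computed with nothing but the standard library: the lag-1 autocorrelation probes independence (a run-to-run trend from fatigue or learning would show up here), and the sample skewness probes one aspect of normality. The run times below are made-up numbers for illustration, not real data.

```python
import math

def lag1_autocorr(xs):
    """Lag-1 autocorrelation: a value near 0 is consistent with
    independence of successive measurements."""
    n = len(xs)
    m = sum(xs) / n
    num = sum((xs[i] - m) * (xs[i + 1] - m) for i in range(n - 1))
    den = sum((x - m) ** 2 for x in xs)
    return num / den

def skewness(xs):
    """Sample skewness: a value near 0 is consistent with the symmetry
    a normal distribution would show."""
    n = len(xs)
    m = sum(xs) / n
    s = math.sqrt(sum((x - m) ** 2 for x in xs) / n)
    return sum(((x - m) / s) ** 3 for x in xs) / n

# Hypothetical 20 sprint times in seconds (illustrative only).
times = [3.1, 3.3, 3.0, 3.2, 3.4, 3.1, 3.3, 3.2, 3.0, 3.5,
         3.2, 3.1, 3.3, 3.4, 3.2, 3.1, 3.0, 3.2, 3.3, 3.1]

print(round(lag1_autocorr(times), 3), round(skewness(times), 3))
```

With only 20 measurements these checks have very little power, which is worth stating in the write-up; a plot of time against trial number is also a sensible thing to include.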