This post is about a technology which could develop in the future.
Right now, a new drug might be tested by finding patients with some diagnosis and giving or not giving them a drug according to a secret randomization. The outcome is observed, and if the average outcome for those treated is measurably better than the average outcome for those not treated, the drug might become a standard treatment.
Generalizing this, a filter F sorts people into two groups based upon observations x: group A, who are candidates for the treatment, and group B, who are not. To measure the outcome, you randomize between treatment and nontreatment within group A and measure the relative performance of the treatment.
A problem often arises: in many cases the treated group does not do better than the nontreated group. A basic question is: does this mean the treatment is bad? With respect to the filter F it may mean that, but with respect to another filter F’, the treatment might be very effective. For example, a drug might work great for people who have one blood type, but not so well for others.
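The randomized comparison, and the way a broad filter can dilute a real effect, can be sketched in a small simulation. Everything here is hypothetical: the function names, the blood-type effect of 1.0, and the noise model are assumptions made for illustration, not data from any trial.

```python
import random
from statistics import mean

def randomized_effect(patients, in_filter, outcome, rng):
    """Randomize treatment inside the filtered group and return
    mean(treated outcomes) - mean(untreated outcomes)."""
    treated, untreated = [], []
    for x in patients:
        if not in_filter(x):
            continue  # outside the filter: not enrolled in the trial
        give_drug = rng.random() < 0.5  # the secret coin flip
        (treated if give_drug else untreated).append(outcome(x, give_drug, rng))
    return mean(treated) - mean(untreated)

# Assumed outcome model: the drug adds 1.0, but only for blood type O.
def simulated_outcome(x, give_drug, rng):
    benefit = 1.0 if give_drug and x["blood_type"] == "O" else 0.0
    return rng.gauss(0.0, 1.0) + benefit

rng = random.Random(0)
patients = [{"blood_type": rng.choice(["A", "B", "AB", "O"])} for _ in range(20000)]

# F admits everyone; the stricter F' admits only blood type O.
effect_F = randomized_effect(patients, lambda x: True, simulated_outcome, rng)
effect_F_strict = randomized_effect(
    patients, lambda x: x["blood_type"] == "O", simulated_outcome, rng
)
print(f"effect under F:  {effect_F:.2f}")
print(f"effect under F': {effect_F_strict:.2f}")
```

Under these assumptions the measured effect under F is roughly a quarter of the effect under F’, simply because three quarters of the enrolled patients cannot benefit.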
Finding F’ is a situation where machine learning can help. The setting is essentially isomorphic to this one. The basic import is that we can learn a rule F’ for filters which are more strict than the original F. This can be done on past recorded data, and if done properly we can even statistically prove that F’ works, without another randomized trial. All of the technology exists to do this now—the rest is a matter of education, inertia, and desire.
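As a minimal sketch of learning F’ from recorded data, assume the past trial stored each patient's observations, the randomized treatment assignment, and the outcome. The learner below is deliberately crude (it picks the single feature value with the largest estimated effect) and is not the method of the setting referred to above; the record layout and names are assumptions. The important part is that F’ is chosen on one portion of the data and then measured on a portion it never saw.

```python
import random
from statistics import mean

def effect(records):
    """Mean outcome of treated minus mean outcome of untreated records."""
    treated = [r["y"] for r in records if r["treated"]]
    untreated = [r["y"] for r in records if not r["treated"]]
    return mean(treated) - mean(untreated)

def learn_stricter_filter(train, feature):
    """Return the value of `feature` whose subgroup has the largest estimated effect."""
    values = sorted({r[feature] for r in train})
    return max(values, key=lambda v: effect([r for r in train if r[feature] == v]))

# Hypothetical recorded trial: treatment was randomized and helps only blood type O.
rng = random.Random(1)
records = []
for _ in range(8000):
    blood = rng.choice(["A", "B", "AB", "O"])
    treated = rng.random() < 0.5
    y = rng.gauss(0.0, 1.0) + (1.0 if treated and blood == "O" else 0.0)
    records.append({"blood_type": blood, "treated": treated, "y": y})

train, held_out = records[:4000], records[4000:]
best_value = learn_stricter_filter(train, "blood_type")  # F' is chosen on train only
held_out_effect = effect([r for r in held_out if r["blood_type"] == best_value])
print(best_value, round(held_out_effect, 2))
```

Because the original assignment was randomized, the held-out difference in means is an honest estimate of how the treatment performs under F’. A real analysis would attach a confidence interval to it rather than report a point estimate.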
Here’s what this future might look like:
- Doctors lose a bit of control. Right now, the filters F are typically a diagnosis of one sort or another. If machine learning is applied, the resulting learned F’ may not be easily described as a particular well-known diagnosis. Instead, a doctor might record many observations, and have many learned filters F’ applied to suggest treatments.
- The “not understanding the details” problem is sometimes severe, so we can expect a renewed push for understandable machine learning rules. Some tradeoff between understandability and predictive power seems to exist, creating a tension: do you want a good treatment or do you want an understandable treatment?
- The more information fed into a learning algorithm, the greater its performance can be. If we manage to reach a point in the future where Gattaca-style near-instantaneous genomic sequencing is available, feeding this into a learning algorithm is potentially very effective. In general, a constant pressure to measure more should be expected. Given that we can learn from past data, going back and measuring additional characteristics of past patients may even be desirable.
- Since many treatments are commercial in the US, there will be a great deal of pressure to find a filter F’ which appears good, and a company investing millions into the question is quite capable of overfitting so that F’ appears better than it is. Safe and sane ways to deal with this exist, as showcased by various machine learning challenges, such as the Netflix challenge. To gain trust in such approaches, a trustable and trusted third party capable of this sort of testing must exist. Or, more likely, it won’t exist, and so we’ll need a new trial to test any new F’.
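The overfitting risk in the last point can be illustrated with a simulation in which the treatment does nothing at all. This is a sketch under assumed settings (20 binary features, 40 candidate filters, 50 repeated searches, all hypothetical): searching for the best-looking filter produces an apparent benefit on the data used for the search, and that benefit vanishes on held-out data.

```python
import random
from statistics import mean

def effect(records):
    """Mean outcome of treated minus mean outcome of untreated records."""
    treated = [r["y"] for r in records if r["treated"]]
    untreated = [r["y"] for r in records if not r["treated"]]
    return mean(treated) - mean(untreated)

def subgroup(records, candidate):
    j, b = candidate  # candidate filter: "binary feature j equals b"
    return [r for r in records if r["x"][j] == b]

def one_search(rng, n_features=20, n=800):
    """Search 2 * n_features filters on data where the treatment has no effect."""
    records = [{"x": [rng.randrange(2) for _ in range(n_features)],
                "treated": rng.random() < 0.5,
                "y": rng.gauss(0.0, 1.0)} for _ in range(n)]
    search, held_out = records[:n // 2], records[n // 2:]
    candidates = [(j, b) for j in range(n_features) for b in (0, 1)]
    best = max(candidates, key=lambda c: effect(subgroup(search, c)))
    # Effect of the chosen filter on the data used to choose it, and on fresh data.
    return effect(subgroup(search, best)), effect(subgroup(held_out, best))

rng = random.Random(2)
results = [one_search(rng) for _ in range(50)]
mean_apparent = mean(a for a, _ in results)
mean_held_out = mean(h for _, h in results)
print(f"apparent effect of the selected filter: {mean_apparent:.2f}")
print(f"held-out effect of the same filter:     {mean_held_out:.2f}")
```

The held-out half plays the role of the trusted third party: whoever searches for F’ never gets to see it, so the selected filter cannot have been tuned to it.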
This is already happening! See e.g. Zelnorm, and there are a couple of other drugs to which this idea has been applied. Note, however, that the FDA really doesn’t like arbitrary, complex rules.