Why Enrichment Designs Don’t Work in Clinical Trials

Last week I was discussing a clinical trial design with colleagues. This particular trial used an enrichment design. A few years ago I did some simulation work to show that you can’t pick patients to enroll in a clinical trial in order to improve the results.

People are probabilistic too.

The idea of an enrichment design is to winnow the overall patient group down to those individuals who are likely to respond to therapy. One way is to give all of the candidates a placebo and eliminate placebo responders. Another strategy is to give a test dose of drug and keep only those who respond. Either way, the patients who pass the screening test go on to a double-blind test of active drug versus placebo.

Sounds like a great idea, but it doesn’t really work most of the time in practice. While the idea of screening out patients is appealing, it turns out that screening mostly just excludes patients whose complaints vary over time. You can’t really tell from the screening test who is going to be a better patient for the trial. Most patients look different at one time point compared to any other.

The mistake that we make is in thinking that people can be categorized by simple inspection. We think of patients as responders or non-responders, an intrinsic characteristic they have or don’t have. Trying to screen out patients we don’t want falls into the trap of thinking that a single set of tests can successfully discriminate between classes.

The way I think of it is that we need relatively large clinical trials to prove the value of a modestly effective drug. So it seems odd to think that one could easily categorize individual patients with a single test. You can see this by looking at how well a test dose of a drug, given to find drug responders, would be able to enrich a patient population. Variability over time makes this impossible.

Let’s walk through an example: an imaginary trial of a drug to treat migraine attacks.

Let’s say we know the truth and this candidate is in reality a pretty good treatment for a migraine attack. But the patient varies in headache severity and responsiveness to treatment.

Some headaches are mild and will resolve without treatment. That mild attack will act no differently whether the active drug or placebo was administered. Some headaches are very bad and even a really effective drug might not touch that kind of headache. So again the attack will be the same whether placebo or treatment is given.

And what about the headaches that are in between and could respond? Well, if a drug worked half the time, then in one out of every two of those attacks the active drug would show an effect where the placebo did not. The other half of the time, it would look just like placebo again.

Add up these cases; there are four of them. For only one attack did the active drug work where the placebo would fail. One out of 4 times, a 25% overall response rate. All just because, in the same patient, the headache and its response to drug change from attack to attack. So if I did a test treatment to see if I had a responder, I would eliminate half of the responders, because they had either the headache that was too severe to respond or the one that happened not to respond that time.
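The four cases above can be tallied in a few lines. This is only a sketch of the story as told: it assumes the four attack types are equally likely, and the names (`ATTACKS`, `fraction_where`) are made up for illustration.

```python
from fractions import Fraction

# Assumption read off the story: a true responder has four equally likely
# kinds of attack. Each entry is (relief on active drug, relief on placebo).
ATTACKS = {
    "mild": (True, True),                # resolves with or without treatment
    "severe": (False, False),            # even a good drug does not touch it
    "in_between_hit": (True, False),     # the drug works this time
    "in_between_miss": (False, False),   # the drug happens not to work
}

def fraction_where(predicate):
    """Fraction of attack types for which predicate(drug, placebo) is true."""
    hits = sum(1 for drug, placebo in ATTACKS.values() if predicate(drug, placebo))
    return Fraction(hits, len(ATTACKS))

# Attacks where the active drug helps and placebo would not have.
drug_beats_placebo = fraction_where(lambda drug, placebo: drug and not placebo)

# Attacks where a responder fails a single test dose of active drug.
responder_fails_test = fraction_where(lambda drug, placebo: not drug)

print(drug_beats_placebo)    # 1/4
print(responder_fails_test)  # 1/2
```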

Of course you’d eliminate some of the non-responders too. But we know that even non-responders may have 1 in 4 headaches that are mild enough that they don’t need the treatment anyway, so those patients pass the test dose. You therefore eliminate 75% of the non-responders with a test dose, which is better than the 50% of responders that were eliminated. You’ve done better. How much better depends on the ratio of responders to non-responders in the population, a ratio that is completely unknown.
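To see how much the answer depends on that unknown ratio, here is a small sketch using the pass rates from the story (half of responders and a quarter of non-responders pass the test dose). The function names and the prevalence values tried are assumptions for illustration, not figures from any real trial.

```python
from fractions import Fraction

PASS_RESPONDER = Fraction(1, 2)      # from the story: 50% of responders pass
PASS_NON_RESPONDER = Fraction(1, 4)  # from the story: 25% of non-responders pass

def retained_share(prevalence):
    """Share of all screened patients who pass the test dose."""
    return prevalence * PASS_RESPONDER + (1 - prevalence) * PASS_NON_RESPONDER

def enriched_responder_share(prevalence):
    """Share of true responders among the patients who pass the test dose."""
    return prevalence * PASS_RESPONDER / retained_share(prevalence)

# Hypothetical prevalences of true responders before screening.
for prevalence in (Fraction(1, 10), Fraction(1, 2), Fraction(9, 10)):
    print(prevalence, enriched_responder_share(prevalence), retained_share(prevalence))
```

If half the screened population were true responders, the enriched group would be two-thirds responders, at the cost of discarding five out of every eight patients screened.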

What’s nice is that while you can see the logic by reading the story I’ve told, a mental simulation, one can also create an explicit mathematical model of the clinical trial and simulate running the trial hundreds of times. It turns out that there are very few conditions where this kind of enrichment really works. It turns out it’s simpler and just as informative to see whether or not the drug is effective in the overall population, without trying to prejudge who is a responder with a test dose.
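As a rough illustration of what such a simulation might look like, here is a toy version built only from the migraine story above. It is not the model I actually used: the attack mix, the single attack per patient, and every name and number in it are assumptions chosen to keep the sketch short.

```python
import random

def attack_relieved(is_responder, on_active, rng):
    """One attack under the story's assumed mix: 1/4 mild, 1/4 severe, 1/2 in between."""
    kind = rng.choice(["mild", "severe", "between", "between"])
    if kind == "mild":
        return True    # resolves regardless of treatment
    if kind == "severe":
        return False   # nothing helps
    # In-between attack: the drug works half the time, and only in responders.
    return is_responder and on_active and rng.random() < 0.5

def rate(outcomes):
    """Relief rate for one arm (0.0 if the arm is empty)."""
    return sum(outcomes) / len(outcomes) if outcomes else 0.0

def run_trial(n_screened, prevalence, enrich, rng):
    """Return (active minus placebo relief rate, number of patients randomized)."""
    patients = [rng.random() < prevalence for _ in range(n_screened)]
    if enrich:
        # Test dose of active drug; keep only patients whose attack was relieved.
        patients = [p for p in patients if attack_relieved(p, True, rng)]
    relief = {True: [], False: []}
    for i, is_responder in enumerate(patients):
        on_active = i % 2 == 0  # alternate arms; patient order is already random
        relief[on_active].append(attack_relieved(is_responder, on_active, rng))
    return rate(relief[True]) - rate(relief[False]), len(patients)

def mean_difference(n_trials, n_screened, prevalence, enrich, seed=0):
    """Average the trial result and the randomized sample size over many runs."""
    rng = random.Random(seed)
    results = [run_trial(n_screened, prevalence, enrich, rng) for _ in range(n_trials)]
    mean_diff = sum(d for d, _ in results) / n_trials
    mean_n = sum(n for _, n in results) / n_trials
    return mean_diff, mean_n

# Hypothetical settings: 200 simulated trials, 400 patients screened, 50% responders.
for enrich in (False, True):
    diff, n = mean_difference(200, 400, 0.5, enrich)
    print("enriched" if enrich else "all comers", round(diff, 3), round(n))
```

Under these assumptions the expected drug-placebo difference is the responder share divided by four, so enrichment raises it from about 0.125 to about 0.167 when half the patients are responders, while leaving only about three-eighths of the screened patients to randomize. Whether the larger difference makes up for the smaller trial is the kind of question the repeated runs let you explore.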

The irony? This is exactly the opposite of clinical practice. In the real clinic, each patient is their own individual clinical trial, an “N of 1” as we say. N is the symbol for the number in a population. An individual is a population of one. N of 1. We treat the patient and over time judge whether or not they respond in a personal clinical trial. Not to see whether the drug works but whether the patient is a responder. If they don’t respond, therapy is adjusted or changed. But in our migraine example, multiple headaches of various intensity would have to be treated to see the benefit.

Perhaps variability across a population is easily grasped. People are tall or short, have dark or light hair color. Variability within an individual over time is perhaps more subtle but just as important.

1 Comment

  1. Your point is well taken. It’s so easy to get an inaccurate prediction from trying to simplify a complex system. As in your example, there are just too many variables, known and unknown. Individual patient variability over time is a big one. Headache severity is one aspect of this. But what if the patient has just taken too much ibuprofen and today is suffering more from medication rebound than migraine? Or perhaps their headache is compounded by a concurrent viral illness. It would be easy to just eliminate those not responding to a test dose, but I can see how this would be inaccurate.

    I hope we see big benefits in drug development from new biomarkers. Perhaps biomarkers can help us know whether the patient is responding by measuring blood flow to the brain, etc. Then patients who do not show the expected physiologic result can be deemed nonresponders, with reasonable certainty. Is personal medicine still on the horizon?
