Why Enrichment Designs Don’t Work in Clinical Trials

Last week I was discussing a clinical trial design with colleagues. This particular trial used an enrichment design. A few years ago I did some simulation work to show that you can’t pick patients to enroll in a clinical trial in order to improve the results.

People are probabilistic too.

The idea of an enrichment design is to winnow the overall patient group down to those individuals who are likely to respond to therapy. One way is to give all of the candidates a placebo and eliminate the placebo responders. Another strategy is to give a test dose of drug and keep only those who respond. Either way, the patients who pass the screening test go on to a double-blind test of active drug versus placebo.

Sounds like a great idea, but in practice it doesn’t really work most of the time. Screening, it turns out, mostly just excludes patients whose complaints vary over time. You can’t really tell from a screening test which patients are going to be the better ones to study, because most patients look different at one time point than they do at any other.

The mistake that we make is in thinking that people can be categorized by simple inspection. We think of patients as responders or non-responders, an intrinsic characteristic they have or don’t have. Trying to screen out patients we don’t want falls into the trap of thinking that a single set of tests can successfully discriminate between classes.

The way I think of it is that we need relatively large clinical trials to prove the value of a modestly effective drug, so it seems odd to think that a single test could easily categorize individual patients. You can see this by asking how well a test dose of drug, used to pick out drug responders, would actually enrich a patient population. Variability over time makes it practically impossible.

Let’s walk through an example: an imaginary trial of a drug to treat migraine attacks.

Let’s say we know the truth, and this candidate really is a pretty good treatment for a migraine attack. But each patient varies in headache severity and responsiveness to treatment.

Some headaches are mild and will resolve without treatment. A mild attack will behave no differently whether active drug or placebo is administered. Some headaches are very bad, and even a really effective drug might not touch that kind of headache, so again the attack will be the same whether placebo or treatment is given.

And what about the headaches in between, the ones that could respond? Well, if the drug works half the time, then in one of every two of those attacks the active drug shows an effect where placebo would not. The other half of the time, it looks just like placebo again.

Add up these cases; there are four of them. In only one attack did the active drug work where placebo would have failed. One out of four, a 25% overall response rate, all because within the same patient the headache and its response to drug change over time. So if I gave a test treatment to see whether I had a responder, I would eliminate half of the true responders, because at screening they happened to have either the headache too severe to respond or the one that simply didn’t respond that time.
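To make the arithmetic concrete, here is a tiny Python sketch that simply enumerates the four illustrative attacks for a true responder. The attack mix and the drug working on only one of the two treatable attacks are the assumptions of the story above, not data from any trial.

```python
# Four illustrative attacks in a true drug responder:
# one mild, one severe, two treatable (the drug works on one of the two).
attacks = [
    {"severity": "mild",      "relief_on_drug": True,  "relief_on_placebo": True},
    {"severity": "severe",    "relief_on_drug": False, "relief_on_placebo": False},
    {"severity": "treatable", "relief_on_drug": True,  "relief_on_placebo": False},
    {"severity": "treatable", "relief_on_drug": False, "relief_on_placebo": False},
]

# Attacks where the drug provides relief that placebo would not.
drug_beats_placebo = sum(a["relief_on_drug"] and not a["relief_on_placebo"] for a in attacks)

# Attacks after which a test dose of active drug would keep this patient in the trial.
passes_test_dose = sum(a["relief_on_drug"] for a in attacks)

print(f"drug beats placebo in {drug_beats_placebo} of 4 attacks")             # 1 of 4 -> 25%
print(f"a test dose keeps this true responder {passes_test_dose} of 4 times")  # 2 of 4 -> 50%
```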

Of course you’d also eliminate some of the non-responders. But even non-responders may have one in four headaches that are mild enough not to need treatment anyway. So a test dose eliminates 75% of the non-responders, which is better than the 50% of responders that were eliminated. You’ve done better. How much better depends on the ratio of responders to non-responders in the population, a ratio that is completely unknown.
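The same back-of-the-envelope numbers show how much a single test dose could enrich the enrolled population at different responder prevalences. The pass rates (roughly 50% for responders, 25% for non-responders) come from the story above; the prevalence values below are arbitrary illustrations.

```python
def enriched_fraction(prevalence, pass_responder=0.50, pass_nonresponder=0.25):
    """Fraction of true responders among patients who pass the test-dose screen."""
    kept_responders = prevalence * pass_responder
    kept_nonresponders = (1 - prevalence) * pass_nonresponder
    return kept_responders / (kept_responders + kept_nonresponders)

for prevalence in (0.1, 0.3, 0.5, 0.7):
    print(f"{prevalence:.0%} responders before screening -> "
          f"{enriched_fraction(prevalence):.0%} after screening")
```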

What’s nice is that while you can see the logic by reading the story I’ve told, a mental simulation, one can also build an explicit mathematical model of the clinical trial and simulate running it hundreds of times. It turns out that there are very few conditions under which this kind of enrichment really works. It’s simpler, and just as informative, to see whether or not the drug is effective in the overall population without trying to prejudge who is a responder with a test dose.
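Here is a minimal Monte Carlo sketch of what such a simulation might look like; it is not the original model from that simulation work. Every number in it, the attack-severity mix, the 50% chance the drug relieves a treatable attack, the 50% responder prevalence, and the 50 patients per arm, is an illustrative assumption.

```python
import random
import math

MILD, SEVERE, TREATABLE = "mild", "severe", "treatable"

def one_attack_relieved(is_responder, on_active_drug):
    """Simulate a single migraine attack; return True if it resolves."""
    severity = random.choices([MILD, SEVERE, TREATABLE], weights=[1, 1, 2])[0]
    if severity == MILD:
        return True        # mild attacks resolve with or without treatment
    if severity == SEVERE:
        return False       # severe attacks resist even a good drug
    # Treatable attack: only a responder on active drug benefits, about half the time.
    return is_responder and on_active_drug and random.random() < 0.5

def run_trial(n_per_arm, responder_prevalence, enrich):
    """Run one double-blind trial; return True if the drug 'wins' at roughly the 2.5% level."""
    def recruit():
        # Draw candidates; with enrichment, keep only those whose test-dose attack resolved.
        while True:
            is_responder = random.random() < responder_prevalence
            if not enrich or one_attack_relieved(is_responder, on_active_drug=True):
                return is_responder
    drug    = [one_attack_relieved(recruit(), True)  for _ in range(n_per_arm)]
    placebo = [one_attack_relieved(recruit(), False) for _ in range(n_per_arm)]
    p1, p2 = sum(drug) / n_per_arm, sum(placebo) / n_per_arm
    pooled = (sum(drug) + sum(placebo)) / (2 * n_per_arm)
    if pooled in (0.0, 1.0):
        return False
    z = (p1 - p2) / math.sqrt(pooled * (1 - pooled) * 2 / n_per_arm)
    return z > 1.96        # one-sided two-proportion z-test

def power(enrich, sims=2000):
    """Fraction of simulated trials in which the drug shows a significant benefit."""
    wins = sum(run_trial(n_per_arm=50, responder_prevalence=0.5, enrich=enrich)
               for _ in range(sims))
    return wins / sims

print("power, plain trial:   ", power(enrich=False))
print("power, enriched trial:", power(enrich=True))
```

Comparing the two power estimates, while keeping track of how many candidates had to be screened to fill the enriched trial, is one way to ask whether the test dose buys anything in practice.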

The irony? This is exactly the opposite of clinical practice. In the real clinic, each patient is their own individual clinical trial, an “N of 1” as we say. N is the symbol for the number in a population; an individual is a population of one. N of 1. We treat the patient and judge over time whether or not they respond in a personal clinical trial, not to see whether the drug works but whether the patient is a responder. If they don’t respond, therapy is adjusted or changed. But in our migraine example, multiple headaches of varying intensity would have to be treated to see the benefit.

Perhaps variability across a population is easily grasped: people are tall or short, have dark or light hair. Variability within an individual over time is more subtle, but just as important.

Topaz InFocus



In the Mud, originally uploaded by jjvornov.

InFocus is a Photoshop PlugIn that uses deconvolution to sharpen images by refocusing them, which is different from edge-based methods like unsharp masking. Back in my microscopy days, these methods were just starting to come into use, often combining multiple focal planes for virtual confocal microscopy.
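As a rough illustration of the difference, and not of Topaz’s actual algorithm, here is a Python sketch that applies a simple unsharp mask and then Richardson-Lucy deconvolution against an assumed Gaussian blur kernel; the kernel width, sharpening amount, and iteration count are guesses.

```python
import numpy as np
from scipy.ndimage import gaussian_filter
from skimage import data, img_as_float
from skimage.restoration import richardson_lucy

image = img_as_float(data.camera())
blurred = gaussian_filter(image, sigma=2)     # stand-in for focus blur

# Unsharp mask: add back the difference between the image and a smoothed copy.
unsharp = np.clip(blurred + 1.5 * (blurred - gaussian_filter(blurred, sigma=2)), 0, 1)

# Deconvolution: estimate the sharp image given an assumed point-spread function.
x, y = np.mgrid[-7:8, -7:8]
psf = np.exp(-(x**2 + y**2) / (2 * 2.0**2))
psf /= psf.sum()
deconvolved = richardson_lucy(blurred, psf, 30)   # 30 iterations
```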

This is not the best example as a photograph, since the water adds blur of its own, but it’s a good test of how the PlugIn works to recover detail.

The Mind of a Cat

The other day, provoked by reading Iain M. Banks’s latest SF novel, Surface Detail, I thought that perhaps one practical application of philosophical meditation on the nature of mind is the nagging question of whether a machine could ever be conscious or self-aware like you and me.

On Twitter, Mark Bernstein of Eastgate and Tinderbox fame asked the obvious question of how one would ever know a machine was self-aware. A very good question, because the nature of subjective experience is that it is accessible only to one mind, the one experiencing it.

Now, when it comes to other people, I can never experience what it’s like to be them subjectively. Yet I make a very strong assumption that they experience a mind pretty much exactly the way I do.

The reason I assume that other people share subjective awareness is analogy. While I can read descriptions written by others or directly query my family and friends about their subjective experience, why should I trust them? I trust them because they look and act just like I do. So it’s a pragmatic assumption that they experience the same cognitive functions I do.

A machine could be self-aware and try to convince me, but it would be a very hard sell because of the lack of analogous processes. The machine intelligence’s claim might or might not be true; I just don’t know what it would take to convince me.

Looking in an entirely different direction provides further insight into the power of analogy. We look to animals as models of our own cognition. In my own current field of drug development, we use a large toolbox of animal cognition models to test new drugs. We test drugs on animal behaviors that are meant to reflect the internal human states we target. For example, drugs to improve memory in patients with Alzheimer’s Disease are examined in rats swimming in water mazes where they have to remember the right way to go. We can’t read a rat a story and ask recall questions, so behavioral tests are substituted.

While we know that these animal models of human cognition have a variable track record in predicting drug effects in human disease, the philosophical point is that we rely on animals because of analogy to the human brain. Similarly, I think that by analogy, we assume that animals, mammals at least, see, hear, taste, smell, and touch much as we do.

My cats may be without computers, words, and music, but I believe they are conscious, experiencing minds. When we look at each other, there’s someone home on both sides. My technological props put me way ahead as a successful organism.

When Common Sense Fails

I’m afraid of people whose position is simply that we need some common sense in Washington. Good old common-sense conservatism is likely to lead to worse, or at least different, problems than the ones we currently face.

I’m a great fan of common sense and decisions “made from the gut”. When I was using the formal techniques of Decision Analysis or working with very talented modeling and simulation experts, everyone always realized that there was a gut check that had to be made before accepting the output of a model.

After all, it was not all that uncommon for an error to creep into the modeling at some stage, leading to a completely wrong conclusion. Call it what you will, reality testing or sense checking, no one would follow the analytic techniques blindly. It’s a bit like letting the GPS unit tell you to drive off the road into a lake or a forest.

More subtly, though, one realizes how much bias creeps into these rational analytic decision tools. After all, if we didn’t like the outcome of a simulation, there were parameters to fiddle with that might produce “more sensible results”. More troubling was the realization that mistaken but favorable outcomes were not going to be questioned. In fact, if such an error was detected, it would be defended vigorously in an effort to preserve a mistaken but desirable belief about the world and the outcomes of particular decisions.

As I left the world of analytic decision tools and focused more on mental models, I realized, of course, that our own metaphors for the world carry these same biases, though often completely hidden from us. In a physiological modeling and simulation analysis, at least, the underlying data can be examined and all of the model assumptions are explicit. If you understand the methods well enough, the biases can be identified and perhaps addressed.

The beliefs we hold about the world aren’t so accessible to us. For example, other people are experienced as mental models of other brains. By analogy with our own thoughts and language use, we believe we can understand what someone else is telling us. After all, their language is run through the language systems in our brains, transferring thought from one brain to another through the medium of speech phonemes. The sounds themselves are meaningless; it’s the process of transfer that is meaning.

Optimism and hopefulness are biases. Prejudice and expectations are biases. They color perception and influence decision making.

Clearly, if we have an incorrect model of someone else, we can make poor decisions. If my model of the car salesman is that he’s my buddy with my best interests at heart, I will probably suffer a financial loss compared to a model that sees him purely as the intermediary for the larger organization that is the auto dealership.

So let’s be careful about elevating “common sense” to the status of ultimate truth. There’s a populism in the US today that wants to ignore the complexities of economics and large interdependent systems (banks, global trade, health care, public assistance) and simply rely on common sense.

I’m convinced that simplifying assumptions are always necessary in models. In fact models that can’t be understood intuitively because of complexity or emergence are not as useful as models that can be internalized as intuition. That’s a big part of what real expertise is all about.

But simplification must be pragmatic, that is, proven to work in the real world across some set of conditions. Simplification driven by ideology, because some principle or other “must be true”, is ideology, not pragmatism, and it is likely to fail. And failure commonly arrives through unintended consequences.

Unique



Pool, originally uploaded by jjvornov.

One of the most impressive aspects of Yellowstone is just how unusual a spot it is. It’s one of just a handful of sites on earth where heat, water, and geology conspire to create geysers and pools.

The Astounding Quality of the iPhone 4 Camera

I knew from the first days with my iPhone 4 that I wasn’t going to need a small camera for snaps because of its quality.

Imagine my surprise when I discovered that the Yellowstone image I posted yesterday showed up on Flickr as geotagged. Why? It was an iPhone image.

I had run some noise reduction on the image before posting because of pattern noise in the trees in the upper third of the frame, but I assumed that came from aggressive post-processing of the shadows, which would have been unusual for the Nikon D300.

Just astounding really.

Digging Deeper Holes

Making decisions always limits future options. Choosing one of two forks in the road precludes taking the other fork without the added costs of backtracking and starting over. Moving into the future, the decision space is always changing. In some ways it collapses, because choices not made disappear and become unavailable. But at the same time, the decision space expands as the chosen path is traveled.

I love thinking about making decisions at the start: a clean sheet of paper and infinite possibilities. Yet that is an entirely artificial metaphor. We always find ourselves in the middle of the story, and here there are many constraints that are the consequences of decisions made previously, often by others. Whenever I hear discussions about the US federal budget deficits, I think about these constraints. Large systems have been created over the years (Social Security, Medicare, and Medicaid) to prevent the widespread poverty and lack of medical care that were once commonplace among the elderly. Having created these systems, it becomes unthinkable (impossible?) to end them even as they require larger and larger resources every year. Created with no built-in limits or budgets, these entitlements grow and grow, limited only by the ingenuity of those in my industries, medical care and drug development.

The decisions made early on, when these programs were smaller, have led to unintended consequences which could be catastrophic in a few years or decades. But now it seems that changing paths to avoid these outcomes may not be among the choices that can be made by the government.

I wonder whether there is an inevitability to certain outcomes once choices are made and systems created. Are these some kind of local minimum from which escape is impossible? Must it be the catastrophe that opens up new decision space? I use the metaphor of digging yourself into a hole. The hole gets so deep that one can no longer climb out, so that the more you dig, the deeper and more inescapable the hole becomes.

I can’t quite explain why we feel compelled to keep on digging when it’s clear that the path does not lead out but only deeper.

Making it cloudy



Vertical, originally uploaded by jjvornov.

Usually I’m injecting light into images. Here I was working to take out way too much midday glare against the white mineral flows. It ended up looking almost overcast as I brought out the detail and color.

One of my rare verticals.