We only experience a single, stable perception at a time. How bistable visual figures and Karl Friston’s big idea explain how we build a coherent prediction of ourselves in the environment and keep ourselves sane in an uncertain world.
By James Vornov, MD PhD
Neurologist, drug developer and philosopher exploring the neuroscience of decision-making and personal identity.
It’s not just an accident of biology that our brains work as single-threaded engines, engaging only one input stream at a time, one view of the world at a time. I’m going to argue here that a real-time control system operating under conditions of uncertainty needs to minimize error and settle on its best-guess prediction of the state of the world. Maybe developers of autonomous systems like self-driving cars could learn something from the nature of human consciousness.
One stable perception at a time
Give it a bit of thought and you’ll see that we have a stable awareness of just one thing at a time. Reading comes one chunk of meaning at a time. We can’t listen to the radio while we’re reading. Pausing, we may turn our attention to the background music, but that sound was excluded from awareness while we were engaged with the text.
The brain is processing ambient sound all the while; we are just not attending to it, so it is not presented in awareness. If a doorbell rings amid that sound, brain systems outside of awareness signal the system controlling the flow of sensory information to switch from reading to listening to the environment. We become aware, a bit after the fact, that the doorbell rang. If it rings again, it’s less an echo of a recent event and more fully present in awareness.
Even within a single sensory channel, we perceive only one coherent view of the world at a time. My favorite examples are these bistable visual figures.

These are two classic examples. You see either the rabbit or the duck, never both at the same time. Similarly, you see the vase or the faces. And there’s something about these images that makes them unstable. As the eye wanders over the image, we see one interpretation, then the other. In my experience, the harder I try to see both, the faster the perception flips. I really enjoy experiencing my brain switch between the two ways of interpreting the image in real time, back and forth like a tennis match. Why can’t we see both aspects of the figures at the same time? Somewhere in between the two states, or both states at once? I’m experiencing my brain in action.
Nevertheless, having one stable perception at a time is a basic operating principle of our awareness, even though the brain is multitasking in real time underneath. All of our senses are processing incoming flow, and our motor system is pushing out commands. Yes, we can walk and talk at the same time. But we can be aware of generating speech or aware of watching our step, not both simultaneously, even as both continue uninterrupted. And even as the visual system is decoding both the rabbit and the duck, it presents only one at a time. We are aware of the current state of the brain’s model of the world, its prediction of what’s out there now.
Being a Good Regulator and having a model
Let’s start with the basic idea that the brain models the body in its environment. This is the Good Regulator Theorem of Conant and Ashby: every good regulator of a system must be a model of that system. If a control system is going to regulate something, it has to represent that something internally.
Think about a classic thermostat using a bimetallic strip to sense the temperature. The strip is made of steel and copper bonded together, one on each side. Copper expands a lot as temperature changes, steel not so much. So as the temperature changes, the strip bends a bit. The movement closes or opens an electrical circuit, triggering the heat on or off. By moving the thermostat dial, you change the amount of bend that turns the heat on or off.
In effect, the strip is a model of the room temperature: the degree of bend corresponds to the temperature. The dial is a model of the desired temperature. This is all totally analog, of course. This simple principle, that the control system contains a model of the system being controlled, extends to more complicated robotics as well as to biology.
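As a minimal sketch, the same bang-bang loop can be written in a few lines of code; the setpoint and deadband values here are illustrative assumptions, not the physics of any real strip:

```python
# A minimal sketch of the thermostat as a good regulator. The setpoint
# and deadband are illustrative assumptions, not measurements of any
# real bimetallic strip.

def thermostat_step(temperature: float, heat_on: bool,
                    setpoint: float = 20.0, deadband: float = 0.5) -> bool:
    """Decide whether the heat should be on for the next time step."""
    if temperature < setpoint - deadband:
        return True        # too cold: close the circuit, heat on
    if temperature > setpoint + deadband:
        return False       # warm enough: open the circuit, heat off
    return heat_on         # inside the deadband: keep the current state
```

The deadband plays the role of the strip’s mechanical hysteresis: without it, the furnace would chatter on and off around the setpoint.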
Error detection guides action
A single-cell organism needs to move toward a food source, so it needs a model of the gradient of an energy source like sugar. E. coli, for example, has two ways of swimming: a “run” in a straight line, or a “tumble” in which it reorients in a random direction. It makes a running comparison of the concentration of food sensed over time. Technically, it’s a leaky integrator, computing a smoothed difference over time. If the comparison shows an increase, the run lengthens. If the concentration is falling, the tumble rate increases. It’s a simple biochemical implementation of short-term memory. And as a good regulator, it models the external food concentration gradient, allowing it to seek out food.
In the simple case of the thermostat, the regulator is sensing a deviation from the desired condition when the strip bends enough to turn the heat on. It’s too cold, so the heat comes on until the strip bends enough to open the circuit and the heat turns off. The E. coli regulator detects an error when the food gradient is not increasing. It tumbles and runs, tumbles and runs until the integrator says it’s moving in the right direction, toward the source of sugar. If the concentration stops increasing, it’s the wrong direction, and it’s time to tumble and run to find the right way.
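Here’s a toy one-dimensional version of that run-and-tumble loop; the leak rate and tumble probabilities are illustrative assumptions, not measured E. coli kinetics:

```python
import random

# Toy run-and-tumble chemotaxis in one dimension. The leak rate and
# tumble probabilities are illustrative, not real E. coli numbers.

def chemotaxis(steps: int = 500, source: float = 50.0) -> float:
    def concentration(x: float) -> float:
        return -abs(x - source)          # higher the closer we are to the source

    position, direction = 0.0, 1.0
    memory = concentration(position)     # leaky integrator: smoothed past samples
    leak = 0.9                           # how quickly old samples fade

    for _ in range(steps):
        position += direction            # "run" one step
        sample = concentration(position)
        rising = sample > memory         # error signal: is the gradient improving?
        memory = leak * memory + (1 - leak) * sample
        # keep running while things improve; tumble often when they don't
        if random.random() < (0.05 if rising else 0.5):
            direction = random.choice([-1.0, 1.0])   # "tumble": random reorientation

    return position

print(chemotaxis())   # usually ends near the source at x = 50
```

The single `memory` variable is the whole model of the gradient: compare the current sample against a fading average of the past, and bias behavior on the sign of the difference.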
Critically, you can’t point to the model of the environment, the room temperature or the food gradient, in either case. Conant and Ashby pointed out that the model must nevertheless be there for the system to be goal-seeking.
A self-aware model
The brain is just a very fancy control system. It detects low blood sugar and seeks a snack. The E. coli does that too. We just have a complex nervous system that uses credit cards to buy a Snickers bar at the gas station, and that sends us to work to earn the exchange credits that pay the credit card bill. There’s a model of the self in its environment, and the brain, as a good regulator, is there to minimize error signals in all of its systems.
The mammalian brain has a feature we don’t see in thermostats or bacteria: it is self-aware. We don’t understand how, but the thalamocortical network gives rise to subjective experience. We are given a view of the model of the self in the environment, used to direct attention and to deal with the world and the results of our actions. As I pointed out earlier, that awareness is limited to a single stable perception at a time. That’s because the network is working to minimize error, just like the thermostat and the E. coli. So we experience the current prediction of what’s out there: a best guess, subject to change as new data arrive. With the bistable images, the eye merely wandering over the figure is enough to flip the prediction from one stable state to the other.
Karl Friston’s Free Energy Principle
This, simply put, is what Karl Friston called the Free Energy Principle. His language is admittedly confusing, but he essentially borrowed terminology and math from information theory to formalize this idea of error correction based on deviation from the model’s expected state.
Friston used a term from information theory, “surprise,” to quantify how far new sensory information deviates from the predicted environment. In our terminology, we’ve established through the Good Regulator Theorem that the brain has a model of the system: the self in the environment. Like the thermostat and the E. coli, but one that we are aware of as our self in an environment “out there.” As new information arrives, making sense of it means fitting it to the model, minimizing error, here called surprise. Prediction isn’t about the future; the brain is predicting what’s outside right now.
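To make the bookkeeping concrete: in the standard variational formulation (standard notation from that literature, not the essay’s own), surprise is the negative log probability of the sensory data under the model, and free energy is a quantity the brain can actually compute that sits above it as an upper bound:

$$\text{surprise}(o) = -\ln p(o \mid m), \qquad F = \underbrace{D_{\mathrm{KL}}\big[\,q(s)\,\|\,p(s \mid o)\,\big]}_{\ge\, 0} - \ln p(o \mid m) \;\ge\; -\ln p(o \mid m)$$

Here $o$ is the sensory input, $s$ the hidden states of the world, $m$ the model, and $q(s)$ the brain’s current best-guess beliefs about those hidden states. Because the KL divergence can never be negative, pushing $F$ down both sharpens the beliefs $q$ and keeps surprise bounded.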
When we present the brain with ambiguous input like the vase/faces or the rabbit/duck image, it can’t represent both interpretations simultaneously. The neuronal networks that constitute the model have to settle into the state that minimizes error. Ambiguity is unstable; the brain resolves it by choosing the interpretation that best explains the input, taking away the surprise. Only temporarily in this case, because we experience the brain flipping back and forth between the two possibilities. I love this direct experience of brain mechanisms in action at a level normally outside of awareness.
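For readers who like to see the mechanism run, here is a toy simulation of how a network that settles into one error-minimizing state can still alternate. It’s a generic mutual-inhibition-with-adaptation model of the kind used to describe binocular rivalry, not Friston’s own equations, and every constant in it is an illustrative assumption chosen only to make the alternation visible:

```python
import math

# Toy bistable rivalry: two populations ("duck" vs. "rabbit") receive the
# same ambiguous evidence, inhibit each other, and slowly fatigue.

def rivalry(steps: int = 6000, dt: float = 0.01) -> int:
    r1, r2 = 0.6, 0.4        # activity of each interpretation (slightly asymmetric start)
    a1, a2 = 0.0, 0.0        # slow adaptation ("fatigue") for each population
    flips, winner = 0, ""

    def sigmoid(x: float) -> float:
        return 1.0 / (1.0 + math.exp(-8.0 * x))

    for _ in range(steps):
        # each population is driven by the input minus its rival's
        # inhibition minus its own accumulated fatigue
        drive1 = 1.0 - 2.0 * r2 - a1
        drive2 = 1.0 - 2.0 * r1 - a2
        r1 += dt * (-r1 + sigmoid(drive1))
        r2 += dt * (-r2 + sigmoid(drive2))
        # fatigue builds while a population is active, on a slower timescale
        a1 += dt * 0.1 * (2.0 * r1 - a1)
        a2 += dt * 0.1 * (2.0 * r2 - a2)

        current = "duck" if r1 > r2 else "rabbit"
        if current != winner:
            flips += 1
            winner = current

    return flips - 1         # don't count the initial assignment as a flip

print(rivalry())             # greater than zero: the winning percept alternates
```

The logic is simple: whichever interpretation wins suppresses its rival, its own slow fatigue variable builds up, and eventually the suppressed interpretation takes over, exactly the back-and-forth tennis match of the rabbit and the duck.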
The connection between Bayesian inference and the model
Friston borrowed the term “free energy” from variational inference, a branch of statistics grounded in Bayesian reasoning in which free energy describes how closely a mathematical model matches its target. It would take far too much space to explain the statistical concepts here, but suffice it to say they’re used all the time in machine learning. The idea is simple: minimizing variational free energy just means tightening the fit between your model and the world, using error signals to update internal beliefs.
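As a minimal sketch of what “tightening the fit” means, assume a toy Gaussian model in which the belief is a single number: free energy then reduces to precision-weighted squared prediction error, and minimizing it is plain gradient descent. The function name, observations, and precisions below are all made up for illustration:

```python
# Minimal sketch of "tightening the fit": for a Gaussian generative model
# with a point-estimate belief mu, minimizing variational free energy
# reduces to gradient descent on precision-weighted prediction error.

def free_energy_step(mu: float, observation: float,
                     prior_mean: float = 0.0,
                     sensory_precision: float = 1.0,
                     prior_precision: float = 0.5,
                     lr: float = 0.1) -> float:
    """One gradient step down F(mu) = 0.5*sp*(obs - mu)**2 + 0.5*pp*(mu - prior)**2."""
    sensory_error = observation - mu     # how much the data "surprises" the belief
    prior_error = prior_mean - mu        # how far the belief strays from the prior
    return mu + lr * (sensory_precision * sensory_error
                      + prior_precision * prior_error)

mu = 0.0
for obs in [2.1, 1.9, 2.0, 2.2]:         # a stream of new sensory information
    for _ in range(100):                  # let the belief settle on each sample
        mu = free_energy_step(mu, obs)
    print(round(mu, 2))                   # settles at the precision-weighted compromise
```

Each update pulls the belief toward the data and toward the prior at once, weighted by how much each is trusted, which is the whole of “using error signals to update internal beliefs” in one line.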
Brilliantly, Friston saw the connection between this information-theoretic machinery and the kind of predictive error correction we’ve been talking about. Not to teach an AI to write or a robot to dance, but to allow the brain to control the body in its environment. And, ultimately, to buy Snickers bars. Technically, though, all the neuronal network is doing is instantiating an implicit generative model of the world, one constantly updated to reduce the gap between what it has modeled as being out there and the new information coming in.
Our best guess is how we stay sane
Now we see why we experience only one thing at a time. We can’t see the rabbit and the duck simultaneously because the brain is sampling a unified generative model of the world that has to stay coherent. There’s no room for ambiguity or multiple competing realities, or even for multiple sensory channels competing for awareness. It’s one percept at a time, the best guess about whatever aspect of the self or environment is currently being examined. We see this as we eavesdrop on the brain with EEG or fMRI. Ambiguity, or even crosstalk between channels, would threaten the coherent model.
A brain charged with steering basar v’dam, as we say in Hebrew (mere meat and blood), through an uncertain world needs a single, coherent model grounded in its best-guess prediction of what lies beyond the skull. When that model frays, we glimpse the price of incoherence: misperceptions of faces in the dark, disembodied voices that hijack thought. Psychosis, and perhaps the social withdrawal seen in autism, show what happens when the world-model loses its predictive value, an idea I’ll explore in a later essay. For now the takeaway is stark: our single-threaded awareness isn’t an evolutionary quirk but the mechanism that keeps perception, action, and identity aligned. The brain models and presents one perception at a time because anything less can be, quite literally, madness.
Prefer to follow via Substack? You can read this and future posts (and leave comments) at On Deciding… Better on Substack: Brain, Self, and Mind
© 2025 James Vornov MD, PhD. This content is freely shareable with attribution. Please link to this page if quoting.