There’s nothing wrong with your homunculus; there’s just no homunculus looking at your homunculus

The mind arises from a collection of many maps, all working coherently to provide a model of the self in the environment. But there are only the maps; no one is looking at them.

What is a homunculus anyway?

The term homunculus literally means “little man” in Latin. Alchemists used it for a miniature human made in the laboratory, and early preformationists in the 17th and 18th centuries thought there was a little, fully formed human in sperm that would implant and grow in the womb. The mother was just an incubator for a preformed human.

The term homunculus for the somatotopic map of the body laid out across the motor and sensory cortex was popularized by Wilder Penfield, one of the pioneering neurosurgeons of the 20th century. Penfield created these maps by directly stimulating the cortex with electricity during awake neurosurgery for epilepsy, identifying how specific regions of the precentral and postcentral gyri corresponded to distinct parts of the body. As he stimulated, the patient either reported a sensation in some part of the body or showed involuntary movement in the form of evoked twitches or jerks. The result was a distorted but systematic representation that came to be visualized as a “little man” stretched across the cortical surface.

Cortical maps

The idea that the cortex was organized as a series of maps was, of course, not new. At the dawn of neurology and neuroscience in the 1870s, people like David Ferrier stimulated and lesioned monkey cortex and established the mapping of the “motor centers”. John Hughlings Jackson noticed that motor seizures spread progressively across body parts in a clear somatotopic pattern, leading him to infer an organized map in the cortex of the kind Ferrier demonstrated.

At the same time, those studying sensory systems also realized the brain mapped the sensory environment. Evidence accumulated that the retina’s visual fields were mapped onto the visual cortex. This was inferred first from lesion studies, but with the development of electrophysiological recording, the spatial organization of the visual cortex became clear. And we all know it was Hubel and Wiesel, starting in the late 1950s, who showed that the map of the visual field was distorted like the body maps: the fovea, with its high density of color-sensitive photoreceptors, was given more area than the visual periphery. Hubel and Wiesel also showed that there were parallel maps overlaid in V1 for processing binocular disparity, providing the basis for depth perception.

https://www.researchgate.net/publication/363073241_Hitting_the_Right_Note

In the 1970s, it became apparent that there were often duplicated adjacent maps. In the visual system, there were secondary maps that preserved the retinotopic map, but laid out in stripes for color, motion, binocular disparity, form, or orientation. So in V2, we get sensitivity to figure-ground separation, border ownership, and contours. And there’s more! V3 has maps that infer changing shape over time. V4 appears to primarily process color and shade. And then there is MT (sometimes called V5), which is highly specialized for motion perception.

But no theater of the mind

No matter how hard neuroscientists looked, all they found was a parcelling out of visual features into spatial maps. One would think that visual features would be extracted and then assembled into the visual world for use by the rest of the brain. That never happens. There are just these parallel maps, each showing a different feature. It’s like the layers you get in a mapping program. There’s the street layer, the topographic layer, the traffic layer, and the geological history layer. And you can examine each one by itself or in combination.

But they never come together. They’re never stacked and displayed. Because that would need some other area of cortex to look at the assembled map. That area doesn’t exist.

There is no theater of the mind. There is no homunculus looking at the visual map. No one accesses the motor homunculus map to activate muscles of the neck to look at what just landed on the leg. We find modular, interconnected parallel maps, each seemingly specialized to infer particular features about the world, like color or motion or motor program execution.

Auditory and language maps

Let’s look at the auditory system and language. In the primary auditory cortex, there’s a tonotopic map where frequency is arranged systematically from high to low, just like in the cochlea. But just like in the visual system, it’s not a representation of sound like a recording. It’s an encoding, organized by frequency, of things like onset timing and frequency modulation. So just as V1 responds to edges in visual space, A1 responds to the corresponding edges in frequency-time space. It’s change and modulation in sound that carries the meaning, not the tones themselves. And like in the visual system, there are higher-order tonotopic maps responding to broader, more coherent changes.

That continues until you get to Wernicke’s area. You might have heard of this area of the temporal lobe, right next to the auditory areas. It is known as the receptive area for language. But that would, once again, be a misguided belief in a homunculus that decodes language for someone in the brain to understand. Wernicke’s area is specialized to turn sound into phonemes, the meaningful sound components of spoken language. It’s not decoding the meaning of words; it’s just detecting structure in the sound stream so other areas doing actual semantic work can use the language stream for their own specialized tasks. Wernicke’s area can’t know whether a set of syllables is babble or Shakespeare.

All through these sensory streams, there is input from other areas communicating the expectation of the state of the world and what sensory input ought to look like. For the visual system, that’s a stable, sensible real environment. No ambiguity, no doubt. The process is there to minimize error, the difference between the incoming sensory flow and the prediction of the state of the world.
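The error-minimizing loop described above can be sketched in a few lines of Python. This is a toy illustration only, not a model of any real cortical circuit: a single number stands in for the brain’s prediction, and the function names and the `learning_rate` parameter are made up for the example.

```python
def update_prediction(prediction, observation, learning_rate=0.2):
    """Nudge the prediction toward the observation by a fraction of the error."""
    error = observation - prediction  # incoming signal minus what was expected
    return prediction + learning_rate * error, error


def track(observations, initial_prediction=0.0, learning_rate=0.2):
    """Run the update over a stream of inputs; collect the absolute error at each step."""
    prediction = initial_prediction
    errors = []
    for obs in observations:
        prediction, error = update_prediction(prediction, obs, learning_rate)
        errors.append(abs(error))
    return prediction, errors


# A steady input of 1.0: the error shrinks as the prediction catches up.
final_prediction, errors = track([1.0] * 20)
print(round(final_prediction, 3), [round(e, 3) for e in errors[:3]])
```

With a stable input, the error gets smaller at every step. Nothing in the loop looks at the result; the prediction simply keeps being adjusted.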

Now for language, that’s next-word prediction. Yes, just like our Large Language Models, we decode the speech sounds into meaning by following the thread and having a very good idea of what word or set of syllables comes next. And when the sound does arrive, if it’s not what was expected, the model adjusts. And that’s the language homunculus in action. That voice in your head isn’t some little man; it’s the maps of sound, syllables, and semantic flow of language working in parallel to create meaning.
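To make next-word prediction concrete, here is a deliberately tiny sketch in Python. It just counts which word tends to follow which in a sample sentence, which is far simpler than what an LLM or a brain does; the function names and the sample text are invented for illustration.

```python
from collections import Counter, defaultdict


def train_bigrams(text):
    """Count, for each word, which words follow it in the text."""
    words = text.lower().split()
    following = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        following[current][nxt] += 1
    return following


def predict_next(following, word):
    """Return the most frequent follower of `word`, or None if it was never followed."""
    counts = following.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]


corpus = "the cat sat on the mat and the cat ate the fish"
model = train_bigrams(corpus)
print(predict_next(model, "the"))  # "cat" follows "the" most often in this sample
```

The table of counts never contains the meaning of “cat”. It only holds expectations about what comes next, which is the point of the analogy.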

Searle’s Chinese Room is the brain

At this point, I can’t help but bring up Searle’s famous Chinese Room thought experiment. Searle asked us to imagine a person in a room who receives messages written in Chinese. He has to pass a sensible reply in Chinese back through the output slot, but the problem is he doesn’t know a word of the Chinese language! So what he has to do is consult a big rulebook, written in English, that tells him which Chinese symbols to send out in response to the ones that came in. He does this perfectly without ever understanding anything about what the messages say.

To an outside observer, it looks like there’s someone fluent in Chinese in the room, but in truth, inside, there’s just symbol manipulation. No meaning, no awareness. Searle used this example to argue that syntax isn’t sufficient for understanding. Meaning is something beyond symbol manipulation.

But what I’ve been arguing here is that what he really described, without meaning to, is exactly how the brain works. Meaning, agency, literature, and nonsense aren’t there in these maps; they are emergent from the parallel function of all the maps. But never are the maps overlaid for some homunculus to look at. Clearly, if there were a place where it all came together in a theater of the mind, there would need to be something to be the audience. And that audience would need an audience. And so on.

No labels, no structure, no homunculus

And just as with LLMs, we can see the maps close to the inputs and outputs, in primary sensory and motor cortex. We can see what second-order areas are responding to, like form, color, or phonemes. Once we get into semantics, memory, and face recognition, we have a good idea of what brain areas are involved, but the maps have no obvious structure. Just as you can’t see language deep in the nodes of an LLM, you can’t see memory or meaning in the brain.

So in one sense, there is a homunculus, the one described by Penfield: modality-specific maps, which must also exist for higher-order functions like playing tennis or performing neurosurgery. But in the other sense, that of a little man in the skull who is the viewer of brain activity, there is no homunculus.

Author: James Vornov

I'm an MD, PhD Neurologist who left a successful academic career on the Faculty of The Johns Hopkins Medical School to develop new treatments in Biotech and Pharma. I became fascinated with how people actually make decisions based on the science of decision theory and emerging understanding of how the brain works to make decisions. My passion now is this deep explanation of what has been the realm of philosophy, psychology and self help but is now understood as brain function. By understanding our brains, I believe we can become happier, more successful people.