The Tip of the Tongue and the Limits of Autoregressive Cognition
Why Tacit Knowledge Challenges a Generative Model of Mind
There is a peculiar moment, familiar to us all, when a name hovers just beyond reach—on the tip of the tongue. You know it. You feel its presence. You might even recall the first letter, the number of syllables, or the movie in which you last saw the actor. Yet, despite this rich constellation of clues, the word itself refuses to emerge. This is not mere forgetfulness. It is a profound window into the architecture of the mind—one that reveals the limits of a purely sequential, language-like model of cognition.
In a recent discussion, Professor Elan Barenholtz presents a provocative thesis, starting with a focused claim before expanding it into a grander vision of the mind. His core argument begins with the assertion that human language is autoregressive and does not require symbol grounding. Challenging a cornerstone of cognitive science, he suggests that meaning arises not from words being linked to sensory, real-world experiences, but from the vast network of statistical relationships between the words themselves. The astonishing abilities of Large Language Models (LLMs), which learn from text alone yet produce coherent prose, serve as his primary exhibit for this possibility.
But Barenholtz takes this idea a significant step further, speculating that this generative, sequential process is not limited to language but is the fundamental operating principle of the mind. Cognition as a whole, on this view, is autoregressive. “I do want to just… speculate that cognition more generally is autoregressive in this way,” he says (1:48:55). For Barenholtz, the mind is not retrieving static memories from a database but generating cognition, like a story being told in real time. The conversation we are having now, he argues, is the autoregressive workspace: a dynamic, unfolding process where meaning emerges from the trajectory of thought.
This expanded view is compelling, especially when applied to inner speech or planning. Barenholtz is careful to clarify his level of analysis: “I'm not claiming we are GPT… What I'm claiming is that the fundamental math… matrix multiply, vector times matrix, autoregress, do it again—that level of abstraction… is accurate” (1:29:09). He is not arguing for a silicon brain, but for a functional parallel: at its core, cognition behaves like a generative, self-conditioned process.
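To make that level of abstraction concrete, here is a minimal sketch of the loop Barenholtz describes: a vector is multiplied by a matrix, the result selects the next token, and that token is fed back in as the next input. The vocabulary, the weight matrix W, and the function names are all invented for illustration; none of them come from Barenholtz's talk or from any real model. The sketch also conditions only on the previous token and always picks the highest score, which is far simpler than what an actual LLM does.

```python
# Toy autoregressive loop: vector times matrix, pick a token, feed it back in.
# VOCAB and W are hypothetical values chosen for illustration only.

VOCAB = ["the", "mind", "tells", "a", "story"]

# W[i][j] is the score for token j following token i (made-up numbers).
W = [
    [0.0, 0.9, 0.0, 0.0, 0.1],  # after "the"
    [0.0, 0.0, 0.9, 0.1, 0.0],  # after "mind"
    [0.1, 0.0, 0.0, 0.9, 0.0],  # after "tells"
    [0.0, 0.1, 0.0, 0.0, 0.9],  # after "a"
    [0.9, 0.0, 0.1, 0.0, 0.0],  # after "story"
]


def one_hot(index, size):
    """Encode a token index as a vector with a single 1.0 entry."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def vec_times_matrix(vec, matrix):
    """Multiply a row vector by a matrix, returning one score per column."""
    cols = len(matrix[0])
    return [sum(vec[i] * matrix[i][j] for i in range(len(vec))) for j in range(cols)]


def generate(start, steps):
    """Greedy autoregression: each output token becomes the next input."""
    sequence = [start]
    current = VOCAB.index(start)
    for _ in range(steps):
        scores = vec_times_matrix(one_hot(current, len(VOCAB)), W)
        current = max(range(len(scores)), key=scores.__getitem__)
        sequence.append(VOCAB[current])
    return sequence


print(" ".join(generate("the", 4)))
```

Nothing in the loop looks anything up in a store of finished sentences. The sequence exists only as the trace of repeated multiply-and-select steps, which is the sense in which Barenholtz calls cognition generated rather than retrieved.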
Yet, it is this very extension—from a model of language to a model of all cognition—that the tip-of-the-tongue (TOT) phenomenon challenges in a subtle but decisive way. If all of cognition were purely autoregressive, then the failure to produce a word would simply be a breakdown in the generative chain. But the TOT state is not just a failure to generate; it is a metacognitive awareness of a missing piece. You don’t just fail—you know you’re failing. This state is not a void but a palpable presence, a feeling of acute tension between a clearly apprehended meaning and a stubbornly inaccessible word.
Here, a purely generative model falls short. A proponent might argue that this "feeling" of knowing is itself a complex generated state—a high-probability activation in a latent space that simply fails to surface a specific token. But this reduces a profound qualitative experience to a statistical ghost. The feeling is not of a probable word, but of a correct one. It is here that the autoregressive model, when stretched to cover all of cognition, reveals its limitations.
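For readers who want to see what the proponent's reading would amount to, the following is a rough sketch under stated assumptions. It treats retrieval as a probability distribution over candidate names and labels the attempt a tip-of-the-tongue state when the probability mass sits in the right neighborhood but no single name clears the bar for being spoken. The candidate names, the scores, both thresholds, and the function names are hypothetical; this is not a model proposed by Barenholtz or drawn from the TOT literature.

```python
import math


def softmax(scores):
    """Convert raw scores to probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def retrieval_state(candidates, scores, emit_threshold=0.6, familiarity_threshold=0.8):
    """Classify a retrieval attempt using two hypothetical thresholds."""
    probs = softmax(scores)
    ranked = sorted(zip(candidates, probs), key=lambda pair: pair[1], reverse=True)
    best_name, best_prob = ranked[0]
    # Mass on the top three candidates stands in for "activation in the right region".
    top_mass = sum(p for _, p in ranked[:3])
    if best_prob >= emit_threshold:
        return "retrieved", best_name
    if top_mass >= familiarity_threshold:
        return "tip_of_tongue", None
    return "unknown", None


# Made-up candidates: three similar-sounding names compete, two are unrelated.
names = ["Redford", "Redgrave", "Redmayne", "Pitt", "Hanks"]
print(retrieval_state(names, [2.0, 1.9, 1.8, -1.0, -1.0]))
print(retrieval_state(names, [5.0, 1.0, 1.0, 0.0, 0.0]))
```

With these invented scores, the first call reports a tip-of-the-tongue state and the second retrieves a name. The sketch captures only the statistical half of the story: it can say that the mass is concentrated, but it has no way to represent the conviction that one particular word is the correct one.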
This is where Michael Polanyi’s concept of tacit knowledge becomes essential. In The Tacit Dimension, Polanyi writes that “we can know more than we can tell.” He argues that all knowing has a structure of subsidiary and focal awareness: we attend from subsidiary clues to a focal object. In the TOT state, you are attending from a rich network of subsidiary cues (face, voice, context, emotional tone) to a name that remains an out-of-reach focal point. The knowledge is not absent; it is tacit, embodied in a web of associations that has not yet coalesced into explicit, speakable form.
Polanyi’s insight aligns with the Gestalt tradition in psychology, which insists that perception and cognition are fundamentally holistic. We do not recognize a face by assembling features—nose, eyes, mouth—one by one in a sequence. We perceive the whole instantly, as a meaningful configuration. As the foundational tenet of Gestalt psychology states, “The whole is other than the sum of its parts.” Similarly, in the TOT state, the mind is not attempting to build the name from phonemes; it is resonating with a complete pattern that is almost, but not quite, brought into focus.
This kind of resonant, holistic cognition finds a powerful neuroscientific counterpart in Iain McGilchrist’s work on the divided brain. In The Master and His Emissary, McGilchrist argues that the two hemispheres of the brain engage with the world in fundamentally different ways. The left hemisphere (LH) is detail-oriented, analytical, and sequential. It deconstructs the world into parts, manipulates symbols, and constructs explicit narratives.
The right hemisphere (RH), however, is the "Master" in McGilchrist's metaphor. It sees the whole, grasps context, and dwells in the lived, embodied world. It understands metaphor, irony, and the implicit meaning in a friend's smile—things that cannot be built step-by-step. The RH is not concerned with generating the next word, but with being present to the fullness of experience.
Barenholtz’s autoregressive model of cognition maps almost perfectly onto the left hemisphere's mode of being. It privileges sequence over simultaneity, generation over recognition, and explicit language over silent understanding. To claim that all cognition is autoregressive is to elevate the brain's narrative-producing "Emissary" to the status of the entire mind, while marginalizing the silent, holistic, embodied knowing of the "Master." The TOT state, then, can be seen as a moment of tension: the RH knows the whole, while the LH's generative faculty fails to produce the token.
Consider other clear examples of non-autoregressive cognition. You hear the first few notes of a melody and instantly recognize the entire song, a gestalt recognition that precedes any note-by-note analysis. You enter a room and immediately sense the mood, not by reasoning from individual cues, but through a global affective resonance. You ride a bicycle, your body making thousands of micro-adjustments in a seamless, flowing action, a form of intelligence entirely divorced from an inner monologue guiding each pedal stroke. These are not failures of autoregression; they are expressions of a different kind of intelligence—one that is distributed, parallel, and tacit.
Even Barenholtz acknowledges this implicitly. He notes that “language is able to hand off to the perceptual system” (16:05), and that “animals understand doors and can reason about them without language” (29:01). These are crucial admissions that cognition extends far beyond the autoregressive workspace. They point to a mind that is not a monolithic language engine, but a multi-modal system, capable of both linear generation and holistic recognition, both articulation and resonance.
In the end, the tip-of-the-tongue phenomenon is not a broken chain. It is a revelation. It shows us that we can know without saying, that meaning can be felt before it is formed. The ultimate irony is that Barenholtz himself seems to grasp this, ending his discussion with a nod to the ineffable. "The ineffable is real and central," he states (2:21:52). This is a profound admission of a reality that lies just beyond the grasp of the very token-by-token process he champions—a world the mind can know, even if the storyteller cannot always tell. To reduce cognition to autoregression is to ignore the silent majority of mental life. The mind is not just a storyteller; it is a participant in a world far richer than any narrative can capture.
References
Barenholtz, E. (2024). Cognition as Autoregression [YouTube video]. Glasp.
Polanyi, M. (1966). The Tacit Dimension. University of Chicago Press.
McGilchrist, I. (2009). The Master and His Emissary: The Divided Brain and the Making of the Western World. Yale University Press.
Excellent and thought-provoking. Perhaps this explains why we value art. Why is Van Gogh more revered (at this time at least) than Rembrandt, the ultimate symbolic realist? Although, at the end, he even veered off into semi-impressionist directions. And Picasso? While not a neuroscientist, he understood how to elicit tacit knowledge in viewers.
As one who studies information in all its forms, I see speech and linguistics as just instances among the millions of information processing systems created by life. What are the universal structures and semantics that underlie all the diverse multidimensional information systems created by biology?
Thank you for mentioning Glasp (glasp.co)!