The Pathway of Human Language Towards Computational Precision in LLMs
Regularity and irregularity. Decodable and tricky words. Learnability and surprisal. Predictability and randomness. Low entropy and high entropy.
Why do such tensions exist in human language? And in the AI tools we’ve developed to both write code and use natural language, how can the precision required for computation coexist with the necessary complexity and messiness of human language?
An Algebraic Symphony of Language and Meaning
In our last post, we examined the statistical and algebraic nature of language that drives the power of LLMs, and how the form and meaning of a language may be much more intertwined than we assume, given just how much meaning LLMs are able to approximate via computation over statistical arrays alone.
This interlacing of form and meaning lies in where and how words show up repeatedly in sentences and texts, not simply in the forms of the words themselves, because all languages contain words that share a form but differ in meaning. Some words with the same form have entirely unrelated meanings (homophony), while others have closely related meanings (polysemy). Yet LLMs are able to use them in a more or less “natural” manner thanks to the high-dimensional mappings of word parts in statistical relation to one another – such that word analogies can be calculated mathematically:
“For example, Google researchers took the vector for biggest, subtracted big, and added small. The word closest to the resulting vector was smallest.”
– “Large language models, explained with a minimum of math and jargon,” Timothy Lee & Sean Trott
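To see the arithmetic at work, here is a minimal sketch with a hypothetical, hand-built set of toy vectors. (Real models like word2vec learn vectors of hundreds of dimensions from large corpora; nothing below reflects an actual trained model.)

```python
import numpy as np

# Toy embeddings with two made-up axes: size polarity and superlative degree.
# Real learned embeddings are high-dimensional and not interpretable axis-by-axis.
emb = {
    "big":      np.array([ 1.0, 0.0]),
    "biggest":  np.array([ 1.0, 1.0]),
    "small":    np.array([-1.0, 0.0]),
    "smallest": np.array([-1.0, 1.0]),
    "cold":     np.array([ 0.0, -1.0]),  # a distractor word
}

def nearest(vec, exclude):
    """Return the word whose vector has the highest cosine similarity to vec."""
    best, best_sim = None, -2.0
    for word, wvec in emb.items():
        if word in exclude:
            continue
        sim = np.dot(vec, wvec) / (np.linalg.norm(vec) * np.linalg.norm(wvec))
        if sim > best_sim:
            best, best_sim = word, sim
    return best

query = emb["biggest"] - emb["big"] + emb["small"]
print(nearest(query, exclude={"biggest", "big", "small"}))  # -> smallest
```

The analogy falls out of plain vector arithmetic: subtracting big removes the “large” direction, adding small points the vector the other way, and the superlative component carries over.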
That the algebraic and statistical relationships of words in natural language can drive computational models’ generative capabilities suggests that language itself may reflect the limitations and potential of AI. And the thing about natural, human language is that while it is endlessly generative, it also tends to be imprecise. In human usage, gestures and the context of our social interactions – who we are speaking to, and when – play a big role. As long as we get our main message across, we’re good.
Human language is fundamentally communicative and social, and there are feelings involved.
The Imprecision of Human Expression
Imagine yourself in a bustling restaurant in an international airport, surrounded by people from diverse linguistic backgrounds. You're trying to communicate with a traveller whose language you don't speak. What do you do?
You resort to body language. You gesture hyperbolically and make exaggerated facial expressions. You point to objects, mime actions, and mouth simple words, hoping the other person can build a basic understanding from them.
Depth, nuance, and complexity are not possible (beyond each individual’s imagination) in this most elemental of interactions.
So what is required for depth, nuance, and complexity?
A shared language, whether spoken, written, or signed – one in which a small set of sounds, letters, or signs is concatenated in a wide assortment of ways, both commonplace and surprising, to convey an equally wide range of ideas and feelings.
Yet a shared language, while providing a platform for greater depth, may still remain imprecise. What is meant to be conveyed is not always exactly what is understood.
There are, furthermore, gradations of precision in language, beginning with the ephemeral and contextual nature of spoken and signed language and moving into the more ossified form of written language, in which spelling must be exact and word selection more intentional. There is also a movement from the language we use with our family, full of frequent, commonly used words, to the language we use when writing an academic paper, full of domain-specific, rarer words. In education, we often refer to this type of language as Tier 2 or Tier 3 vocabulary.
If a person is equipped with more of that academic, domain-specific language, then greater precision in communication can be achieved. Yet the challenge of whether the listener hears and interprets what is intended remains. For example, the Scientific American article “People Have Very Different Understandings of Even the Simplest Words” discusses how the more abstract a word is, the more it is tied to an emotional valence and to someone’s identity and experiences, rather than to a precise meaning.
The Computational Imperative
But in some ways, this inherent fuzziness of our language may be a feature, rather than a bug. It gives us a complex adaptive system for navigating, creating, and communicating in a world of complex adaptive systems.
For computers and computations, however, exactness and precision in language are required – a line of code either runs the correct function and produces the intended output, or it doesn’t. So it’s quite interesting that one of the most immediately powerful use cases of LLMs so far seems to be as a natural language interface for developing and reviewing code.
Stephen Wolfram, in a long and interesting explainer on how LLMs work, “What Is ChatGPT Doing … and Why Does It Work?”, explores some of this tension between computational and natural language.
“Human language is fundamentally imprecise, not least because it isn’t “tethered” to a specific computational implementation, and its meaning is basically defined just by a “social contract” between its users. But computational language, by its nature, has a certain fundamental precision—because in the end what it specifies can always be “unambiguously executed on a computer”. Human language can usually get away with a certain vagueness. (When we say “planet” does it include exoplanets or not, etc.?) But in computational language we have to be precise and clear about all the distinctions we’re making.”
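This all-or-nothing property is easy to demonstrate: a human reader shrugs off a misspelling, but an interpreter refuses to guess. A trivial sketch:

```python
# A reader easily repairs "add the smal number to the big one";
# Python will not. Every name must resolve exactly.
big, small = 10, 2

print(big + small)        # 12 - exact names, exact result

try:
    print(big + smal)     # one missing letter, and nothing runs
except NameError as err:
    print("Refused:", err)
```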
Computational Irreducibility and the Limits of Predictability and Learning
One of the tensions Wolfram raises between human and computational language is what he terms “computational irreducibility”: the difficulty of making accurate predictions for a highly complex system, such as weather or climate. Predicting such a system requires performing the computation step by step from an initial state; its behavior can’t be swiftly calculated by compressing the data into a simpler rule.
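Wolfram’s canonical illustration is a simple cellular automaton such as Rule 30: the update rule is trivial, yet, as far as anyone knows, there is no shortcut to row n other than computing every row before it. A minimal simulation (the width and step count here are arbitrary choices):

```python
# Rule 30 cellular automaton: each new cell = left XOR (center OR right).
# Despite the trivial rule, the pattern is believed to be computationally
# irreducible: to know row n, you apparently must compute rows 1..n-1.

WIDTH, STEPS = 63, 16
cells = [0] * WIDTH
cells[WIDTH // 2] = 1  # start from a single "on" cell in the middle

for _ in range(STEPS):
    print("".join("#" if c else "." for c in cells))
    cells = [cells[(i - 1) % WIDTH] ^ (cells[i] | cells[(i + 1) % WIDTH])
             for i in range(WIDTH)]
```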
In some ways, this “compression” of information is what we are doing with language as we use more “Tier 2” and “Tier 3” – or academic – words in our speech or writing. Academic speech and writing carry a greater density of information: more abstract words convey complex concepts, and sentences tend to become more compound and complex. The simpler, more frequent words, phrases, and sentences of our everyday speech are more regular and thus more learnable.
“. . . there’s just a fundamental tension between learnability and computational irreducibility. Learning involves in effect compressing data by leveraging regularities. But computational irreducibility implies that ultimately there’s a limit to what regularities there may be.”

“. . . there’s an ultimate tradeoff between capability and trainability: the more you want a system to make “true use” of its computational capabilities, the more it’s going to show computational irreducibility, and the less it’s going to be trainable. And the more it’s fundamentally trainable, the less it’s going to be able to do sophisticated computation.”
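One way to make “learning as compression” tangible: a general-purpose compressor exploits exactly the regularities that make data predictable, and patternless data gives it almost nothing to leverage. A toy comparison, with the caveat that zlib is only a loose stand-in for what a learner does:

```python
import random
import zlib

# Repetitive, everyday-register text: full of reusable regularities.
regular = ("we went to the store and we got some milk and then "
           "we went to the store and we got some bread and then ") * 8

# A patternless string of the same length: few regularities to exploit.
random.seed(0)
irregular = "".join(random.choice("abcdefghijklmnopqrstuvwxyz ")
                    for _ in range(len(regular)))

for name, text in [("regular", regular), ("irregular", irregular)]:
    ratio = len(zlib.compress(text.encode())) / len(text.encode())
    print(f"{name}: compressed to {ratio:.0%} of original size")
# The regular text compresses far more: its regularities are learnable,
# an echo of Wolfram's learnability/irreducibility tradeoff.
```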
Irregularity and Regularity in Language
What’s interesting to note here is that all languages contain constructive tensions between regularity and irregularity. This tension may reflect a process of language being honed over time to be more learnable within our cognitive constraints. We’ve explored some of this before in our post, Irregularity Enhances Learning (Maybe), in which we examined a paper by Michael Ramscar suggesting a tension between the language forms that show up again and again and those that are more infrequent but thus inherently draw more of our attention. This relates to the theory of “statistical learning,” through which we not only learn language itself but also map a language to its written form.
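The “surprisal” from our opening list can be made concrete here: under a simple frequency model, a word’s surprisal is −log₂ of its probability, so infrequent forms carry more bits and, plausibly, command more attention. A minimal sketch with made-up counts (a real model, as in Levy’s work listed below, would use context-sensitive probabilities rather than bare unigram frequencies):

```python
import math
from collections import Counter

# Hypothetical corpus counts, chosen only to illustrate the scale.
counts = Counter({"the": 50000, "said": 8000, "went": 3000, "ossified": 2})
total = sum(counts.values())

def surprisal(word):
    """Surprisal in bits: -log2 p(word). Rarer words yield larger values."""
    return -math.log2(counts[word] / total)

for w in ["the", "said", "ossified"]:
    print(f"{w}: {surprisal(w):.1f} bits")
# Frequent forms like "the" are cheap to predict; rare forms like
# "ossified" are expensive - and correspondingly more salient.
```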
For Wolfram, the fact that LLMs are as powerful as they are suggests that human language is actually more statistically regular than we may have thought:
“my strong suspicion is that the success of ChatGPT implicitly reveals an important “scientific” fact: that there’s actually a lot more structure and simplicity to meaningful human language than we ever knew—and that in the end there may be even fairly simple rules that describe how such language can be put together.”
“And instead what we should conclude is that tasks—like writing essays—that we humans could do, but we didn’t think computers could do, are actually in some sense computationally easier than we thought.”

“In other words, the reason a neural net can be successful in writing an essay is because writing an essay turns out to be a “computationally shallower” problem than we thought. And in a sense this takes us closer to “having a theory” of how we humans manage to do things like writing essays, or in general deal with language.”
The as-yet-unrealized potential, for Wolfram, is that with a greater underlying capability for computational language, AI may be able to bridge our more “computationally shallow” human language with the precision required for more complex computations:
“its very success gives us a reason to think that it’s going to be feasible to construct something more complete in computational language form. And, unlike what we’ve so far figured out about the innards of ChatGPT, we can expect to design the computational language so that it’s readily understandable to humans.”
Decontextualized Language: The Pathway to Precision
On this pathway towards the integration of human language and computational language, it’s interesting to consider how, in our own language development, we become able to better “compress information” and develop greater precision in our thinking and communication as we learn and incorporate rarer, more abstract language into our own. We’ve spoken before about “decontextualized language” – the language that takes us beyond the immediate context and moment, and that can carry us beyond our own delimited feelings and experiences into a realm of interpersonal and cultural thought, knowledge, and perspectives. This is the language of storybooks, of science, and – at its greatest extreme – of code. We begin teaching this form of language when we tell stories with our children, read with them, and talk with them about books. It becomes increasingly dense and complex as we move into disciplinary study.
There is some evidence that training LLMs on this specific form of language is more powerful – such as this study training a “tiny LLM” on children’s stories. And consider what LLMs have been trained on thus far: a corpus of written language, not transcripts of everyday conversation. As we’ve explored in depth on this blog, written language is not synonymous with oral language – by nature of being written, it is already more “decontextualized,” and requires more inference and perspective-taking. That LLMs are trained on such a corpus may in fact be why their algebraic and statistical magic can be so surprisingly powerful. There is a greater density of information in the written forms of our languages.
Implications for Teaching and Learning
What might all of this say about teaching and learning? Well, so far, one of the facets we’ve highlighted from LLMs is that the statistical nature of language alone can take us pretty far. This suggests that alongside social interaction and peer engagement and communication, we want to increase the volume of language exposure and use. And as for the nature of the language we want to increase: the more the form of that language combines precision with abstraction, the greater the computational power it can provide. Turning up the dial on decontextualized language use and exposure – in other words, providing our children with “textual feasts,” to use Alfred Tatum’s term – may be the key to enhanced learning.
Sources for Further Exploration
If you are interested in further exploring some of the tensions we began this post with – between regularity and irregularity in language – here are some further interesting reads to geek out on:
- “Source codes in human communication” by Michael Ramscar
- “Expectation-based syntactic comprehension” by Roger Levy
- “Cognitive approaches to uniformity and variability in morphology” by Petar Milin, Neil Bermel, and James Blevins
#language #computation #algorithms #learning #LLMs #cognition