
The original version of this story appeared in Quanta Magazine.
Among the myriad abilities that humans possess, which ones are uniquely human? Language has been a top candidate at least since Aristotle, who wrote that humanity was “the animal that has language.” Even as large language models such as ChatGPT superficially replicate ordinary speech, researchers want to know if there are specific aspects of human language that simply have no parallels in the communication systems of other animals or artificially intelligent devices.
In particular, researchers have been exploring the extent to which language models can reason about language itself. For some in the linguistics community, language models not only lack such reasoning abilities, they are incapable of ever acquiring them. This view was summed up by Noam Chomsky, a prominent linguist, and two coauthors in 2023, when they wrote in The New York Times that “the correct explanations of language are complicated and cannot be learned just by marinating in big data.” AI models may be adept at using language, these researchers argued, but they’re not capable of analyzing it in a sophisticated way.
That view was challenged in a recent paper by Gašper Beguš, a linguist at the University of California, Berkeley; Maksymilian Dąbkowski, who recently received his doctorate in linguistics at Berkeley; and Ryan Rhodes of Rutgers University. The researchers put a number of large language models, or LLMs, through a battery of linguistic tests—including, in one case, having the LLM generalize the rules of a made-up language. While most of the LLMs failed to parse linguistic rules in the way that humans can, one had impressive abilities that greatly exceeded expectations. It was able to analyze language in much the same way a graduate student in linguistics would—diagramming sentences, resolving sentences with multiple ambiguous meanings, and making use of complicated linguistic features such as recursion. This finding, Beguš said, “challenges our understanding of what AI can do.”
This new work is both timely and “very important,” said Tom McCoy, a computational linguist at Yale University who was not involved with the research. “As society becomes more dependent on this technology, it’s increasingly important to understand where it can succeed and where it can fail.” Linguistic analysis, he added, is the ideal test bed for evaluating the degree to which these language models can reason like humans.
Infinite Complexity
One challenge of giving language models a rigorous linguistic test is making sure they don’t already know the answers. These systems are typically trained on huge amounts of written information—not just the bulk of the internet, in dozens if not hundreds of languages, but also things like linguistics textbooks. The models could, in theory, simply memorize and regurgitate the information that they’ve been fed during training.
To avoid this, Beguš and his colleagues created a linguistic test in four parts. Three of the four parts involved asking the model to analyze specially crafted sentences using tree diagrams, which were first introduced in Chomsky’s landmark 1957 book, Syntactic Structures. These diagrams break sentences down into noun phrases and verb phrases and then further subdivide them into nouns, verbs, adjectives, adverbs, prepositions, conjunctions and so forth.
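To make the idea concrete, here is a minimal sketch, not taken from the paper, of how a tree diagram for a simple sentence might be written down and printed in code. The nested-tuple representation, the category labels, and the helper function are illustrative assumptions, not the researchers' actual format.

```python
# Illustrative sketch: a constituency tree for "The sky is blue",
# represented as nested (label, children) tuples, where children is
# either a list of subtrees or a list of words.

sentence_tree = (
    "S", [
        ("NP", [("Det", ["The"]), ("N", ["sky"])]),               # noun phrase
        ("VP", [("V", ["is"]), ("AdjP", [("Adj", ["blue"])])]),   # verb phrase
    ],
)

def show(tree, depth=0):
    """Print the tree with indentation, one constituent per line."""
    label, children = tree
    pad = "  " * depth
    if all(isinstance(child, str) for child in children):
        print(f"{pad}{label}: {' '.join(children)}")   # leaf: a word with its part of speech
    else:
        print(f"{pad}{label}")                         # internal node: a phrase
        for child in children:
            show(child, depth + 1)

show(sentence_tree)
# S
#   NP
#     Det: The
#     N: sky
#   VP
#     V: is
#     AdjP
#       Adj: blue
```

The printed output mirrors what a tree diagram shows on paper: the sentence splits into a noun phrase and a verb phrase, which in turn split into individual words.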
One part of the test focused on recursion—the ability to embed phrases within phrases. “The sky is blue” is a simple English sentence. “Jane said that the sky is blue” embeds the original sentence in a slightly more complex one. Importantly, this process of recursion can go on forever: “Maria wondered if Sam knew that Omar heard that Jane said that the sky is blue” is also a grammatically correct, if awkward, recursive sentence.
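The unbounded nature of this embedding is easy to see in a short, hypothetical code sketch. The function name and the list of reporting clauses below are illustrative assumptions, not material from the study; each recursive call simply wraps the current sentence in one more clause.

```python
# Illustrative sketch: recursion lets us embed a sentence inside another
# sentence as many times as we like.

def embed(sentence, clauses):
    """Recursively wrap `sentence` in each (speaker, verb) reporting clause."""
    if not clauses:
        return sentence
    speaker, verb = clauses[0]
    # Wrap the sentence built so far in one more clause, then recurse.
    return embed(f"{speaker} {verb} {sentence}", clauses[1:])

print(embed("the sky is blue", [
    ("Jane", "said that"),
    ("Omar", "heard that"),
    ("Sam", "knew that"),
    ("Maria", "wondered if"),
]))
# Maria wondered if Sam knew that Omar heard that Jane said that the sky is blue
```

Because nothing stops the list of clauses from growing, the same rule can in principle produce infinitely many grammatical sentences, which is exactly what makes recursion a hallmark of human language.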
