Linguistic theory, psycholinguistics and large language models
Stela Manova
May 2024
 

Much recent attention has been paid to whether large language models (LLMs) can serve as theories of language (Piantadosi 2023 and the replies to other scholars in it). Unfortunately, the discussion has remained abstract, and virtually nothing has been said about how LLMs work technically or what their internal organization means for linguistic theory (LT). My research addresses this gap. Since the algorithms in different LLMs may differ, I focus on ChatGPT. ChatGPT has a vocabulary of 100k tokens. Tokenization makes it possible to represent a large amount of text with a small set of subword units (tokens). Most of the tokens coincide with linguistic units: letters (phonemes), morphemes or words. ChatGPT seems to combine major claims of the major linguistic theories with major findings of psycholinguistic research. The most significant difference between LT and ChatGPT is that LT is level-based: phonology manipulates phonemes, morphology manipulates morphemes, and syntax works with words (and with morphemes in Distributed Morphology). The order of phonology, morphology and syntax in the architecture of the grammar is theory-dependent: phonology and morphology may precede or follow syntax. By contrast, ChatGPT works with linear sequences of tokens, so that phonology, morphology and syntax take place simultaneously. In other words, ChatGPT elevates phonology and morphology to the level of syntax. This is research in progress. Comments and suggestions are welcome!
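The claim that tokens of different sizes (letter-like, morpheme-like and word-like) sit side by side in one linear sequence can be illustrated with a small sketch. This is not ChatGPT's tokenizer: the vocabulary `TOY_VOCAB` and the function `tokenize` are hypothetical, and greedy longest-match segmentation is used here only as a simple stand-in for the subword algorithm an actual model uses.

```python
# Hypothetical toy vocabulary, for illustration only.
# It mixes word-sized, morpheme-sized and letter-sized units.
TOY_VOCAB = {"un", "break", "able", "s", "the", "walk", "ed", " "}


def tokenize(text, vocab=TOY_VOCAB):
    """Segment text into subword tokens by greedy longest match.

    Any character not covered by the vocabulary becomes a
    single-character token, so every input can be segmented.
    """
    max_len = max(len(tok) for tok in vocab)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until one matches.
        for length in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if piece in vocab or length == 1:
                tokens.append(piece)
                i += length
                break
    return tokens


print(tokenize("unbreakable"))  # morpheme-like units
print(tokenize("the walks"))    # word, space, stem and suffix in one flat list
```

The output is a single flat list in both cases: nothing in the sequence marks a token as belonging to a phonological, morphological or syntactic level, which is the contrast with level-based LT drawn above.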
Reference: lingbuzz/008123
(please use that when you cite this article)
Published in: 57th Annual Meeting of the Societas Linguistica Europaea
keywords: natural language processing, large language models, linguistic theory, psycholinguistics, phonology/morphology/syntax, syntax, phonology, semantics, morphology
previous versions: v2 [May 2024]
v1 [May 2024]
Downloaded: 1433 times

 
