category theory lowers language model perplexity

source: arxiv artificial intelligence: the cognitive categorical transformer: category-theoretic inductive biases for language modeling

level: research

the cognitive categorical transformer (cct) is a 306m-parameter model that adds category-theoretic components to a pretrained gpt-2 small backbone. it processes information with simplicial message passing, a method inspired by cognitive science. in a controlled comparison on wikitext-103 over 215,000 optimizer steps, cct reached a validation perplexity of 21.27, while a gpt-2 small baseline fine-tuned identically reached 24.19. that is a 2.92-point reduction, or about 12% relative to the baseline.
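the arithmetic behind those figures can be checked with a short sketch. it uses only the two perplexities reported above; the function name `perplexity_gain` is illustrative, not taken from the paper. it relies on the standard identity that perplexity is the exponential of mean per-token cross-entropy, so the last value is the implied loss gap in nats.

```python
import math

# validation perplexities reported in the summary above
baseline_ppl = 24.19  # gpt-2 small, fine-tuned identically
cct_ppl = 21.27       # cognitive categorical transformer


def perplexity_gain(baseline: float, model: float) -> tuple[float, float, float]:
    """Return (absolute reduction, relative reduction, loss gap in nats)."""
    absolute = baseline - model
    relative = absolute / baseline
    # perplexity = exp(mean cross-entropy), so the loss gap is a log ratio
    loss_gap = math.log(baseline / model)
    return absolute, relative, loss_gap


absolute, relative, loss_gap = perplexity_gain(baseline_ppl, cct_ppl)
print(f"absolute: {absolute:.2f} points")
print(f"relative: {relative:.1%}")
print(f"loss gap: {loss_gap:.3f} nats per token")
```

the relative figure is measured against the baseline perplexity, which is the convention the 12% number follows.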

an ablation retrained the model with the simplicial message passing component, called gt-full, disabled throughout the seven-phase activation schedule. the ablated model reached 23.72 perplexity, 2.45 points worse than the full cct at 21.27. gt-full therefore accounts for 2.45 of the 2.92-point gain over the baseline, or about 84% of the improvement. the paper presents this as the first ablation-validated evidence that simplicial message passing directly improves language model perplexity.
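the 84% attribution follows from three reported perplexities. the sketch below reproduces it; the function name `ablation_share` is illustrative, and the calculation assumes the gain splits additively between gt-full and everything else, which is a simplification rather than something the paper is known to state.

```python
# validation perplexities reported in the summary above
baseline_ppl = 24.19  # gpt-2 small, fine-tuned identically
ablated_ppl = 23.72   # cct retrained without gt-full
full_ppl = 21.27      # full cct


def ablation_share(baseline: float, ablated: float, full: float) -> float:
    """Fraction of the total gain that disappears when the component is removed."""
    total_gain = baseline - full      # gain of full cct over the baseline
    component_gain = ablated - full   # gain lost by removing gt-full
    return component_gain / total_gain


share = ablation_share(baseline_ppl, ablated_ppl, full_ppl)
residual = baseline_ppl - ablated_ppl  # gain left without gt-full

print(f"share attributed to gt-full: {share:.0%}")
print(f"gain remaining without gt-full: {residual:.2f} points")
```

the remaining 0.47 points is what the other category-theoretic components contribute over the baseline when gt-full is absent.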

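the work suggests that category theory can offer practical inductive biases for neural networks. by structuring how information flows between parts of the model, cct reaches lower perplexity than the baseline from the same data and the same number of optimizer steps. the comparison does not hold model size fixed, however: at 306m parameters, cct is larger than the gpt-2 small backbone alone (roughly 124m parameters), so the added components bring extra capacity as well as different internal wiring. the ablation is what ties most of the gain to simplicial message passing specifically. this opens a path for using abstract mathematical structures to build better language models without adding data or training steps.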

why it matters: it demonstrates a way to improve language models using mathematical structure rather than more data or training steps, which could lead to more data-efficient training.
