Formal language theory is the study of syntactic structures defined over finite alphabets using precise mathematical rules. Originating in the work of Noam Chomsky in the 1950s and refined by automata theorists such as Michael Rabin and Dana Scott, the field provides a hierarchy of language classes — regular, context-free, context-sensitive, and recursively enumerable — each with increasing generative power. These classes correspond directly to different types of automata, establishing a deep connection between grammar and computation that underpins all of computational linguistics.
The Chomsky Hierarchy
The Chomsky hierarchy classifies formal grammars into four types based on the form of their production rules. A grammar G = (V, Σ, R, S) consists of a set of non-terminal symbols V, a set of terminal symbols Σ, a set of production rules R, and a start symbol S. The four types are distinguished by restrictions on the production rules:
Type 0 (Recursively Enumerable): α → β, with no restriction beyond α containing at least one non-terminal
Type 1 (Context-Sensitive): αAβ → αγβ where |γ| ≥ 1
Type 2 (Context-Free): A → γ
Type 3 (Regular): A → aB or A → a
Here A, B ∈ V, a ∈ Σ, and α, β, γ are strings over V ∪ Σ.
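As an illustration, a Type 3 (right-linear) grammar can be encoded directly as a data structure and tested for membership by simulating it as a finite automaton. The following sketch is our own; the grammar below generates the language (ab)+ and the names GRAMMAR and derives are illustrative, not from any standard library.

```python
# A right-linear grammar as a dict: each non-terminal maps to productions
# of the form (terminal, next_nonterminal) for A -> aB, or (terminal, None)
# for A -> a. This particular grammar generates (ab)+.
GRAMMAR = {
    "S": [("a", "B")],
    "B": [("b", "S"), ("b", None)],
}

def derives(grammar, start, string):
    """True if the right-linear grammar derives `string` from `start`.

    Every production consumes exactly one terminal, so a derivation
    mirrors a run of a non-deterministic finite automaton: we track
    the set of non-terminals reachable after each consumed symbol.
    """
    states = {start}
    accepting = False
    for i, ch in enumerate(string):
        next_states = set()
        accepting = False
        for nonterminal in states:
            for terminal, successor in grammar.get(nonterminal, []):
                if terminal != ch:
                    continue
                if successor is not None:
                    next_states.add(successor)   # A -> aB: move to B
                elif i == len(string) - 1:
                    accepting = True             # A -> a on the last symbol
        states = next_states
    return accepting
```

The simulation makes the grammar–automaton correspondence for Type 3 concrete: non-terminals play the role of automaton states.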
The four classes form a strict containment chain: every regular language is context-free, every context-free language is context-sensitive, and every context-sensitive language is recursively enumerable, and each inclusion is proper. This containment hierarchy has profound implications for natural language processing, since the class of grammar needed to describe a linguistic phenomenon determines the computational resources required to parse it.
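The properness of the inclusions is witnessed by classic separating languages. The sketch below (function names are ours, chosen for illustration) contrasts membership tests for a*b*, which is regular; {aⁿbⁿ}, which is context-free but not regular; and {aⁿbⁿcⁿ}, which is context-sensitive but not context-free. All three tests happen to run in linear time in Python; the point is the grammar class needed to generate each language, not the runtime of an ad-hoc checker.

```python
import re

def in_regular(s):
    """a*b*: recognizable with a fixed, finite amount of memory."""
    return re.fullmatch(r"a*b*", s) is not None

def in_context_free(s):
    """{a^n b^n}: needs one unbounded counter (a pushdown stack)."""
    n = len(s) // 2
    return len(s) % 2 == 0 and s == "a" * n + "b" * n

def in_context_sensitive(s):
    """{a^n b^n c^n}: provably not context-free (pumping lemma)."""
    n = len(s) // 3
    return len(s) % 3 == 0 and s == "a" * n + "b" * n + "c" * n
```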
Applications to Natural Language
A central question in computational linguistics is where natural language falls in the Chomsky hierarchy. For decades, the prevailing assumption was that context-free grammars suffice for natural language syntax. However, constructions such as cross-serial dependencies in Swiss German and reduplication in Bambara demonstrate that natural languages are not strictly context-free. These findings motivated the development of mildly context-sensitive formalisms — grammars that go slightly beyond context-free power while remaining efficiently parsable.
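Reduplication corresponds formally to the copy language {ww : w ∈ Σ*}, which is not context-free: informally, a pushdown stack pops symbols in reverse order, so it can match a string against its mirror image wwᴿ but not against an identical copy. A minimal membership check (the function name is ours):

```python
def is_reduplicated(s):
    """True if s has the form ww -- membership in the copy language."""
    half, remainder = divmod(len(s), 2)
    return remainder == 0 and s[:half] == s[half:]
```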
Aravind Joshi proposed the notion of mild context-sensitivity to characterize the minimal extension beyond context-free power needed for natural language. Mildly context-sensitive languages include limited cross-serial dependencies, are parsable in polynomial time, and have the constant growth property. Tree-adjoining grammars and combinatory categorial grammars fall into this class.
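A standard example of limited cross-serial dependencies is the language {aᵐbⁿcᵐdⁿ : m, n ≥ 1}, which a tree-adjoining grammar can generate but no context-free grammar can: the a's pair with the c's and the b's with the d's, and the two dependencies cross. A hypothetical membership test (our own sketch, not a TAG parser):

```python
import re

def cross_serial(s):
    """Membership in {a^m b^n c^m d^n : m, n >= 1}: the a-count must
    match the c-count and the b-count the d-count, crossing serially."""
    match = re.fullmatch(r"(a+)(b+)(c+)(d+)", s)
    return (match is not None
            and len(match.group(1)) == len(match.group(3))
            and len(match.group(2)) == len(match.group(4)))
```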
Formal Definitions and Closure Properties
A formal language L over alphabet Σ is any subset of Σ*, the set of all finite strings over Σ. Language classes are characterized not only by their grammars but also by their closure properties — whether they are closed under union, intersection, concatenation, complementation, and Kleene star. Regular languages are closed under all of these operations, while context-free languages are closed under union, concatenation, and Kleene star but not under intersection or complementation (they are, however, closed under intersection with regular languages). These closure properties have practical consequences for the compositionality of linguistic descriptions.
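Closure of regular languages under intersection, for instance, follows from the product construction, which runs two deterministic automata in lockstep. The sketch below assumes a minimal DFA encoding of our own devising (a transition dict, a start state, and a set of accepting states); the example machines and names are illustrative.

```python
def intersect_dfas(d1, d2):
    """Product construction: a DFA accepting L(d1) ∩ L(d2).

    Each DFA is a triple (transitions, start, accepting), where
    transitions maps (state, symbol) -> state. Product states are
    pairs (q1, q2); a pair accepts iff both components accept.
    """
    t1, s1, f1 = d1
    t2, s2, f2 = d2
    transitions = {}
    for (q1, a), r1 in t1.items():
        for (q2, b), r2 in t2.items():
            if a == b:  # both machines read the same symbol
                transitions[((q1, q2), a)] = (r1, r2)
    accepting = {(q1, q2) for q1 in f1 for q2 in f2}
    return transitions, (s1, s2), accepting

def run(dfa, string):
    """Run a DFA on a string, rejecting on any missing transition."""
    transitions, state, accepting = dfa
    for ch in string:
        if (state, ch) not in transitions:
            return False
        state = transitions[(state, ch)]
    return state in accepting

# Example: "even number of a's" intersected with "ends in b".
EVEN_A = ({("e", "a"): "o", ("e", "b"): "e",
           ("o", "a"): "e", ("o", "b"): "o"}, "e", {"e"})
ENDS_B = ({("0", "a"): "0", ("0", "b"): "1",
           ("1", "a"): "0", ("1", "b"): "1"}, "0", {"1"})
BOTH = intersect_dfas(EVEN_A, ENDS_B)
```

The same pairing idea fails for pushdown automata, since a product machine would need two independent stacks, which matches the non-closure of context-free languages under intersection.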
The study of formal languages continues to evolve. Recent work connects formal language theory to neural network expressiveness, asking which classes of formal languages can be recognized by various neural architectures such as recurrent networks and transformers. This line of research promises to deepen our understanding of both the capabilities and limitations of modern NLP systems.