
Context-Free Grammar

Context-free grammars are the most widely used formalism for specifying the phrase structure of natural and programming languages, providing the foundation for syntactic parsing and grammar engineering.

G = (V, Σ, R, S) with rules A → α

A context-free grammar (CFG) is a formal system for generating strings by recursive substitution of non-terminal symbols. Each production rule rewrites a single non-terminal into a sequence of terminals and non-terminals, without regard to surrounding context. CFGs are the workhorse of syntactic description: they model the hierarchical constituent structure of sentences, capture recursive embedding, and form the basis of virtually every parsing algorithm used in computational linguistics.

Formal Structure

A CFG is a 4-tuple G = (V, Σ, R, S), where V is a finite set of non-terminal symbols representing syntactic categories, Σ is a finite set of terminal symbols representing words, R is a finite set of production rules of the form A → α where A ∈ V and α ∈ (V ∪ Σ)*, and S ∈ V is the start symbol. A derivation begins at S and repeatedly applies rules until only terminal symbols remain. The language L(G) is the set of all terminal strings derivable from S.
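
As a concrete illustration, the 4-tuple can be encoded directly as plain data, with derivation implemented as repeated rewriting of the leftmost non-terminal. The sketch below (in Python; the grammar and function names are illustrative, not from any library) uses a deterministic rule choice so the derivation is reproducible:

```python
# A minimal CFG sketch: G = (V, Σ, R, S) encoded as plain Python data.
# The toy grammar and helper names are illustrative, not from a library.

GRAMMAR = {                       # R: non-terminal -> list of right-hand sides
    "S":   [["NP", "VP"]],
    "NP":  [["Det", "N"]],
    "VP":  [["V", "NP"]],
    "Det": [["the"]],
    "N":   [["dog"], ["cat"]],
    "V":   [["saw"]],
}
NONTERMINALS = set(GRAMMAR)       # V; every other symbol is a terminal in Σ

def derive_leftmost(symbols, choose=lambda options: options[0]):
    """Expand the leftmost non-terminal until only terminals remain."""
    while any(s in NONTERMINALS for s in symbols):
        i = next(i for i, s in enumerate(symbols) if s in NONTERMINALS)
        symbols = symbols[:i] + choose(GRAMMAR[symbols[i]]) + symbols[i + 1:]
    return symbols

print(" ".join(derive_leftmost(["S"])))   # prints: the dog saw the dog
```

The start symbol S plays the role of the initial sentential form; swapping in a different `choose` function yields different members of L(G).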

Example Grammar for English Fragment

S → NP VP
NP → Det N | NP PP
VP → V NP | VP PP
PP → P NP
Det → "the" | "a"
N → "dog" | "cat" | "park"
V → "saw" | "chased"
P → "in" | "with"
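
To see this fragment in action, one can sample sentences from L(G). Because the NP → NP PP and VP → VP PP rules are recursive, a naive random generator may fail to terminate; the sketch below (a hypothetical helper, not part of the text) caps the recursion depth and then prefers non-recursive rules:

```python
import random

# The English-fragment grammar from the text, encoded for random generation.
RULES = {
    "S":   [["NP", "VP"]],
    "NP":  [["Det", "N"], ["NP", "PP"]],
    "VP":  [["V", "NP"], ["VP", "PP"]],
    "PP":  [["P", "NP"]],
    "Det": [["the"], ["a"]],
    "N":   [["dog"], ["cat"], ["park"]],
    "V":   [["saw"], ["chased"]],
    "P":   [["in"], ["with"]],
}

def generate(symbol="S", depth=0, max_depth=5):
    """Sample a string from L(G); past max_depth, prefer rules that do not
    mention the expanded symbol, so generation terminates."""
    if symbol not in RULES:
        return [symbol]                              # terminal symbol
    options = RULES[symbol]
    if depth >= max_depth:
        safe = [rhs for rhs in options if symbol not in rhs]
        options = safe or options
    rhs = random.choice(options)
    return [w for s in rhs for w in generate(s, depth + 1, max_depth)]

random.seed(0)
print(" ".join(generate()))
```

Every sampled string is grammatical under the fragment, though, as the text notes below, nothing here prefers one parse or sentence over another; that is the gap PCFGs fill.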

Normal Forms

Every CFG can be converted to Chomsky Normal Form (CNF), where every rule is either A → BC (two non-terminals) or A → a (one terminal). CNF is required by the CYK parsing algorithm and simplifies many theoretical analyses. Similarly, Greibach Normal Form requires that every rule begin with a terminal symbol, which facilitates top-down parsing. These normal forms do not change the generated language (except possibly for the empty string) and demonstrate that the essential power of CFGs resides in their recursive structure rather than in the specific form of rules.
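
The example fragment above happens to satisfy CNF already: every rule is either A → BC or A → a. A minimal CYK recognizer over that grammar might look like the following sketch (the rule tables are written by hand for illustration, not produced by an automatic CNF conversion):

```python
from itertools import product

# CNF rules of the example fragment, indexed for CYK lookup (illustrative).
UNARY = {   # A -> a, indexed by terminal
    "the": {"Det"}, "a": {"Det"},
    "dog": {"N"}, "cat": {"N"}, "park": {"N"},
    "saw": {"V"}, "chased": {"V"},
    "in": {"P"}, "with": {"P"},
}
BINARY = {  # A -> B C, indexed by (B, C)
    ("NP", "VP"): {"S"},
    ("Det", "N"): {"NP"},
    ("NP", "PP"): {"NP"},
    ("V", "NP"): {"VP"},
    ("VP", "PP"): {"VP"},
    ("P", "NP"): {"PP"},
}

def cyk(words):
    """CYK recognition: chart[i][j] holds categories spanning words[i:j+1]."""
    n = len(words)
    chart = [[set() for _ in range(n)] for _ in range(n)]
    for i, w in enumerate(words):
        chart[i][i] = set(UNARY.get(w, ()))
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span - 1
            for k in range(i, j):                    # split point
                for B, C in product(chart[i][k], chart[k + 1][j]):
                    chart[i][j] |= BINARY.get((B, C), set())
    return "S" in chart[0][n - 1]

print(cyk("the dog saw a cat".split()))              # True
print(cyk("dog the saw".split()))                    # False
```

The chart-filling loop runs in O(n³ · |R|) time, which is why CNF, with its bounded rule shapes, is the standard input format for CYK.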

Probabilistic Context-Free Grammars

A probabilistic CFG (PCFG) augments each production rule with a probability, such that the probabilities of all rules expanding the same non-terminal sum to one. PCFGs assign a probability to each parse tree as the product of the probabilities of the rules used to build it, enabling disambiguation by selecting the most probable parse. Rule probabilities can be read off a treebank by relative-frequency counting, or estimated from unannotated text with the inside-outside algorithm (an instance of EM). PCFGs were the dominant parsing model in statistical NLP throughout the 1990s and 2000s.
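
The product-of-rules definition can be made concrete with a tiny sketch (the probabilities below are invented for illustration; within each left-hand side they sum to one, as the definition requires):

```python
from math import prod  # Python 3.8+

# Hypothetical PCFG over the fragment grammar; probabilities are made up.
PROBS = {
    ("S",   ("NP", "VP")): 1.0,
    ("NP",  ("Det", "N")): 0.7,
    ("NP",  ("NP", "PP")): 0.3,
    ("VP",  ("V", "NP")):  0.6,
    ("VP",  ("VP", "PP")): 0.4,
    ("Det", ("the",)):     1.0,
    ("N",   ("dog",)):     0.5,
    ("N",   ("cat",)):     0.5,
    ("V",   ("saw",)):     1.0,
}

def tree_prob(tree):
    """P(tree) = product of the probabilities of the rules used.
    A tree is (label, child, ...); terminal leaves are plain strings."""
    if isinstance(tree, str):
        return 1.0
    label, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    return PROBS[(label, rhs)] * prod(tree_prob(c) for c in children)

t = ("S", ("NP", ("Det", "the"), ("N", "dog")),
          ("VP", ("V", "saw"), ("NP", ("Det", "the"), ("N", "cat"))))
print(tree_prob(t))   # 1.0 * 0.7 * 0.5 * 0.6 * 0.7 * 0.5 ≈ 0.0735
```

Given two candidate trees for the same ambiguous sentence, disambiguation amounts to computing `tree_prob` for each and keeping the larger.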

Limitations for Natural Language

While CFGs capture the recursive, hierarchical nature of natural language phrase structure, they have well-known limitations. Standard CFGs lack a natural mechanism for expressing lexical dependencies: the fact that the choice of verb largely determines its allowable argument structures. This limitation led to the development of lexicalized CFGs, where non-terminals are annotated with head words, dramatically improving parsing accuracy. Furthermore, as Shieber (1985) showed, certain natural language phenomena, such as the cross-serial dependencies of Swiss German, lie beyond context-free power entirely.

Despite these limitations, CFGs remain central to computational linguistics. Modern neural parsers often still output constituent trees defined by an implicit or explicit CFG backbone. Grammar engineering frameworks like the LinGO Grammar Matrix and the XLE platform use feature-augmented CFGs as their syntactic core. The mathematical elegance, well-understood parsing algorithms, and broad applicability of context-free grammars ensure their continued relevance in both theoretical and applied linguistics.

References

  1. Chomsky, N. (1956). Three models for the description of language. IRE Transactions on Information Theory, 2(3), 113–124. https://doi.org/10.1109/TIT.1956.1056813
  2. Jurafsky, D., & Martin, J. H. (2024). Speech and Language Processing (3rd ed.). https://web.stanford.edu/~jurafsky/slp3/
  3. Manning, C. D., & Schütze, H. (1999). Foundations of Statistical Natural Language Processing. MIT Press.
  4. Johnson, M. (1998). PCFG models of linguistic tree representations. Computational Linguistics, 24(4), 613–632. https://doi.org/10.5555/972764.972768
  5. Shieber, S. M. (1985). Evidence against the context-freeness of natural language. Linguistics and Philosophy, 8(3), 333–343.
