Earley's algorithm, introduced by Jay Earley in 1970, is a chart-based parsing algorithm that handles any context-free grammar, including left-recursive, right-recursive, and ambiguous grammars, without requiring conversion to a normal form. It operates by maintaining a chart of partial parse states (called Earley items or dotted rules) organized by position in the input string. Each item records how much of a production rule has been recognized so far.
Earley Items and Operations
A → αβ is a production rule
The dot (•) separates recognized (α) from expected (β) symbols
i = start position, j = current position
Three operations:
PREDICTOR: [A → α • B β, i, j] ⇒ add [B → • γ, j, j] for all B → γ
SCANNER: [A → α • a β, i, j] and wj+1 = a ⇒ add [A → α a • β, i, j+1]
COMPLETER: [B → γ •, k, j] and [A → α • B β, i, k] ⇒ add [A → α B • β, i, j]
The algorithm processes the input left-to-right, maintaining a set of items (a chart column) for each position. For each item, it applies one of three operations: the predictor creates new items when a nonterminal is expected next, the scanner advances the dot past a terminal that matches the current input word, and the completer advances the dot past a nonterminal when a complete constituent has been found.
Complexity Properties
Earley's algorithm achieves O(n3) worst-case time for general CFGs, O(n2) for unambiguous grammars, and O(n) for grammars that are LR(k). This adaptivity makes it attractive for natural language parsing, where grammars are highly ambiguous but individual sentences may admit only a few parses. Unlike CYK, it does not require conversion to Chomsky Normal Form, which can significantly increase grammar size.
Modern implementations of Earley's algorithm include optimizations such as prediction lookahead, Leo's improvement for right-recursive grammars (achieving O(n) for all LR-regular grammars), and probabilistic extensions. The algorithm's ability to handle arbitrary CFGs without grammar transformation makes it a natural choice for grammar engineering and for parsing with feature-based unification grammars.