Earley Parser

Earley's algorithm, introduced by Jay Earley in 1970, is a chart-based parsing algorithm that handles any context-free grammar, including left-recursive, right-recursive, and ambiguous grammars, without requiring conversion to a normal form. It operates by maintaining a chart of partial parse states (called Earley items or dotted rules) organized by position in the input string. Each item records how much of a production rule has been recognized so far.

Earley Items and Operations

Earley Item [A → α • β, i, j]
A → αβ is a production rule
The dot (•) separates recognized (α) from expected (β) symbols
i = start position, j = current position

Three operations:
PREDICTOR: [A → α • B β, i, j] ⇒ add [B → • γ, j, j] for all B → γ
SCANNER: [A → α • a β, i, j] and w_j+1 = a ⇒ add [A → α a • β, i, j+1]
COMPLETER: [B → γ •, k, j] and [A → α • B β, i, k] ⇒ add [A → α B • β, i, j]

The algorithm processes the input left-to-right, maintaining a set of items (a chart column) for each position. For each item, it applies one of three operations: the predictor creates new items when a nonterminal is expected next, the scanner advances the dot past a terminal that matches the current input word, and the completer advances the dot past a nonterminal when a complete constituent has been found.

Complexity Properties

Earley's algorithm achieves O(n³) worst-case time for general CFGs, O(n²) for unambiguous grammars, and O(n) for grammars that are LR(k). This adaptivity makes it attractive for natural language parsing, where grammars are highly ambiguous but individual sentences may admit only a few parses. Unlike CYK, it does not require conversion to Chomsky Normal Form, which can significantly increase grammar size.

Earley vs. CYK

Earley parsing is top-down with bottom-up confirmation, while CYK is purely bottom-up. Earley's advantage is that it avoids building constituents that cannot be part of a complete parse rooted at S, a form of top-down prediction. CYK's advantage is simplicity and easy extension to probabilistic parsing.

Modern implementations of Earley's algorithm include optimizations such as prediction lookahead, Leo's improvement for right-recursive grammars (achieving O(n) for all LR-regular grammars), and probabilistic extensions. The algorithm's ability to handle arbitrary CFGs without grammar transformation makes it a natural choice for grammar engineering and for parsing with feature-based unification grammars.

Earley Items and Operations

Complexity Properties

References

External Links

Earley Items and Operations

Complexity Properties

Related Topics

References

External Links