The Cocke-Younger-Kasami (CYK) algorithm, independently discovered by Cocke, Younger, and Kasami in the 1960s, is a tabular bottom-up parsing algorithm for context-free grammars. It requires the grammar to be in Chomsky Normal Form (CNF), where every production is either A → BC (two nonterminals) or A → a (a single terminal). Any CFG can be converted to CNF without changing the generated language, making CYK a general-purpose CFG parser.
Algorithm
Base case: T[i, i, A] = true iff (A → wi) ∈ R
Recursive case: T[i, j, A] = true iff ∃ B, C ∈ N, k ∈ [i, j−1]:
T[i, k, B] = true ∧ T[k+1, j, C] = true ∧ (A → BC) ∈ R
Accept iff T[1, n, S] = true
The algorithm fills an n × n upper-triangular table, where cell T[i, j] records all nonterminals that can derive the substring wi...wj. Cells are filled in order of increasing span length: first all spans of length 1 (individual words), then length 2, and so on up to the full sentence. For each cell T[i, j], the algorithm considers all possible split points k and all binary rules A → BC such that B derives wi...wk and C derives wk+1...wj.
Complexity and Extensions
The time complexity is O(n3 |G|), where n is the sentence length and |G| is the grammar size. Space complexity is O(n2 |N|). While cubic time is practical for short sentences and small grammars, broad-coverage grammars can have thousands of nonterminals after binarization, making the constant factor significant. Optimizations include agenda-driven parsing, coarse-to-fine pruning, and A* search.
To recover the actual parse tree rather than just recognize the string, each cell stores back-pointers recording which rule and split point were used. These back-pointers are followed from T[1, n, S] to reconstruct the complete derivation. The CYK algorithm remains foundational in computational linguistics and serves as the basis for probabilistic parsing with PCFGs.