CYK Algorithm

The Cocke-Younger-Kasami (CYK) algorithm, independently discovered by Cocke, Younger, and Kasami in the 1960s, is a tabular bottom-up parsing algorithm for context-free grammars. It requires the grammar to be in Chomsky Normal Form (CNF), where every production is either A → BC (two nonterminals) or A → a (a single terminal). Any CFG can be converted to CNF without changing the generated language, making CYK a general-purpose CFG parser.

Algorithm

CYK Recognition Table Input: w = w₁ w₂ ... w_n, grammar G = (N, T, R, S)

Base case: T[i, i, A] = true iff (A → w_i) ∈ R
Recursive case: T[i, j, A] = true iff ∃ B, C ∈ N, k ∈ [i, j−1]:
T[i, k, B] = true ∧ T[k+1, j, C] = true ∧ (A → BC) ∈ R

Accept iff T[1, n, S] = true

The algorithm fills an n × n upper-triangular table, where cell T[i, j] records all nonterminals that can derive the substring w_i...w_j. Cells are filled in order of increasing span length: first all spans of length 1 (individual words), then length 2, and so on up to the full sentence. For each cell T[i, j], the algorithm considers all possible split points k and all binary rules A → BC such that B derives w_i...w_k and C derives w_k+1...w_j.

Complexity and Extensions

The time complexity is O(n³ |G|), where n is the sentence length and |G| is the grammar size. Space complexity is O(n² |N|). While cubic time is practical for short sentences and small grammars, broad-coverage grammars can have thousands of nonterminals after binarization, making the constant factor significant. Optimizations include agenda-driven parsing, coarse-to-fine pruning, and A* search.

Probabilistic CYK

The probabilistic extension replaces Boolean values with probabilities: T[i, j, A] stores the highest probability of any subtree rooted at A spanning positions i to j. The recursion uses multiplication (of rule and subtree probabilities) and maximization (over split points and rules), yielding the Viterbi parse in the same O(n³) time.

To recover the actual parse tree rather than just recognize the string, each cell stores back-pointers recording which rule and split point were used. These back-pointers are followed from T[1, n, S] to reconstruct the complete derivation. The CYK algorithm remains foundational in computational linguistics and serves as the basis for probabilistic parsing with PCFGs.

Interactive Calculator

Enter CNF grammar rules (one per line, format A -> B C or A -> a) followed by a blank line and a sentence (words separated by spaces). The CYK algorithm builds a parse table and determines whether the sentence is in the language.

CNF Grammar + Sentence (blank line separates)

Click Calculate to see results, or Animate to watch the statistics update one record at a time.

Algorithm

Complexity and Extensions

Interactive Calculator

References

External Links

Algorithm

Complexity and Extensions

Interactive Calculator

Related Topics

References

External Links