A probabilistic context-free grammar (PCFG) extends a standard CFG by assigning a conditional probability to each production rule. The probability of a parse tree is the product of the probabilities of all rules used in the derivation. For an ambiguous sentence, a PCFG-based parser selects the parse tree with the highest probability, providing a principled approach to disambiguation. PCFGs also define a probability distribution over strings, making them generative models of language.
Formal Definition
q(A → α) = P(α | A) for each rule A → α ∈ R
Constraint: ∑_α q(A → α) = 1 for each A ∈ N
P(t) = ∏_{r ∈ t} q(r) for a parse tree t
P(w) = ∑_{t : yield(t) = w} P(t) for a sentence w
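The definitions above can be sketched directly in code. The following is a minimal illustration with a hypothetical toy grammar (the rules and probabilities are invented for the example, not taken from the text): rules are stored per left-hand side, the normalization constraint is checked, and a tree's probability is the product of its rule probabilities.

```python
from math import isclose

# Toy PCFG (hypothetical example grammar).  Rules are stored as
# {lhs: {rhs_tuple: probability}}; terminals appear as lowercase strings.
pcfg = {
    "S":  {("NP", "VP"): 1.0},
    "NP": {("DT", "NN"): 0.7, ("NN",): 0.3},
    "VP": {("VB", "NP"): 0.6, ("VB",): 0.4},
    "DT": {("the",): 1.0},
    "NN": {("dog",): 0.5, ("cat",): 0.5},
    "VB": {("saw",): 1.0},
}

# Normalization constraint: sum over alpha of q(A -> alpha) = 1 for each A.
for lhs, expansions in pcfg.items():
    assert isclose(sum(expansions.values()), 1.0)

def tree_prob(tree):
    """P(t) = product of q(r) over all rules r used in the derivation.
    Trees are nested tuples ("A", child, ...); leaves are terminal strings."""
    label, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    p = pcfg[label][rhs]
    for c in children:
        if not isinstance(c, str):
            p *= tree_prob(c)
    return p

example = ("S", ("NP", ("NN", "cat")),
           ("VP", ("VB", "saw"), ("NP", ("DT", "the"), ("NN", "dog"))))
print(tree_prob(example))  # ≈ 0.0315 = 1.0 * 0.3 * 0.5 * 0.6 * 1.0 * 0.7 * 1.0 * 0.5
```

Note that the recursion visits every internal node exactly once, so each rule occurrence in the derivation contributes one factor, as the product formula requires.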
The probability of a sentence is the sum of the probabilities of all its parse trees. The most likely parse for a sentence w is argmax_t P(t | w) = argmax_{t : yield(t) = w} P(t), since P(t | w) = P(t) / P(w) and P(w) is constant over the candidate parses. This Viterbi parse can be found efficiently with the probabilistic CYK algorithm; the inside algorithm computes the total string probability P(w), and the inside-outside algorithm computes marginal probabilities of constituents.
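The probabilistic CYK algorithm can be sketched as follows, assuming a grammar in Chomsky normal form (binary rules A → B C and lexical rules A → word). The toy grammar and sentence here are hypothetical, chosen so the chart stays small; the chart cell (i, j, A) stores the best probability of A spanning words i..j.

```python
from collections import defaultdict

# Hypothetical toy CNF grammar: lexical rules A -> word and binary
# rules A -> B C, each with probability q(r).
lexical = {
    "the": {"DT": 1.0},
    "dog": {"NN": 0.5},
    "cat": {"NN": 0.5},
    "saw": {"VB": 1.0},
}
binary = [  # (A, B, C, q(A -> B C))
    ("S", "NP", "VP", 1.0),
    ("NP", "DT", "NN", 1.0),
    ("VP", "VB", "NP", 1.0),
]

def viterbi_cyk(words):
    """Return the probability of the best parse rooted in S.
    chart[i, j, A] = max probability of A spanning words[i:j]."""
    n = len(words)
    chart = defaultdict(float)          # missing entries default to 0.0
    for i, w in enumerate(words):       # width-1 spans: lexical rules
        for A, q in lexical.get(w, {}).items():
            chart[i, i + 1, A] = q
    for width in range(2, n + 1):       # wider spans: binary rules
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):   # try every split point
                for A, B, C, q in binary:
                    p = q * chart[i, k, B] * chart[k, j, C]
                    if p > chart[i, j, A]:
                        chart[i, j, A] = p
    return chart[0, n, "S"]

print(viterbi_cyk("the dog saw the cat".split()))  # 0.25
```

Replacing the max with a sum over split points and rules turns this into the inside algorithm, which computes P(w) instead of the Viterbi parse probability.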
Parameter Estimation
PCFG rule probabilities are typically estimated from a treebank using maximum likelihood estimation: q(A → α) = count(A → α) / count(A), where count(A) is the number of times A appears as the left-hand side of any rule in the treebank. When treebank data is unavailable, the inside-outside algorithm (a special case of EM) can estimate parameters from unannotated text. However, vanilla PCFGs have significant limitations: they assume context-free independence between rules, which means they cannot capture lexical preferences, structural dependencies, or the tendency of certain constructions to co-occur.
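The maximum likelihood estimate above amounts to counting rule occurrences in the treebank trees and normalizing per left-hand side. A sketch, using a tiny hypothetical two-tree "treebank" (trees are nested tuples whose leaves are words):

```python
from collections import Counter

# Hypothetical miniature treebank (invented for illustration).
treebank = [
    ("S", ("NP", ("DT", "the"), ("NN", "dog")), ("VP", ("VB", "barked"))),
    ("S", ("NP", ("DT", "the"), ("NN", "cat")), ("VP", ("VB", "slept"))),
]

rule_counts = Counter()   # count(A -> alpha)
lhs_counts = Counter()    # count(A): occurrences of A as a left-hand side

def count_rules(tree):
    label, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    rule_counts[label, rhs] += 1
    lhs_counts[label] += 1
    for c in children:
        if not isinstance(c, str):
            count_rules(c)

for t in treebank:
    count_rules(t)

# MLE: q(A -> alpha) = count(A -> alpha) / count(A)
q = {(A, rhs): c / lhs_counts[A] for (A, rhs), c in rule_counts.items()}
print(q["NN", ("dog",)])  # 0.5: "dog" is one of two NN expansions
```

Because each rule's count is divided by its left-hand side's total, the probabilities for every nonterminal automatically sum to 1, satisfying the normalization constraint.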
Extensions
Major extensions include lexicalized PCFGs (Collins, 1997; Charniak, 1997) that condition rule probabilities on head words, parent-annotated PCFGs (Johnson, 1998) that refine nonterminals with parent information, and latent-variable PCFGs (Matsuzaki et al., 2005; Petrov et al., 2006) that automatically learn fine-grained nonterminal subcategories. These extensions dramatically improve parsing accuracy, with the Berkeley Parser achieving over 90% F1 using split-merge latent-variable training. PCFGs remain foundational in computational linguistics even as neural approaches have become dominant.