Graph-based dependency parsing treats the problem as a structured prediction task over directed graphs. For a sentence of n words, a complete directed graph is constructed where each arc (i, j, l) represents a potential dependency from head word i to dependent word j with label l. Each arc is assigned a score by a learned model, and the parser finds the tree that maximizes the total arc score. This global optimization distinguishes graph-based parsers from transition-based parsers, which make local greedy decisions.
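The arc-factored decomposition can be made concrete with a tiny example. This is a minimal illustration with made-up scores, not output from any trained model: a score matrix over a 3-word sentence plus an artificial ROOT token, and the total score of one candidate tree.

```python
# Toy arc-factored scoring: scores[h][d] is the model's score for a
# hypothetical arc from head h to dependent d; token 0 is an artificial ROOT.
scores = [
    [0.0, 5.0, 1.0, 1.0],   # arcs out of ROOT
    [0.0, 0.0, 11.0, 9.0],  # arcs out of word 1
    [0.0, 10.0, 0.0, 8.0],  # arcs out of word 2
    [0.0, 8.0, 8.0, 0.0],   # arcs out of word 3
]

# A candidate tree, encoded as heads[d] = head of dependent d.
heads = [0, 0, 1, 1]  # ROOT -> 1, 1 -> 2, 1 -> 3

# Under an arc-factored model, the tree score is just the sum of arc scores.
tree_score = sum(scores[heads[d]][d] for d in range(1, len(heads)))
print(tree_score)  # 25.0
```

The parser's job is to find, among all valid trees, the head assignment that maximizes this sum.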
Maximum Spanning Tree Algorithms
t* = argmax_{t ∈ T(x)} score(x, t)
For projective trees: Eisner's algorithm, O(n³)
For non-projective trees: Chu-Liu/Edmonds' algorithm, O(n²)
For projective dependency trees (those with no crossing arcs), Eisner's algorithm (1996) finds the optimal tree in O(n³) time using dynamic programming. For non-projective trees (crossing arcs allowed), the Chu-Liu/Edmonds algorithm finds the maximum spanning arborescence in O(n²) time. The key insight is that under an arc-factored scoring model, where the total tree score decomposes into a sum of individual arc scores, globally optimal parsing remains tractable.
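The non-projective case can be sketched in a few dozen lines. Below is a minimal recursive implementation of the Chu-Liu/Edmonds algorithm, assuming a dense score matrix with token 0 as the root; the variable names are my own. Each node greedily picks its best head; if that yields a cycle, the cycle is contracted into a single node and the algorithm recurses. Note that this naive recursive form is O(n³) in the worst case (each contraction costs O(n²)); the O(n²) bound cited above requires Tarjan's more careful implementation.

```python
def _find_cycle(heads):
    """Return a list of nodes forming a cycle under `heads`, or None."""
    visited = [False] * len(heads)
    for start in range(1, len(heads)):
        path, node = [], start
        while node != 0 and not visited[node]:
            visited[node] = True
            path.append(node)
            node = heads[node]
        if node in path:
            return path[path.index(node):]
    return None

def chu_liu_edmonds(scores):
    """scores[h][d]: score of arc h -> d; node 0 is the artificial root.
    Returns heads[d] = head of d for the maximum spanning arborescence."""
    n = len(scores)
    # Greedy step: every non-root node takes its highest-scoring head.
    heads = [0] * n
    for d in range(1, n):
        heads[d] = max((h for h in range(n) if h != d), key=lambda h: scores[h][d])
    cycle = _find_cycle(heads)
    if cycle is None:
        return heads
    # Contract the cycle into a single super-node c.
    cyc = set(cycle)
    rest = [v for v in range(n) if v not in cyc]   # keeps root 0 at index 0
    idx = {v: i for i, v in enumerate(rest)}
    c, m = len(rest), len(rest) + 1
    NEG = float("-inf")
    new_scores = [[NEG] * m for _ in range(m)]
    enter_choice, leave_choice = {}, {}
    for v in rest:
        for w in rest:
            if v != w:
                new_scores[idx[v]][idx[w]] = scores[v][w]
    for v in rest:
        # Arc v -> cycle: entering at u means breaking the cycle arc into u.
        best_u, best_s = None, None
        for u in cycle:
            s = scores[v][u] - scores[heads[u]][u]
            if best_s is None or s > best_s:
                best_u, best_s = u, s
        new_scores[idx[v]][c], enter_choice[v] = best_s, best_u
        # Arc cycle -> v: leave from the best cycle node.
        best_u, best_s = None, None
        for u in cycle:
            if best_s is None or scores[u][v] > best_s:
                best_u, best_s = u, scores[u][v]
        new_scores[c][idx[v]], leave_choice[v] = best_s, best_u
    sub_heads = chu_liu_edmonds(new_scores)
    # Expand: keep the cycle's arcs except where the chosen arc enters it.
    final = heads[:]
    for new_d in range(1, m):
        new_h = sub_heads[new_d]
        if new_d == c:
            v = rest[new_h]
            final[enter_choice[v]] = v
        else:
            d = rest[new_d]
            final[d] = leave_choice[d] if new_h == c else rest[new_h]
    return final

NEG = float("-inf")
scores = [
    [NEG, 5.0, 1.0, 1.0],
    [NEG, NEG, 11.0, 9.0],
    [NEG, 10.0, NEG, 8.0],
    [NEG, 8.0, 8.0, NEG],
]
print(chu_liu_edmonds(scores))  # [0, 0, 1, 1]
```

On this toy matrix the greedy step creates the cycle 1 ↔ 2 (arcs scoring 11 and 10), which contraction resolves in favor of the tree ROOT → 1, 1 → 2, 1 → 3 with total score 25.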
Scoring Models
Early graph-based parsers (McDonald et al., 2005) used linear models with hand-crafted features over word forms, POS tags, and arc direction/distance, trained with the structured perceptron or margin-based methods (MIRA). Modern graph-based parsers use neural networks to compute arc scores. The biaffine attention model of Dozat and Manning (2017) uses a BiLSTM to produce contextual word representations, then applies biaffine classifiers to score arcs and labels separately, achieving state-of-the-art accuracy.
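The biaffine scoring step can be sketched at the shape level in NumPy. Random weights stand in for the trained BiLSTM and MLP parameters here (the actual Dozat and Manning model is trained end-to-end); the point is the factorization: project each token into separate "head" and "dependent" representations, then score every head-dependent pair with one bilinear form plus a head-only bias term.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d_model, d_arc = 5, 8, 6  # 5 tokens (incl. ROOT); toy dimensions

# Stand-in for BiLSTM output: one contextual vector per token.
H = rng.standard_normal((n, d_model))

# Two small MLPs project each token into head and dependent roles.
W_head = rng.standard_normal((d_model, d_arc))
W_dep = rng.standard_normal((d_model, d_arc))
head_repr = np.maximum(H @ W_head, 0.0)  # ReLU
dep_repr = np.maximum(H @ W_dep, 0.0)

# Biaffine scoring: S[h, d] = head_repr[h]^T U dep_repr[d] + head_repr[h]^T b
U = rng.standard_normal((d_arc, d_arc))
b = rng.standard_normal(d_arc)
S = head_repr @ U @ dep_repr.T + (head_repr @ b)[:, None]

print(S.shape)  # (5, 5): a score for every candidate head -> dependent arc
```

The resulting n × n matrix is exactly the arc-factored score table that the MST algorithms above take as input; a second biaffine classifier scores labels for each selected arc.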
Graph-based parsers excel in accuracy, especially on long-distance dependencies and non-projective structures. The biaffine parser of Dozat and Manning has become the de facto standard architecture, used in systems such as Stanza and achieving top results in the CoNLL shared tasks. The O(n²) complexity for non-projective parsing and the O(n³) complexity for projective parsing are fast enough for most applications, though slower than linear-time transition-based parsers.