
Graph-Based Parsing

Graph-based dependency parsing formulates tree construction as finding the maximum spanning arborescence over a fully connected directed graph of scored arcs, guaranteeing a globally optimal tree under the arc-factored model.

t* = argmax_{t ∈ T(x)} ∑_{(i,j,l) ∈ t} score(i, j, l)

Graph-based dependency parsing treats the problem as a structured prediction task over directed graphs. For a sentence of n words, a complete directed graph is constructed where each arc (i, j, l) represents a potential dependency from head word i to dependent word j with label l. Each arc is assigned a score by a learned model, and the parser finds the tree that maximizes the total arc score. This global optimization distinguishes graph-based parsers from transition-based parsers, which make local greedy decisions.
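Under the arc-factored model below, the score of a tree is simply the sum of its arc scores. A minimal sketch with a hypothetical, hand-filled score matrix (labels omitted for brevity):

```python
# Toy arc-factored scoring: scores[h][d] is the score of arc h -> d,
# with node 0 acting as the artificial root. The values are made up.
scores = [
    [0.0, 5.0, 1.0],   # arcs from the root
    [0.0, 0.0, 9.0],   # arcs from word 1
    [0.0, 2.0, 0.0],   # arcs from word 2
]

# A candidate tree given as head[d] = h (the root has no head).
heads = [None, 0, 1]   # word 1 <- root, word 2 <- word 1

# The tree score decomposes into a sum over its arcs.
total = sum(scores[h][d] for d, h in enumerate(heads) if h is not None)
# total == 5.0 + 9.0 == 14.0
```

The parser's job is to search over all valid trees for the `heads` assignment that maximizes this sum.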

Maximum Spanning Tree Algorithms

Arc-Factored Model

score(x, t) = ∑_{(i,j,l) ∈ t} s(i, j, l)

t* = argmax_{t ∈ T(x)} score(x, t)

For projective trees: Eisner's algorithm, O(n³)
For non-projective trees: Chu-Liu/Edmonds' algorithm, O(n²)

For projective dependency trees (where no arcs cross), Eisner's algorithm (1996) finds the optimal tree in O(n³) time using dynamic programming. For non-projective trees (allowing crossing arcs), the Chu-Liu/Edmonds' algorithm finds the maximum spanning arborescence in O(n²) time. The key insight is that under an arc-factored scoring model, where the total tree score decomposes into a sum of individual arc scores, globally optimal parsing is tractable.
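The non-projective case can be sketched in plain Python. The following is an illustrative (recursive, unoptimized) implementation of the Chu-Liu/Edmonds greedy-then-contract procedure over an unlabeled score matrix; function and variable names are my own:

```python
def _find_cycle(heads, root):
    """Find a cycle among the chosen arcs, where each node points to heads[node]."""
    n = len(heads)
    color = [0] * n  # 0 = unvisited, 1 = on current path, 2 = finished
    for start in range(n):
        if start == root or color[start]:
            continue
        path, v = [], start
        while v != root and color[v] == 0:
            color[v] = 1
            path.append(v)
            v = heads[v]
        if v != root and color[v] == 1:          # walked back onto our own path
            cycle = path[path.index(v):]
            for u in path:
                color[u] = 2
            return cycle
        for u in path:
            color[u] = 2
    return None

def chu_liu_edmonds(scores, root=0):
    """Return head[d] for each node, maximizing the total arc score.
    scores[h][d] is the score of arc h -> d; arcs into root are ignored."""
    n = len(scores)
    # Step 1: greedily pick the highest-scoring head for every non-root node.
    heads = [root] * n
    for d in range(n):
        if d != root:
            heads[d] = max((h for h in range(n) if h != d),
                           key=lambda h: scores[h][d])
    cycle = _find_cycle(heads, root)
    if cycle is None:
        return heads                             # already an arborescence
    # Step 2: contract the cycle into a single node c.
    cyc = set(cycle)
    rest = [v for v in range(n) if v not in cyc]
    idx = {v: i for i, v in enumerate(rest)}
    c = len(rest)
    NEG = float("-inf")
    new = [[NEG] * (c + 1) for _ in range(c + 1)]
    cycle_score = sum(scores[heads[v]][v] for v in cycle)
    enter = {}  # new head index -> (original head, cycle arc it replaces)
    leave = {}  # new dependent index -> best cycle-internal head
    for u in rest:
        for v in rest:
            if u != v:
                new[idx[u]][idx[v]] = scores[u][v]
        # Entering the cycle at v means discarding the cycle arc into v.
        best, best_v = NEG, None
        for v in cycle:
            gain = scores[u][v] - scores[heads[v]][v]
            if gain > best:
                best, best_v = gain, v
        new[idx[u]][c] = cycle_score + best
        enter[idx[u]] = (u, best_v)
    for v in rest:
        best, best_u = NEG, None
        for u in cycle:
            if scores[u][v] > best:
                best, best_u = scores[u][v], u
        new[c][idx[v]] = best
        leave[idx[v]] = best_u
    # Step 3: recurse on the contracted graph, then expand the result.
    sub = chu_liu_edmonds(new, idx[root])
    result = heads[:]                            # cycle arcs kept by default
    for d in range(c + 1):
        if d == idx[root]:
            continue
        h = sub[d]
        if d == c:                               # the arc entering the cycle
            u, broken = enter[h]
            result[broken] = u
        elif h == c:                             # an arc leaving the cycle
            result[rest[d]] = leave[d]
        else:
            result[rest[d]] = rest[h]
    return result
```

For example, with two words whose mutual arcs each score 10 but root arcs score 5 and 4, the greedy step forms the 2-cycle 1 ⇄ 2; contraction breaks the cheaper-to-replace cycle arc, yielding root → 1 → 2.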

Scoring Models

Early graph-based parsers (McDonald et al., 2005) used linear models with hand-crafted features over word forms, POS tags, and arc direction/distance, trained with the structured perceptron or margin-based methods (MIRA). Modern graph-based parsers use neural networks to compute arc scores. The biaffine attention model of Dozat and Manning (2017) uses a BiLSTM to produce contextual word representations, then applies biaffine classifiers to score arcs and labels separately, achieving state-of-the-art accuracy.
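The biaffine arc scorer itself reduces to two matrix products. A hedged NumPy sketch, using random matrices as stand-ins for the BiLSTM-plus-MLP "head" and "dependent" representations (all dimensions and names here are illustrative, not Dozat and Manning's actual hyperparameters):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 5, 8          # 5 tokens (including a root token), representation dim 8

# Stand-ins for the per-token "head" and "dependent" MLP outputs that a
# real parser would compute from BiLSTM states.
H_head = rng.standard_normal((n, d))
H_dep = rng.standard_normal((n, d))

U = rng.standard_normal((d, d))   # biaffine weight matrix
u = rng.standard_normal(d)        # bias term applied to the head view

# scores[h, j] = H_head[h] @ U @ H_dep[j] + H_head[h] @ u
scores = H_head @ U @ H_dep.T + (H_head @ u)[:, None]

# For illustration only: pick each token's best head independently.
# A real parser would instead decode scores with an MST algorithm
# to guarantee a well-formed tree.
heads = scores.argmax(axis=0)
```

Label scores are computed the same way with a separate, smaller biaffine classifier per label, applied only to the predicted arcs.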

Higher-Order Models

Arc-factored models cannot capture interactions between arcs (e.g., that a verb is unlikely to have two subjects). Higher-order models score pairs or triples of arcs jointly, at the cost of increased complexity: second-order models run in O(n³) to O(n⁴) depending on the factors considered. In practice, the gains from higher-order models are modest compared to improvements from better neural representations.
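To make the idea concrete, here is a small sketch of the factors a second-order "sibling" model (in the style of McDonald and Pereira, 2006) would score in addition to the individual arcs; the helper and tree below are hypothetical:

```python
def sibling_pairs(heads):
    """Yield (head, sibling, dependent) triples for adjacent dependents
    of the same head, left to right. heads[d] = h; the root has head None."""
    by_head = {}
    for d, h in enumerate(heads):
        if h is not None:
            by_head.setdefault(h, []).append(d)
    for h, deps in by_head.items():
        for s, d in zip(deps, deps[1:]):
            yield h, s, d

# Hypothetical tree: root (index 0) -> word 2; word 2 -> words 1, 3, 4.
heads = [None, 2, 0, 2, 2]
pairs = list(sibling_pairs(heads))
# pairs == [(2, 1, 3), (2, 3, 4)]
```

A second-order model adds a learned score s₂(h, s, d) for each such triple to the arc-factored total, letting it penalize, say, a verb taking two subject-like dependents; decoding must then search over these joint factors, which is what raises the complexity.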

Graph-based parsers excel in accuracy, especially for long-distance dependencies and non-projective structures. The biaffine parser of Dozat and Manning has become the de facto standard architecture, used in systems like Stanza and achieving top results in CoNLL shared tasks. The O(n²) complexity for non-projective parsing and O(n³) for projective parsing is fast enough for most applications, though slower than linear-time transition-based parsers.


References

  1. McDonald, R., Pereira, F., Ribarov, K., & Hajič, J. (2005). Non-projective dependency parsing using spanning tree algorithms. Proceedings of HLT-EMNLP 2005, 523–530. https://doi.org/10.3115/1220575.1220641
  2. Dozat, T., & Manning, C. D. (2017). Deep biaffine attention for neural dependency parsing. Proceedings of ICLR 2017. https://arxiv.org/abs/1611.01734
  3. Eisner, J. M. (1996). Three new probabilistic models for dependency parsing: An exploration. Proceedings of COLING 1996, 340–345. https://doi.org/10.3115/992628.992688
  4. McDonald, R., & Pereira, F. (2006). Online learning of approximate dependency parsing algorithms. Proceedings of EACL 2006, 81–88. https://doi.org/10.3115/1220835.1220846
