
Graph-Based Parsing

Graph-based dependency parsing formulates tree construction as finding the maximum spanning arborescence over a fully connected directed graph of scored arcs, guaranteeing a globally optimal tree under the arc-factored model.

t* = argmax_{t ∈ T(x)} ∑_{(i,j,l) ∈ t} score(i, j, l)

Graph-based dependency parsing treats the problem as a structured prediction task over directed graphs. For a sentence of n words, a complete directed graph is constructed where each arc (i, j, l) represents a potential dependency from head word i to dependent word j with label l. Each arc is assigned a score by a learned model, and the parser finds the tree that maximizes the total arc score. This global optimization distinguishes graph-based parsers from transition-based parsers, which make local greedy decisions.
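Under the arc-factored model below, the score of a tree is simply the sum of its arc scores. A minimal sketch with a hypothetical, hand-filled score matrix (labels omitted for brevity):

```python
# Toy arc-factored scoring: scores[h][d] is the score of arc h -> d,
# with node 0 acting as the artificial root. The values are made up.
scores = [
    [0.0, 5.0, 1.0],   # arcs from the root
    [0.0, 0.0, 9.0],   # arcs from word 1
    [0.0, 2.0, 0.0],   # arcs from word 2
]

# A candidate tree given as head[d] = h (the root has no head).
heads = [None, 0, 1]   # word 1 <- root, word 2 <- word 1

# The tree score decomposes into a sum over its arcs.
total = sum(scores[h][d] for d, h in enumerate(heads) if h is not None)
# total == 5.0 + 9.0 == 14.0
```

The parser's job is to search over all valid trees for the `heads` assignment that maximizes this sum.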

Maximum Spanning Tree Algorithms

Arc-Factored Model

score(x, t) = ∑_{(i,j,l) ∈ t} s(i, j, l)

t* = argmax_{t ∈ T(x)} score(x, t)

For projective trees: Eisner's algorithm, O(n³)
For non-projective trees: Chu-Liu/Edmonds' algorithm, O(n²)

For projective dependency trees (where no arcs cross), Eisner's algorithm (1996) finds the optimal tree in O(n³) time using dynamic programming. For non-projective trees (allowing crossing arcs), the Chu-Liu/Edmonds' algorithm finds the maximum spanning arborescence in O(n²) time. The key insight is that under an arc-factored scoring model, where the total tree score decomposes into a sum of individual arc scores, globally optimal parsing is tractable.
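The non-projective case can be sketched in plain Python. The following is an illustrative (recursive, unoptimized) implementation of the Chu-Liu/Edmonds greedy-then-contract procedure over an unlabeled score matrix; function and variable names are my own:

```python
def _find_cycle(heads, root):
    """Find a cycle among the chosen arcs, where each node points to heads[node]."""
    n = len(heads)
    color = [0] * n  # 0 = unvisited, 1 = on current path, 2 = finished
    for start in range(n):
        if start == root or color[start]:
            continue
        path, v = [], start
        while v != root and color[v] == 0:
            color[v] = 1
            path.append(v)
            v = heads[v]
        if v != root and color[v] == 1:          # walked back onto our own path
            cycle = path[path.index(v):]
            for u in path:
                color[u] = 2
            return cycle
        for u in path:
            color[u] = 2
    return None

def chu_liu_edmonds(scores, root=0):
    """Return head[d] for each node, maximizing the total arc score.
    scores[h][d] is the score of arc h -> d; arcs into root are ignored."""
    n = len(scores)
    # Step 1: greedily pick the highest-scoring head for every non-root node.
    heads = [root] * n
    for d in range(n):
        if d != root:
            heads[d] = max((h for h in range(n) if h != d),
                           key=lambda h: scores[h][d])
    cycle = _find_cycle(heads, root)
    if cycle is None:
        return heads                             # already an arborescence
    # Step 2: contract the cycle into a single node c.
    cyc = set(cycle)
    rest = [v for v in range(n) if v not in cyc]
    idx = {v: i for i, v in enumerate(rest)}
    c = len(rest)
    NEG = float("-inf")
    new = [[NEG] * (c + 1) for _ in range(c + 1)]
    cycle_score = sum(scores[heads[v]][v] for v in cycle)
    enter = {}  # new head index -> (original head, cycle arc it replaces)
    leave = {}  # new dependent index -> best cycle-internal head
    for u in rest:
        for v in rest:
            if u != v:
                new[idx[u]][idx[v]] = scores[u][v]
        # Entering the cycle at v means discarding the cycle arc into v.
        best, best_v = NEG, None
        for v in cycle:
            gain = scores[u][v] - scores[heads[v]][v]
            if gain > best:
                best, best_v = gain, v
        new[idx[u]][c] = cycle_score + best
        enter[idx[u]] = (u, best_v)
    for v in rest:
        best, best_u = NEG, None
        for u in cycle:
            if scores[u][v] > best:
                best, best_u = scores[u][v], u
        new[c][idx[v]] = best
        leave[idx[v]] = best_u
    # Step 3: recurse on the contracted graph, then expand the result.
    sub = chu_liu_edmonds(new, idx[root])
    result = heads[:]                            # cycle arcs kept by default
    for d in range(c + 1):
        if d == idx[root]:
            continue
        h = sub[d]
        if d == c:                               # the arc entering the cycle
            u, broken = enter[h]
            result[broken] = u
        elif h == c:                             # an arc leaving the cycle
            result[rest[d]] = leave[d]
        else:
            result[rest[d]] = rest[h]
    return result
```

For example, with two words whose mutual arcs each score 10 but root arcs score 5 and 4, the greedy step forms the 2-cycle 1 ⇄ 2; contraction breaks the cheaper-to-replace cycle arc, yielding root → 1 → 2.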

Scoring Models

Early graph-based parsers (McDonald et al., 2005) used linear models with hand-crafted features over word forms, POS tags, and arc direction/distance, trained with the structured perceptron or margin-based methods (MIRA). Modern graph-based parsers use neural networks to compute arc scores. The biaffine attention model of Dozat and Manning (2017) uses a BiLSTM to produce contextual word representations, then applies biaffine classifiers to score arcs and labels separately, achieving state-of-the-art accuracy.
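The biaffine arc scorer itself reduces to two matrix products. A hedged NumPy sketch, using random matrices as stand-ins for the BiLSTM-plus-MLP "head" and "dependent" representations (all dimensions and names here are illustrative, not Dozat and Manning's actual hyperparameters):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 5, 8          # 5 tokens (including a root token), representation dim 8

# Stand-ins for the per-token "head" and "dependent" MLP outputs that a
# real parser would compute from BiLSTM states.
H_head = rng.standard_normal((n, d))
H_dep = rng.standard_normal((n, d))

U = rng.standard_normal((d, d))   # biaffine weight matrix
u = rng.standard_normal(d)        # bias term applied to the head view

# scores[h, j] = H_head[h] @ U @ H_dep[j] + H_head[h] @ u
scores = H_head @ U @ H_dep.T + (H_head @ u)[:, None]

# For illustration only: pick each token's best head independently.
# A real parser would instead decode scores with an MST algorithm
# to guarantee a well-formed tree.
heads = scores.argmax(axis=0)
```

Label scores are computed the same way with a separate, smaller biaffine classifier per label, applied only to the predicted arcs.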

Higher-Order Models

Arc-factored models cannot capture interactions between arcs (e.g., that a verb is unlikely to have two subjects). Higher-order models score pairs or triples of arcs jointly, at the cost of increased complexity: second-order models run in O(n³) to O(n⁴) depending on the factors considered. In practice, the gains from higher-order models are modest compared to improvements from better neural representations.
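To make the idea concrete, here is a small sketch of the factors a second-order "sibling" model (in the style of McDonald and Pereira, 2006) would score in addition to the individual arcs; the helper and tree below are hypothetical:

```python
def sibling_pairs(heads):
    """Yield (head, sibling, dependent) triples for adjacent dependents
    of the same head, left to right. heads[d] = h; the root has head None."""
    by_head = {}
    for d, h in enumerate(heads):
        if h is not None:
            by_head.setdefault(h, []).append(d)
    for h, deps in by_head.items():
        for s, d in zip(deps, deps[1:]):
            yield h, s, d

# Hypothetical tree: root (index 0) -> word 2; word 2 -> words 1, 3, 4.
heads = [None, 2, 0, 2, 2]
pairs = list(sibling_pairs(heads))
# pairs == [(2, 1, 3), (2, 3, 4)]
```

A second-order model adds a learned score s₂(h, s, d) for each such triple to the arc-factored total, letting it penalize, say, a verb taking two subject-like dependents; decoding must then search over these joint factors, which is what raises the complexity.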

Graph-based parsers excel in accuracy, especially for long-distance dependencies and non-projective structures. The biaffine parser of Dozat and Manning has become the de facto standard architecture, used in systems like Stanza and achieving top results in CoNLL shared tasks. The O(n²) complexity for non-projective parsing and O(n³) for projective parsing is fast enough for most applications, though slower than linear-time transition-based parsers.


References

  1. McDonald, R., Pereira, F., Ribarov, K., & Hajič, J. (2005). Non-projective dependency parsing using spanning tree algorithms. Proceedings of HLT-EMNLP 2005, 523–530. https://doi.org/10.3115/1220575.1220641
  2. Dozat, T., & Manning, C. D. (2017). Deep biaffine attention for neural dependency parsing. Proceedings of ICLR 2017. https://arxiv.org/abs/1611.01734
  3. Eisner, J. M. (1996). Three new probabilistic models for dependency parsing: An exploration. Proceedings of COLING 1996, 340–345. https://doi.org/10.3115/992628.992688
  4. McDonald, R., & Pereira, F. (2006). Online learning of approximate dependency parsing algorithms. Proceedings of EACL 2006, 81–88. https://doi.org/10.3115/1220835.1220846
