
Neural Dependency Parsing

Neural dependency parsers use deep learning to produce contextual word representations and score dependency arcs, achieving state-of-the-art accuracy with far less feature engineering than classical approaches.

s_arc(h, d) = h_h^T U h_d + b (biaffine attention scorer, simplified)

Neural dependency parsing applies deep neural networks to predict dependency trees, replacing the hand-crafted feature templates of classical statistical parsers with learned distributed representations. The shift began with Chen and Manning (2014), who showed that a simple feed-forward neural network over dense word and tag embeddings could match the accuracy of heavily feature-engineered parsers while being significantly faster. Subsequent work using recurrent and attention-based architectures has pushed accuracy to new heights.
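A toy NumPy sketch of the Chen and Manning idea: concatenate dense feature embeddings (stack/buffer words and tags), apply a hidden layer with their cube activation, and score transition actions. All sizes and weights here are illustrative stand-ins, not the published configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 6 feature embeddings (e.g. 3 words + 3 POS tags),
# 50-dim embeddings, 200 hidden units, 3 actions (SHIFT, LEFT-ARC, RIGHT-ARC).
emb_dim, n_feats, hidden, n_actions = 50, 6, 200, 3

W1 = rng.normal(scale=0.1, size=(hidden, n_feats * emb_dim))
b1 = np.zeros(hidden)
W2 = rng.normal(scale=0.1, size=(n_actions, hidden))

def score_actions(feature_embeddings):
    """Score parser actions from concatenated dense feature embeddings.

    Chen & Manning (2014) used a cube activation in the hidden layer.
    """
    x = np.concatenate(feature_embeddings)   # (n_feats * emb_dim,)
    h = (W1 @ x + b1) ** 3                   # cube nonlinearity
    logits = W2 @ h
    return logits - logits.max()             # shift for numerical stability

feats = [rng.normal(size=emb_dim) for _ in range(n_feats)]
scores = score_actions(feats)
print(scores.shape)  # (3,)
```

The key contrast with classical parsers is that the features are a handful of learned dense vectors rather than millions of sparse indicator templates.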

Biaffine Attention Parser

Deep Biaffine Attention (Dozat & Manning, 2017):

h_1, ..., h_n = BiLSTM(w_1, ..., w_n)

Head representations: h_i^(arc-head) = MLP^(arc-head)(h_i)
Dep representations:  h_j^(arc-dep) = MLP^(arc-dep)(h_j)

s_arc(i, j) = h_i^(arc-head)^T U h_j^(arc-dep) + w^T (h_i^(arc-head) ⊕ h_j^(arc-dep)) + b

Tree decoding: Chu-Liu/Edmonds or Eisner algorithm
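A minimal NumPy sketch of the arc scorer defined above, with toy dimensions; the MLPs are reduced to single tanh layers and all weights are random stand-ins:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d_lstm, d_arc = 5, 8, 6        # toy sizes: sentence length, BiLSTM dim, MLP dim

H = rng.normal(size=(n, d_lstm))  # stand-in for BiLSTM output states h_1..h_n

# Separate "head" and "dep" views (single tanh layers standing in for the MLPs).
W_head = rng.normal(scale=0.1, size=(d_lstm, d_arc))
W_dep = rng.normal(scale=0.1, size=(d_lstm, d_arc))
H_head = np.tanh(H @ W_head)      # (n, d_arc)
H_dep = np.tanh(H @ W_dep)        # (n, d_arc)

# Biaffine score: bilinear term + linear term over the concatenation + bias.
U = rng.normal(scale=0.1, size=(d_arc, d_arc))
w = rng.normal(scale=0.1, size=2 * d_arc)
b = 0.0

S = (H_head @ U @ H_dep.T                 # bilinear term
     + (H_head @ w[:d_arc])[:, None]      # linear term, head half
     + (H_dep @ w[d_arc:])[None, :]       # linear term, dep half
     + b)                                 # S[i, j] = s_arc(i, j)
print(S.shape)  # (5, 5)
```

Scoring every head-dependent pair at once as a matrix is what makes the biaffine formulation efficient on modern hardware.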

The biaffine attention parser of Dozat and Manning (2017) has become the dominant architecture for neural dependency parsing. It uses a multi-layer BiLSTM to produce contextual word representations, then applies separate MLPs to produce head and dependent representations. Arc scores are computed using a biaffine function that captures the interaction between potential head-dependent pairs. Label scores are computed similarly, conditioned on the predicted arc. The highest-scoring tree is found using the Eisner or Chu-Liu/Edmonds algorithm.
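To illustrate why a dedicated decoding step matters, here is a hedged greedy head-selection sketch: picking each word's best-scoring head independently can create cycles, which Chu-Liu/Edmonds (or Eisner, for projective trees) repairs. The score matrix and names here are illustrative.

```python
import numpy as np

def greedy_heads(scores):
    """Pick the best-scoring head for each word independently.

    scores[i, j] = score of the arc from head i to dependent j; index 0 is ROOT.
    Greedy selection does not guarantee a tree: the result may contain cycles,
    which full Chu-Liu/Edmonds or Eisner decoding would avoid.
    """
    n = scores.shape[0]
    heads = np.full(n, -1)
    for j in range(1, n):            # skip ROOT at position 0
        cand = scores[:, j].copy()
        cand[j] = -np.inf            # forbid self-loops
        heads[j] = int(np.argmax(cand))
    return heads

def has_cycle(heads):
    """Check whether the chosen head assignments contain a cycle."""
    n = len(heads)
    for start in range(1, n):
        seen, node = set(), start
        while node > 0:
            if node in seen:
                return True
            seen.add(node)
            node = heads[node]
    return False

rng = np.random.default_rng(2)
S = rng.normal(size=(4, 4))          # toy scores for ROOT + 3 words
heads = greedy_heads(S)
print(heads[0])  # -1: ROOT has no head
```

In practice parsers run the greedy assignment first and fall back to full maximum-spanning-tree decoding only when a cycle is detected.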

Transformer-Based Parsers

More recent work replaces the BiLSTM encoder with pre-trained Transformer models. Using BERT, XLNet, or other large language models as the encoder provides richer contextual representations and better generalization, especially for long-distance dependencies and rare constructions. The biaffine scoring layer remains the same; only the encoder changes. These Transformer-based parsers achieve labeled attachment scores (LAS) above 96% for English and have set new records across most UD treebanks.

Transition vs. Graph-Based Neural Parsing

Both transition-based and graph-based paradigms have benefited from neural representations. Neural transition-based parsers (Kiperwasser & Goldberg, 2016) use BiLSTM features for action classification. Neural graph-based parsers (Dozat & Manning, 2017) use biaffine attention. In practice, graph-based neural parsers tend to achieve slightly higher accuracy, while transition-based parsers are faster, though the gap has narrowed.

Multilingual and Cross-Lingual Parsing

Neural dependency parsers, especially those using multilingual pre-trained models like mBERT and XLM-R, have enabled strong cross-lingual transfer. A parser trained on English UD data with multilingual embeddings can parse other languages with reasonable accuracy even without target-language training data. Joint multilingual training, where a single parser is trained on treebanks from many languages simultaneously, further improves performance for low-resource languages by sharing structural knowledge across typologically similar languages.

References

  1. Dozat, T., & Manning, C. D. (2017). Deep biaffine attention for neural dependency parsing. Proceedings of ICLR 2017. https://arxiv.org/abs/1611.01734
  2. Chen, D., & Manning, C. D. (2014). A fast and accurate dependency parser using neural networks. Proceedings of EMNLP 2014, 740–750. https://doi.org/10.3115/v1/D14-1082
  3. Kiperwasser, E., & Goldberg, Y. (2016). Simple and accurate dependency parsing using bidirectional LSTM feature representations. Transactions of the ACL, 4, 313–327. https://doi.org/10.1162/tacl_a_00101
  4. Kondratyuk, D., & Straka, M. (2019). 75 Languages, 1 model: Parsing Universal Dependencies universally. Proceedings of EMNLP-IJCNLP 2019, 2779–2795. https://doi.org/10.18653/v1/D19-1279
