Neural constituency parsing applies deep learning architectures to the problem of predicting phrase-structure trees. These models replace the hand-crafted features and independence assumptions of statistical parsers with learned distributed representations that capture long-range contextual information. The two main paradigms are chart-based neural parsers, which score spans and use dynamic programming for decoding, and transition-based or sequence-to-sequence neural parsers, which generate trees incrementally.
Chart-Based Neural Parsing
s(i, j, l) = MLP(h_j − h_i) · r_l
h_i = contextual representation of position i from a BiLSTM or Transformer encoder
r_l = learned embedding of label l
T* = argmax_T Σ_{(i,j,l) ∈ T} s(i, j, l)
Decoding: exact CYK in O(n³), or approximate greedy top-down splitting in O(n²)
Chart-based neural parsers compute a score for each possible labeled span using neural representations, then find the highest-scoring tree using dynamic programming. Stern et al. (2017) used a BiLSTM encoder with span representations formed by subtracting the encoder states at the span's endpoints. Kitaev and Klein (2018) achieved a major breakthrough by replacing the BiLSTM with a self-attention Transformer encoder, reaching 95.1% F1 on the Penn Treebank. Their model uses a factored approach in which the score of a tree decomposes into a sum of independent labeled span scores, enabling efficient CYK-style decoding.
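The CYK-style decoding step can be sketched in a few lines of pure Python. Here span scores are given as a toy dictionary; in a real parser each entry would be max over labels l of the neural score s(i, j, l). The function and its inputs are illustrative assumptions, not the actual Stern et al. or Kitaev & Klein implementation.

```python
def cky_decode(span_scores, n):
    """Find the score of the best binary tree over a sentence of length n.

    span_scores[(i, j)] is a precomputed score for span (i, j) -- in a
    real chart parser, max_l s(i, j, l) from the neural model (toy values
    here). Returns the best total score and the chosen split points.
    """
    best = {}    # (i, j) -> score of the best subtree covering that span
    split = {}   # (i, j) -> split point k chosen for that span
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length
            if length == 1:
                best[(i, j)] = span_scores[(i, j)]
            else:
                # Best way to split (i, j) into (i, k) and (k, j).
                k, s = max(
                    ((k, best[(i, k)] + best[(k, j)]) for k in range(i + 1, j)),
                    key=lambda pair: pair[1],
                )
                best[(i, j)] = span_scores[(i, j)] + s
                split[(i, j)] = k
    return best[(0, n)], split

# Toy example over a 3-word sentence: every span has a score.
scores = {(0, 1): 1, (1, 2): 1, (2, 3): 1, (0, 2): 2, (1, 3): 0, (0, 3): 3}
total, splits = cky_decode(scores, 3)
```

Because each span's score is independent of the rest of the tree, this O(n³) recurrence finds the exact global optimum, which is the key property the factored model buys.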
Sequence-to-Sequence Parsing
An alternative approach linearizes parse trees as sequences and uses sequence-to-sequence models to generate them. Vinyals et al. (2015) showed that an attention-based encoder-decoder model could produce reasonable parse trees when trained on linearized Penn Treebank trees. Later work by Choe and Charniak (2016) used neural language models over linearized trees for reranking. While these approaches are conceptually simple, chart-based methods generally achieve higher accuracy because they exploit the structural constraints of valid trees.
Current State of the Art
Modern neural constituency parsers achieve remarkable accuracy: over 96% F1 on the Penn Treebank WSJ test set, compared to ~91% for the best pre-neural statistical parsers. Key factors driving this improvement include contextual word representations from Transformers, self-attention mechanisms that capture long-range dependencies, and pre-trained language models that provide rich linguistic knowledge. These parsers also generalize better to out-of-domain text and to other languages, especially when combined with multilingual pre-trained models.