Transition-based dependency parsing constructs a dependency tree by applying a sequence of actions (transitions) to a configuration consisting of a stack, an input buffer, and a set of arcs built so far. At each step, a classifier selects the next action based on features of the current configuration. Because each action takes constant time and the number of actions is linear in the sentence length, transition-based parsers run in O(n) time, making them among the fastest parsing algorithms.
Arc-Standard System
σ = stack, β = buffer, A = set of dependency arcs
SHIFT: (σ, wi|β, A) ⇒ (σ|wi, β, A)
LEFT-ARCl: (σ|wi|wj, β, A) ⇒ (σ|wj, β, A ∪ {(wj, l, wi)})
RIGHT-ARCl: (σ|wi|wj, β, A) ⇒ (σ|wi, β, A ∪ {(wi, l, wj)})
Initial: ([root], w1...wn, ∅)
Terminal: ([root], ∅, A)
The arc-standard system, introduced by Nivre (2004), uses three transitions. SHIFT moves the next word from the buffer onto the stack. LEFT-ARC creates a dependency from the top of the stack to the second element and removes the second element. RIGHT-ARC creates a dependency from the second element to the top and removes the top. The arc-eager variant allows arcs to be created earlier, as soon as a head-dependent pair is identified, which can improve accuracy for certain constructions.
Training and Classification
The transition classifier is trained on oracle transition sequences derived from gold-standard dependency trees. Classical approaches used SVMs or perceptrons over hand-crafted features (word forms, POS tags, and already-built arcs on the stack). Chen and Manning (2014) introduced neural transition-based parsing, using a feed-forward network with dense embeddings, which achieved comparable accuracy with much simpler feature engineering. Later work by Kiperwasser and Goldberg (2016) and Dozat and Manning (2017) used BiLSTM representations for further gains.
Transition-based parsers dominate practical applications where speed is critical, such as real-time text processing pipelines. The spaCy NLP library, for example, uses a transition-based parser as its default. Despite their greedy nature, modern neural transition-based parsers with beam search or dynamic oracles achieve accuracy comparable to graph-based alternatives, while maintaining substantially faster parsing speed.