Dependency parsing represents the syntactic structure of a sentence as a set of directed binary relations between words. Each relation, called a dependency, connects a head (governor) word to a dependent (modifier) word and is labeled with a grammatical relation such as nsubj (nominal subject), obj (direct object), or amod (adjectival modifier). The resulting structure is a dependency tree: a rooted, directed tree in which every word except the root has exactly one head, and there are no cycles.
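As a concrete illustration, here is a hypothetical UD-style parse of the sentence "She ate fresh apples", with each dependency written as a (head, dependent, label) triple; the sentence and its analysis are invented for this example:

```python
# Hypothetical UD-style parse of "She ate fresh apples".
# Each dependency is (head, dependent, label); word indices are 1-based
# and 0 denotes the artificial root.
sentence = ["She", "ate", "fresh", "apples"]
dependencies = [
    (2, 1, "nsubj"),   # ate -> She
    (0, 2, "root"),    # ROOT -> ate
    (4, 3, "amod"),    # apples -> fresh
    (2, 4, "obj"),     # ate -> apples
]
for head, dep, label in dependencies:
    head_word = "ROOT" if head == 0 else sentence[head - 1]
    print(f"{label}({head_word}, {sentence[dep - 1]})")
```

Printed in the conventional label(head, dependent) notation, this yields nsubj(ate, She), root(ROOT, ate), amod(apples, fresh), and obj(ate, apples).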
Dependency Trees
Formally, a dependency tree for a sentence w_1 ... w_n is a directed graph G = (V, A), where:
V = {0, 1, ..., n} is the set of nodes (node 0 is an artificial root)
A ⊆ V × V is the set of arcs (dependencies)
Constraints:
1. Single head: each word w_i (i ≥ 1) has exactly one head
2. Acyclicity: the graph contains no directed cycles
3. Connectedness: every word is reachable from the root
Together these constraints imply that A forms a directed tree rooted at node 0.
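The three constraints can be verified directly from the head assignments: a sketch, assuming the tree is given as a dict mapping each word to its head (the helper name is illustrative, not from any library):

```python
# Sketch: check the dependency-tree constraints for a sentence of n words.
# heads[i] is the head of word i (1-based); node 0 is the artificial root.
def is_valid_tree(heads):
    n = len(heads)
    # Single head: the dict already assigns exactly one head per word,
    # but every head index must be a valid node in {0, ..., n}.
    if any(not (0 <= h <= n) for h in heads.values()):
        return False
    # Acyclicity + connectedness: every word must reach node 0 by
    # following head links without revisiting any node.
    for word in heads:
        seen = set()
        node = word
        while node != 0:
            if node in seen:
                return False  # directed cycle
            seen.add(node)
            node = heads[node]
    return True

print(is_valid_tree({1: 2, 2: 0, 3: 4, 4: 2}))  # True  ("She ate fresh apples")
print(is_valid_tree({1: 2, 2: 1}))              # False (cycle 1 -> 2 -> 1)
```

Because each word stores exactly one head, checking validity reduces to ruling out cycles and out-of-range heads; connectedness then follows automatically.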
Dependency representations have several advantages over constituency representations. They directly encode predicate-argument structure and head-modifier relations, making them closer to the semantic interpretation. They are more suitable for languages with free word order (such as Czech, Turkish, or Hindi), where constituency trees require many discontinuous or crossing branches. They also produce more compact representations — every tree has exactly n arcs for a sentence of n words.
Approaches to Dependency Parsing
The two dominant algorithmic paradigms are transition-based parsing, which constructs the tree incrementally through a sequence of shift-reduce actions, and graph-based parsing, which scores all possible arcs and finds the highest-scoring tree using maximum spanning tree algorithms. Transition-based parsers are fast (typically linear time) but make greedy local decisions, so early mistakes can propagate; graph-based parsers find the globally optimal tree under their arc-scoring model but are slower (typically quadratic or cubic time).
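The shift-reduce idea behind transition-based parsing can be sketched with the arc-standard transition system, one common scheme. A real parser chooses each action with a learned classifier; here we simply replay a hand-written action sequence for a toy sentence:

```python
# Sketch of the arc-standard transition system. A stack and a buffer are
# manipulated by three actions until the buffer is empty and only the
# root remains on the stack.
def parse(words, actions):
    stack = [0]                      # node 0 is the artificial root
    buffer = list(range(1, len(words) + 1))
    arcs = []                        # (head, dependent) pairs
    for action in actions:
        if action == "SHIFT":        # move next word onto the stack
            stack.append(buffer.pop(0))
        elif action == "LEFT-ARC":   # second-to-top is dependent of top
            dep = stack.pop(-2)
            arcs.append((stack[-1], dep))
        elif action == "RIGHT-ARC":  # top is dependent of second-to-top
            dep = stack.pop()
            arcs.append((stack[-1], dep))
    return arcs

words = ["She", "ate", "apples"]
actions = ["SHIFT", "SHIFT", "LEFT-ARC", "SHIFT", "RIGHT-ARC", "RIGHT-ARC"]
print(parse(words, actions))  # [(2, 1), (2, 3), (0, 2)]
```

Note that a sentence of n words is parsed in exactly 2n actions (n shifts plus n arc attachments), which is why transition-based parsing runs in linear time.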
Modern dependency parsers achieve labeled attachment scores (LAS) above 95% for English and above 90% for many other languages. The Universal Dependencies project has standardized annotation across more than 100 languages, enabling large-scale multilingual dependency parsing research. Neural network models, particularly those based on BiLSTMs and Transformers, have achieved the highest accuracy across most languages.
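The attachment-score metrics mentioned above are simple to compute: LAS counts a word as correct only when both its predicted head and its label match the gold tree, while UAS (unlabeled attachment score) checks the head alone. A minimal sketch, with an illustrative helper name and invented toy data:

```python
# Sketch: computing UAS and LAS. gold and pred map each word index
# to its (head, label) pair.
def attachment_scores(gold, pred):
    uas_hits = sum(pred[i][0] == head for i, (head, _) in gold.items())
    las_hits = sum(pred[i] == arc for i, arc in gold.items())
    n = len(gold)
    return uas_hits / n, las_hits / n

gold = {1: (2, "nsubj"), 2: (0, "root"), 3: (2, "obj")}
pred = {1: (2, "nsubj"), 2: (0, "root"), 3: (2, "iobj")}  # wrong label on word 3
uas, las = attachment_scores(gold, pred)
print(uas, las)  # UAS = 3/3, LAS = 2/3
```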