Projective vs. Non-Projective

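A dependency tree is projective if, for every arc from head h to dependent d, every word between h and d in the linear order is dominated by h (i.e., is a descendant of h in the tree). Equivalently, once an artificial root node is placed before the first word and linked to the root of the sentence, a tree is projective if and only if no two arcs cross when drawn above the sentence, so a non-projective tree contains at least one crossing pair of arcs. The root arc matters: without it, a tree can be non-projective yet crossing-free, for example when the root word lies beneath another arc. The distinction is important because projectivity constrains which parsing algorithms are applicable and which linguistic phenomena can be captured.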

Formal Definition

Projectivity. An arc (h, d) is projective iff, for all k with min(h, d) < k < max(h, d):
h →* k (k is a descendant of h)

A tree is projective iff all its arcs are projective.

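Equivalently (with the artificial root at position 0 and its arc to the root word included): no two arcs (h₁, d₁) and (h₂, d₂) cross,
where, writing lᵢ = min(hᵢ, dᵢ) and rᵢ = max(hᵢ, dᵢ), the arcs cross iff l₁ < l₂ < r₁ < r₂ or l₂ < l₁ < r₂ < r₁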
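Both characterizations can be checked directly. The following is a minimal sketch, assuming a hypothetical encoding in which a sentence of n words is a list `heads` of length n + 1, where `heads[d]` is the head of word d, 0 is the artificial root, and `heads[0] = -1`; the function names are illustrative and not taken from any library. It tests each arc against the dominance definition and, separately, looks for a crossing pair among all arcs, including the root arc.

```python
def arcs(heads):
    """All (head, dependent) pairs, including the root arc (0, root)."""
    return [(heads[d], d) for d in range(1, len(heads))]

def dominates(heads, h, k):
    """True if h ->* k, i.e. k is h or one of its descendants."""
    while k != -1:  # climb from k toward the artificial root
        if k == h:
            return True
        k = heads[k]
    return False

def is_projective(heads):
    """Dominance definition: every word strictly inside an arc descends from its head."""
    return all(
        dominates(heads, h, k)
        for h, d in arcs(heads)
        for k in range(min(h, d) + 1, max(h, d))
    )

def arcs_cross(a, b):
    """Arcs cross iff their endpoints strictly interleave, regardless of direction."""
    (l1, r1), (l2, r2) = sorted(a), sorted(b)
    return l1 < l2 < r1 < r2 or l2 < l1 < r2 < r1

def has_crossing(heads):
    """Crossing definition: some pair of arcs (root arc included) crosses."""
    A = arcs(heads)
    return any(arcs_cross(A[i], A[j]) for i in range(len(A)) for j in range(i + 1, len(A)))

# Word 2 is the sentence root; arc 3 -> 1 spans it without dominating it.
heads = [-1, 3, 0, 2]
print(is_projective(heads), has_crossing(heads))  # False True
```

In this example the crossing test detects the violation only because the root arc (0, 2) is included; the arcs (3, 1) and (2, 3) alone do not cross. Both checks favor clarity over speed.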

English is largely projective: dependency conversions of the Penn Treebank contain fewer than 2% non-projective arcs. Languages with freer word order show much higher rates: in Czech (~23%), Dutch (~36%), and Ancient Greek (~40%), a substantial share of sentences contain at least one non-projective arc. These are sentence-level figures; per-arc rates are much lower, because a single non-projective arc makes the whole sentence non-projective. Common sources of non-projectivity include topicalization, wh-movement, extraposition, and scrambling.
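Non-projectivity rates can be reported per arc or per sentence, and the two differ sharply because one non-projective arc makes a whole sentence non-projective. A minimal sketch, assuming a hypothetical `heads` encoding (`heads[d]` is the head of word d, 0 is the artificial root, `heads[0] = -1`) and a two-sentence toy treebank invented purely for illustration, not drawn from any real corpus:

```python
def nonprojective_arcs(heads):
    """Count arcs (heads[d], d) that violate the projectivity definition."""
    count = 0
    for d in range(1, len(heads)):
        h = heads[d]
        for k in range(min(h, d) + 1, max(h, d)):
            a = k
            while a not in (h, -1):  # climb from k toward the artificial root
                a = heads[a]
            if a != h:  # k is not a descendant of h
                count += 1
                break
    return count

def nonprojectivity_rates(treebank):
    """Return (fraction of non-projective arcs, fraction of sentences with at least one)."""
    counts = [nonprojective_arcs(heads) for heads in treebank]
    total_arcs = sum(len(heads) - 1 for heads in treebank)
    arc_rate = sum(counts) / total_arcs
    sentence_rate = sum(1 for c in counts if c > 0) / len(treebank)
    return arc_rate, sentence_rate

toy_treebank = [
    [-1, 2, 0, 2],  # projective
    [-1, 3, 0, 2],  # arc 3 -> 1 spans the root word 2, which it does not dominate
]
print(nonprojectivity_rates(toy_treebank))  # (0.16666666666666666, 0.5)
```

Here one arc in six is non-projective, yet half of the sentences are.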

Parsing Implications

With an arc-factored (first-order) scoring model, projective dependency parsing can be solved exactly in O(n³) time using Eisner's algorithm, a CYK-style dynamic program adapted to dependencies. Standard transition-based parsers using the arc-standard or arc-eager transition systems can only produce projective trees. Non-projective parsing requires different algorithms: for arc-factored models, the Chu-Liu/Edmonds maximum spanning arborescence algorithm finds the best tree in O(n²) time (with Tarjan's implementation; a naive implementation is O(n³)), while pseudo-projective parsing (Nivre and Nilsson, 2005) transforms non-projective training trees into projective ones with augmented arc labels, parses projectively, and then uses those labels to restore the non-projective arcs.
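The O(n³) bound comes from a dynamic program over contiguous spans whose head sits at one endpoint. Below is a minimal sketch of first-order Eisner parsing, assuming a hypothetical dense score matrix `scores[h][d]` over nodes 0..n (0 = artificial root) and a hypothetical `heads` output encoding (`heads[0] = -1`). This version lets the root take several dependents; enforcing a single root needs a small modification not shown here. There are O(n²) spans, each filled by trying O(n) split points.

```python
import math

def eisner(scores):
    """First-order projective parsing by dynamic programming over spans.

    scores[h][d] is the score of arc h -> d over nodes 0..n (0 = artificial root).
    Returns (best_score, heads) with heads[0] = -1.
    """
    N = len(scores)
    NEG = -math.inf
    # Direction index: 1 = span headed at its left end s, 0 = headed at its right end t.
    comp = [[[0.0, 0.0] for _ in range(N)] for _ in range(N)]  # complete spans
    inc = [[[NEG, NEG] for _ in range(N)] for _ in range(N)]   # incomplete spans (arc just added)
    comp_bp = [[[0, 0] for _ in range(N)] for _ in range(N)]
    inc_bp = [[0] * N for _ in range(N)]
    for k in range(1, N):  # span length
        for s in range(N - k):
            t = s + k
            # Incomplete: a left-headed and a right-headed complete span meet, then arc s-t is added.
            r = max(range(s, t), key=lambda q: comp[s][q][1] + comp[q + 1][t][0])
            base = comp[s][r][1] + comp[r + 1][t][0]
            inc[s][t][0] = base + scores[t][s] if s > 0 else NEG  # t -> s; root cannot be a dependent
            inc[s][t][1] = base + scores[s][t]                    # s -> t
            inc_bp[s][t] = r
            # Complete: an incomplete span absorbs a complete span on its far side.
            r0 = max(range(s, t), key=lambda q: comp[s][q][0] + inc[q][t][0])
            r1 = max(range(s + 1, t + 1), key=lambda q: inc[s][q][1] + comp[q][t][1])
            comp[s][t][0] = comp[s][r0][0] + inc[r0][t][0]
            comp[s][t][1] = inc[s][r1][1] + comp[r1][t][1]
            comp_bp[s][t] = [r0, r1]

    heads = [-1] * N

    def backtrack(s, t, direction, complete):
        """Follow back-pointers, recording each arc when an incomplete span is visited."""
        if s == t:
            return
        if complete:
            r = comp_bp[s][t][direction]
            if direction == 0:
                backtrack(s, r, 0, True)
                backtrack(r, t, 0, False)
            else:
                backtrack(s, r, 1, False)
                backtrack(r, t, 1, True)
        else:
            r = inc_bp[s][t]
            if direction == 0:
                heads[s] = t
            else:
                heads[t] = s
            backtrack(s, r, 1, True)
            backtrack(r + 1, t, 0, True)

    backtrack(0, N - 1, 1, True)
    return comp[0][N - 1][1], heads

scores = [[0] * 4 for _ in range(4)]  # toy scores for "John saw Mary"
scores[0][2] = 10  # root -> saw
scores[2][1] = 5   # saw -> John
scores[2][3] = 5   # saw -> Mary
best, heads = eisner(scores)
print(best, heads)  # 20.0 [-1, 2, 0, 2]
```

Because every chart item is a contiguous span headed at one of its endpoints, each subtree the parser builds covers a contiguous stretch of words, so only projective trees can be produced.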

Mildly Non-Projective

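Most non-projective structures in natural language are only mildly non-projective: they have a small gap degree (the number of discontinuities in the yield, or projection, of a node; a tree's gap degree is the maximum over its nodes) and a limited edge degree. This has motivated algorithms for bounded non-projectivity, such as parsing with gap-degree constraints. Their advantage lies in tractability rather than raw speed: they cover most attested non-projective structures while keeping exact parsing polynomial for richer models that score more than single arcs, for which unrestricted non-projective parsing is intractable.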
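Gap degree can be computed directly from node yields. A minimal sketch, assuming a hypothetical `heads` encoding (`heads[d]` is the head of word d, 0 is the artificial root, `heads[0] = -1`): a node's yield is the set of positions of the node and its descendants, its gaps are the breaks between consecutive positions in that set, and the tree's gap degree is the maximum over all nodes. Edge degree is not computed here.

```python
def yields(heads):
    """yields[v] = set of positions of v and all its descendants."""
    ys = [set() for _ in heads]
    for d in range(len(heads)):
        k = d
        while k != -1:  # add d to the yield of every ancestor, including d itself
            ys[k].add(d)
            k = heads[k]
    return ys

def count_gaps(positions):
    """Number of breaks between consecutive positions in a set of word indices."""
    ps = sorted(positions)
    return sum(1 for a, b in zip(ps, ps[1:]) if b - a > 1)

def gap_degree(heads):
    """Maximum number of gaps in the yield of any node."""
    return max(count_gaps(y) for y in yields(heads))

print(gap_degree([-1, 2, 0, 2]))        # 0: projective
print(gap_degree([-1, 3, 0, 2]))        # 1: yield of word 3 is {1, 3}
print(gap_degree([-1, 5, 0, 5, 2, 2]))  # 2: yield of word 5 is {1, 3, 5}
```

A tree has gap degree 0 exactly when every yield is contiguous, which is another way of saying it is projective.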

Cross-Linguistic Patterns

Typological studies using Universal Dependencies have revealed systematic patterns in non-projectivity across languages. Head-final languages (like Japanese and Korean) tend to be more projective, while languages allowing scrambling and topicalization (like Czech and German) show higher non-projectivity rates. Understanding these patterns is important for selecting appropriate parsing algorithms and for designing annotation schemes that handle discontinuous structures.
