Computational Linguistics

Text Summarization

Text summarisation automatically produces a concise, fluent summary that captures the most important information in a source document or collection. It addresses the fundamental challenge of information overload by reducing text length while preserving essential content.

S* = argmax_S quality(S) subject to length(S) ≤ k

Text summarisation is the task of automatically generating a shorter version of a text that retains the most important information. Summarisation can be single-document (condensing one document) or multi-document (synthesising information from multiple sources on the same topic). It can also be generic (capturing the overall gist) or query-focused (emphasising information relevant to a specific question). The task requires identifying salient content, removing redundancy, and producing coherent output — a combination of skills that touches nearly every aspect of natural language understanding and generation.
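The constrained-optimisation view above can be made concrete with a greedy sketch: score each sentence for salience and pick the highest-scoring ones until the length budget k is spent. The word-frequency "quality" score here is a hypothetical stand-in for illustration, not a specific method from the literature.

```python
from collections import Counter

def greedy_summary(sentences, k):
    """Greedily select sentences maximising a salience score under a budget of k words."""
    # Salience proxy (an assumption for this sketch): sentences covering
    # the document's frequent words score higher, normalised by length.
    freq = Counter(w.lower() for s in sentences for w in s.split())

    def quality(s):
        words = s.split()
        return sum(freq[w.lower()] for w in words) / (len(words) or 1)

    chosen, used = [], 0
    for s in sorted(sentences, key=quality, reverse=True):
        if used + len(s.split()) <= k:
            chosen.append(s)
            used += len(s.split())
    # Restore original document order so the summary stays coherent.
    return [s for s in sentences if s in chosen]
```

A real system would add redundancy penalties (as in maximal marginal relevance), but the budgeted argmax structure is the same.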

Extractive versus Abstractive Summarisation

Summarisation Paradigms

Extractive: S = {sᵢ ∈ D : select(sᵢ) = 1}
  select a subset of source sentences

Abstractive: S = generate(D)
  generate new text that may paraphrase or fuse source content

Evaluation: ROUGE-N = Σ count_match(n-gram) / Σ count(n-gram)
  measuring n-gram overlap between system and reference summaries

Summarisation approaches fall into two broad categories. Extractive summarisation selects sentences (or smaller units) from the source text and concatenates them to form the summary. Abstractive summarisation generates new text that may paraphrase, compress, or fuse information from the source, producing summaries that read more naturally but are harder to generate faithfully. In practice, many modern systems are hybrid, using extractive methods to select salient content and abstractive methods to rephrase it. The distinction parallels the difference between highlighting passages in a textbook versus writing a summary in one's own words.

Evaluation Challenges

Evaluating summarisation quality is notoriously difficult because there is no single correct summary for a given document. The ROUGE metrics (Lin, 2004) measure n-gram overlap between system-generated and human-written reference summaries, with ROUGE-1 (unigrams), ROUGE-2 (bigrams), and ROUGE-L (longest common subsequence) being the most widely reported. While ROUGE correlates with human judgments at the system level, it has well-known limitations: it rewards lexical overlap without considering semantic equivalence, fluency, or factual correctness. BERTScore and other embedding-based metrics offer more semantically aware evaluation but remain imperfect proxies for human judgment.
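A minimal recall-oriented ROUGE-N can be sketched directly from the formula: count the n-grams shared between system and reference summaries, clipping each match at its reference count, and divide by the total reference n-grams. This is a simplified single-reference illustration, not the full scoring package of Lin (2004).

```python
from collections import Counter

def ngrams(tokens, n):
    """Multiset of n-grams in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n(system, reference, n=1):
    """Recall-oriented ROUGE-N of a system summary against one reference."""
    sys_counts = ngrams(system.lower().split(), n)
    ref_counts = ngrams(reference.lower().split(), n)
    # Clipped matches: each reference n-gram is credited at most as many
    # times as it appears in the reference.
    overlap = sum(min(count, sys_counts[gram]) for gram, count in ref_counts.items())
    total = sum(ref_counts.values())
    return overlap / total if total else 0.0
```

With n=2 this becomes ROUGE-2; ROUGE-L replaces the n-gram count with the longest common subsequence.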

The DUC and TAC Evaluations

The Document Understanding Conference (DUC, 2001–2007) and its successor, the Text Analysis Conference (TAC, 2008–2014), established the foundational evaluation framework for summarisation research. These shared tasks provided standard datasets, evaluation protocols, and human evaluation studies that enabled systematic comparison of summarisation systems. DUC/TAC evaluations revealed that extractive methods could achieve reasonable quality but consistently lagged behind human summaries, motivating the development of abstractive approaches that could bridge this gap.

A critical challenge for abstractive summarisation is faithfulness: generated summaries may contain information not present in the source document (hallucination) or contradict the source (factual inconsistency). Kryscinski et al. (2020) found that approximately 30% of summaries generated by state-of-the-art models contain factual errors. Addressing faithfulness requires methods for verifying generated content against source documents, constrained decoding that prevents hallucination, and evaluation metrics specifically designed to measure factual consistency. This challenge highlights the tension between fluency and faithfulness in natural language generation.
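One cheap signal used to flag possible hallucination is the novel n-gram rate: the fraction of summary n-grams that never appear in the source. This is a crude proxy for illustration only (not the method of Kryscinski et al., 2020, who use a trained consistency model); modern factual-consistency checks rely on NLI- or QA-based verifiers.

```python
def novel_bigram_rate(summary, source):
    """Fraction of summary bigrams absent from the source document.

    A high rate is a warning sign of potential hallucination, though
    legitimate paraphrase also produces novel bigrams.
    """
    def bigrams(text):
        toks = text.lower().split()
        return {tuple(toks[i:i + 2]) for i in range(len(toks) - 1)}

    summ, src = bigrams(summary), bigrams(source)
    if not summ:
        return 0.0
    return len(summ - src) / len(summ)
```

A purely extractive summary scores 0.0 by construction, which is one reason extractive systems rarely hallucinate.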

Related Topics

References

  1. Lin, C.-Y. (2004). ROUGE: A package for automatic evaluation of summaries. Text Summarization Branches Out: Proceedings of the ACL Workshop, 74–81.
  2. Nenkova, A., & McKeown, K. (2012). A survey of text summarization techniques. Mining Text Data, 43–76. Springer.
  3. Kryscinski, W., McCann, B., Xiong, C., & Socher, R. (2020). Evaluating the factual consistency of abstractive text summarization. Proceedings of EMNLP, 9332–9346.
  4. Mani, I. (2001). Automatic Summarization. John Benjamins Publishing.

External Links