Emotion detection in text aims to identify the specific emotional states expressed by authors, moving beyond the binary or ternary polarity judgments of sentiment analysis to a richer categorical or dimensional characterisation of affect. While sentiment analysis distinguishes positive from negative, emotion detection differentiates joy from anticipation, anger from disgust, and sadness from fear. This finer granularity is essential for applications such as mental health monitoring, empathetic dialogue systems, crisis response, and computational social science research on emotional dynamics in public discourse.
Emotion Models and Taxonomies
- Categorical (Plutchik): eight primary emotions arranged in opposing pairs
- Dimensional (Russell): continuous valence × arousal space, with valence v ∈ [-1, +1] (unpleasant to pleasant) and arousal a ∈ [-1, +1] (calm to excited)
Two major theoretical frameworks underlie computational emotion detection. Categorical models, following Ekman (1992), define a set of basic emotions — typically anger, disgust, fear, joy, sadness, and surprise — that are considered universal across cultures. Plutchik's wheel extends this to eight primary emotions arranged in opposing pairs. Dimensional models, following Russell's circumplex model of affect (1980), represent emotions as points in a continuous space defined by valence (pleasant to unpleasant) and arousal (calm to excited), with some models adding a third dominance dimension. The choice of framework affects annotation, modelling, and evaluation: categorical models are more interpretable but less expressive, while dimensional models capture subtlety but are harder to annotate reliably.
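The relationship between the two frameworks can be sketched by placing categorical labels at points in valence–arousal space. The coordinates below are rough illustrative values chosen for this sketch, not canonical positions from the literature; they merely show how a dimensional prediction could be mapped back to a basic-emotion category.

```python
# Approximate (valence, arousal) coordinates for Ekman's six basic emotions
# in Russell's circumplex. Values are illustrative, not from the literature.
CIRCUMPLEX = {
    "joy":      ( 0.8,  0.5),
    "surprise": ( 0.1,  0.8),
    "anger":    (-0.6,  0.7),
    "fear":     (-0.7,  0.6),
    "disgust":  (-0.6,  0.2),
    "sadness":  (-0.7, -0.4),
}

def nearest_category(valence: float, arousal: float) -> str:
    """Map a point in valence-arousal space to the closest basic emotion."""
    return min(
        CIRCUMPLEX,
        key=lambda e: (CIRCUMPLEX[e][0] - valence) ** 2
                    + (CIRCUMPLEX[e][1] - arousal) ** 2,
    )

print(nearest_category(0.9, 0.4))    # pleasant, moderately aroused → joy
print(nearest_category(-0.8, -0.5))  # unpleasant, calm → sadness
```

This kind of nearest-neighbour mapping is lossy in both directions, which is one reason multi-task models that predict both representations jointly can outperform either alone.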
Computational Approaches
Lexicon-based approaches to emotion detection use resources such as the NRC Emotion Lexicon (Mohammad and Turney, 2013), which associates over 14,000 words with eight basic emotions and two sentiment polarities. EmoLex was constructed through crowdsourcing, with annotators judging whether each word evokes a particular emotion. Document-level emotion is estimated by aggregating the emotion associations of constituent words, much as in lexicon-based sentiment analysis. While simple and interpretable, lexicon-based methods struggle with context-dependent emotion expression, figurative language, and implicit emotion.
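The aggregation step can be sketched in a few lines. The toy lexicon below contains made-up entries for illustration only; the real EmoLex maps its words to the eight Plutchik emotions plus positive/negative polarity.

```python
from collections import Counter

# Toy stand-in for an emotion lexicon: word -> set of associated emotions.
# Entries are invented for this sketch, not taken from the NRC lexicon.
TOY_LEXICON = {
    "wonderful": {"joy", "trust"},
    "terrified": {"fear"},
    "funeral":   {"sadness", "fear"},
    "betrayed":  {"anger", "sadness", "disgust"},
}

def emotion_profile(text: str) -> Counter:
    """Aggregate word-level emotion associations over a document."""
    counts = Counter()
    for token in text.lower().split():
        token = token.strip(".,!?")
        for emotion in TOY_LEXICON.get(token, ()):
            counts[emotion] += 1
    return counts

profile = emotion_profile("I felt betrayed and terrified after the funeral.")
print(dict(profile))
```

Even this toy example exposes the method's blind spots: negation ("not terrified"), sarcasm, and sentences with no lexicon hits at all produce misleading or empty profiles.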
Much emotional expression in text is implicit rather than explicit. The sentence "I just got laid off from my job" does not contain any explicitly emotional words, yet clearly conveys sadness, anxiety, and possibly anger. Detecting implicit emotion requires world knowledge and pragmatic reasoning — understanding that job loss is generally a negative life event. The ISEAR (International Survey on Emotion Antecedents and Reactions) dataset captures such implicit emotion by collecting descriptions of situations that elicit specific emotions, providing training data for models that must infer emotion from situational context.
Supervised machine learning and deep learning approaches train classifiers on labelled emotion datasets such as the SemEval-2018 emotion detection task, GoEmotions (Demszky et al., 2020), or the Unified Emotion Dataset. Transformer-based models fine-tuned on emotion-annotated data achieve the best performance, leveraging contextual representations to disambiguate emotion-bearing expressions. Multi-task learning approaches that jointly predict emotion categories and dimensional values have shown improvements over single-task models, suggesting that the two representation frameworks provide complementary information. A persistent challenge is the subjectivity and ambiguity of emotion annotation: inter-annotator agreement is typically lower for emotion detection than for sentiment analysis, reflecting the genuine difficulty of inferring emotional states from text.
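Because datasets such as GoEmotions allow a single text to carry several emotion labels, supervised models typically treat the task as multi-label classification: one logit per emotion, each passed through an independent sigmoid rather than a shared softmax. A minimal sketch of that decoding step, with placeholder logits and threshold (not outputs of any real model):

```python
import math

# Illustrative label set; GoEmotions itself uses 27 emotions plus neutral.
LABELS = ["anger", "disgust", "fear", "joy", "sadness", "surprise"]

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def predict_labels(logits, threshold=0.5):
    """Return every emotion whose independent sigmoid probability
    clears the threshold, so multiple labels can co-occur."""
    return [
        label
        for label, logit in zip(LABELS, logits)
        if sigmoid(logit) >= threshold
    ]

# A text like "I just got laid off" might plausibly score high on
# both fear and sadness at once:
print(predict_labels([-2.0, -3.0, 1.2, -1.5, 2.3, -0.5]))
# → ['fear', 'sadness']
```

The per-label threshold is itself a tunable quantity; with low inter-annotator agreement, precision and recall trade off sharply around it, which is one reason evaluation on emotion benchmarks is sensitive to decoding choices.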