Yoshua Bengio is a Canadian computer scientist at the Université de Montréal and founder of Mila (the Quebec AI Institute). Together with Geoffrey Hinton and Yann LeCun, he is recognised as one of the three "godfathers of deep learning," sharing the 2018 ACM A.M. Turing Award. His contributions to neural language models, representation learning, and sequence modelling have been transformative for computational linguistics.
Early Life and Education
Born in Paris, France, in 1964, Bengio grew up in Montreal, Canada. He earned his PhD in computer science from McGill University in 1991 and joined the Université de Montréal, where he built a world-leading deep learning research group that became the Mila institute.
Career Highlights
1964: Born in Paris, France
1991: Completed PhD at McGill University
2003: Published "A Neural Probabilistic Language Model"
2014: Co-introduced sequence-to-sequence learning and neural attention
2014: Co-developed generative adversarial networks (GANs) with Ian Goodfellow
2018: Received the ACM Turing Award with Hinton and LeCun
Key Contributions
Bengio's 2003 paper "A Neural Probabilistic Language Model" was a watershed moment for computational linguistics. It introduced the idea of learning distributed word representations (embeddings) as part of a neural network trained to predict the next word. The model outperformed traditional n-gram language models because its learned representations capture longer-range dependencies and share statistical strength across similar words.
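The architecture of the 2003 model can be sketched in a few lines of plain Python: a shared embedding table, a tanh hidden layer over the concatenated context embeddings, and a softmax over the vocabulary. The toy vocabulary, dimensions, and random weights below are purely illustrative, and training by backpropagation is omitted; this is a minimal forward-pass sketch, not the original implementation.

```python
import math
import random

random.seed(0)

VOCAB = ["the", "cat", "sat", "on", "mat"]
V, D, H, N = len(VOCAB), 4, 8, 2   # vocab size, embedding dim, hidden dim, context length

def rand_matrix(rows, cols):
    return [[random.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

# Parameters (randomly initialised here; in the paper they are learned jointly).
C = rand_matrix(V, D)          # shared embedding table: one row per word
W_h = rand_matrix(N * D, H)    # concatenated context -> hidden layer
W_o = rand_matrix(H, V)        # hidden layer -> vocabulary scores

def matvec(vec, mat):
    # row vector `vec` times matrix `mat`
    return [sum(v * mat[i][j] for i, v in enumerate(vec)) for j in range(len(mat[0]))]

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def next_word_probs(context):
    """P(w_t | w_{t-N}, ..., w_{t-1}) for every word in the vocabulary."""
    # 1. Look up and concatenate the embeddings of the context words.
    x = [v for w in context for v in C[VOCAB.index(w)]]
    # 2. Non-linear hidden layer.
    h = [math.tanh(a) for a in matvec(x, W_h)]
    # 3. Linear output layer + softmax over the whole vocabulary.
    return softmax(matvec(h, W_o))

probs = next_word_probs(["the", "cat"])
```

Because the embedding table C is shared across all context positions, words that occur in similar contexts end up with similar rows, which is exactly the "sharing of statistical strength" described above.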
His group's work on sequence-to-sequence learning with attention (Bahdanau, Cho, and Bengio, 2014) showed that an encoder-decoder neural network equipped with an attention mechanism could learn to translate between languages, achieving results competitive with phrase-based statistical MT. The attention mechanism, which lets the decoder focus on the most relevant parts of the input at each output step, became the central building block of the Transformer architecture. Bengio also co-developed the GRU (Gated Recurrent Unit) and, in the 1990s, contributed fundamental analyses of the vanishing gradient problem in deep and recurrent networks.
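The additive ("Bahdanau-style") attention step can be sketched as follows: each encoder state is scored against the current decoder state, the scores are normalised with a softmax, and the result is a weighted average of the encoder states. The dimensions and random weights below are illustrative stand-ins for parameters that would be learned jointly with the rest of the translation model.

```python
import math
import random

random.seed(1)

D = 4  # hidden-state size (assumed equal for encoder and decoder in this sketch)

def rand_matrix(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def proj(vec, mat):
    # row vector `vec` times matrix `mat`
    return [sum(v * mat[i][j] for i, v in enumerate(vec)) for j in range(len(mat[0]))]

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

# Alignment-model parameters (random here; learned during training in practice).
W_a = rand_matrix(D, D)  # projects the decoder state
U_a = rand_matrix(D, D)  # projects each encoder state
v_a = [random.uniform(-0.5, 0.5) for _ in range(D)]

def attend(decoder_state, encoder_states):
    """Additive attention: score each source position, then take a weighted average."""
    s = proj(decoder_state, W_a)
    # e_i = v_a . tanh(W_a s + U_a h_i): one alignment score per encoder state
    scores = [
        sum(v * math.tanh(a + b) for v, a, b in zip(v_a, s, proj(h, U_a)))
        for h in encoder_states
    ]
    weights = softmax(scores)  # attention distribution over source positions
    context = [sum(w * h[j] for w, h in zip(weights, encoder_states))
               for j in range(D)]  # weighted sum of encoder states
    return weights, context

# Three source positions, one decoder state.
encoder_states = [[random.uniform(-1, 1) for _ in range(D)] for _ in range(3)]
decoder_state = [random.uniform(-1, 1) for _ in range(D)]
weights, context = attend(decoder_state, encoder_states)
```

The `weights` vector is the soft alignment between the current target position and every source position; replacing this additive scoring with scaled dot products is the step the Transformer later took.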
"Learning representations of data is key to making progress in AI, and natural language is one of the most challenging domains for representation learning." — Yoshua Bengio
Legacy
Bengio's neural language model launched the neural NLP revolution. The attention mechanism developed in his group became the foundation of the Transformer and thus of BERT, GPT, and the large language models that followed. His advocacy for responsible AI development and his creation of Mila have shaped both the technical and ethical trajectory of the field.