What are the differences in Pre-Trained Transformer-based models like BERT, DistilBERT, XLNet, GPT, …

Ala Alam Falaki
9 min read · May 19, 2021

This article is a cheat sheet of well-known Transformer-based models and tries to explain what makes each one unique (even though they are all based on the same architecture).

[Image: "What makes each Transformer-based model unique?"]

The combination of Transformer architecture and transfer learning is dominating the Natural Language Processing world. There…
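As a minimal illustration of the kind of architectural difference the cheat sheet covers: one well-known distinction is the attention mask. BERT-style models attend bidirectionally (every token can see every other token), while GPT-style models use a causal, lower-triangular mask so each token only attends to earlier positions. The sketch below is a plain-Python illustration of these two mask shapes, not code from any particular library:

```python
# Sketch of the two attention-mask patterns (1 = may attend, 0 = masked).
# Illustrative only; real implementations build these as tensors.

def bidirectional_mask(n):
    """BERT-style: every token may attend to every other token."""
    return [[1] * n for _ in range(n)]

def causal_mask(n):
    """GPT-style: token i may attend only to positions j <= i."""
    return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]

print(causal_mask(3))  # lower-triangular pattern
```

This single masking choice is a large part of why BERT is suited to understanding tasks (classification, extraction) while GPT is suited to left-to-right generation.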


Technical Editor @ Towards AI. Ph.D. candidate working on text summarization; I write about NLP. Let's talk on Twitter! https://nlpiation.github.io/