How to load any Huggingface Transformer model and use it?

Ala Falaki, PhD
5 min read · Mar 20, 2021

Consider this article part 2 of my previous post, “How to use HuggingFace’s Transformers Pre-Trained tokenizers?”, where I described what a tokenizer is, why it is important, and how to use pre-trained ones from the Huggingface library. If you are not familiar with the concept, I highly encourage you to give that piece a read. You should also have a basic understanding of Artificial Neural Networks.

A word cloud made from the names of the 40+ transformer-based models available in the Huggingface library. (You can see the full list of models on the Huggingface website.)

So, Huggingface 🤗

It is a library that focuses on Transformer-based pre-trained models. The main breakthrough of this architecture was the Attention mechanism, which gave models the ability to pay attention (get it?) to specific parts of a sequence (or tokens). At the time of writing, the Transformer dominates the Natural Language Processing (NLP) field, and recent papers are trying to use it for vision. If you want to get into NLP, it is definitely worth putting in the time to truly understand each component of this architecture.
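To give you a feel for how little code this takes, here is a minimal sketch using the library’s pipeline helper. It assumes you have installed the transformers package (pip install transformers); on the first run it downloads a small default English sentiment model, so the exact output will depend on that checkpoint:

```python
# A minimal sketch: run a pre-trained Transformer with the pipeline helper.
# Assumes the library is installed: pip install transformers
from transformers import pipeline

# Downloads a default pre-trained model and its tokenizer on first use.
classifier = pipeline("sentiment-analysis")

print(classifier("Transformers make NLP much easier."))
# e.g. [{'label': 'POSITIVE', 'score': 0.99...}]
```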

I will not go through the Transformer’s details, as there are already numerous guides and tutorials available. (Want my recommendation? Ok! Read Jay Alammar’s “The Illustrated Transformer”.) Just know that these pre-trained Transformer models (like BERT, GPT-2, BART, …) are very useful whether you want to do a…
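As a taste of what loading one of these models looks like, here is a rough sketch of the general pattern using the library’s Auto classes. The bert-base-uncased checkpoint is just an example; any model name from the Huggingface hub should work the same way, and the sketch assumes both transformers and PyTorch are installed:

```python
# A sketch of the general loading pattern with the Auto classes.
import torch
from transformers import AutoTokenizer, AutoModel

# Any checkpoint name from the Huggingface model hub works here,
# e.g. "gpt2" or "facebook/bart-base"; BERT is used as an example.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

# Tokenize a sentence and feed it to the model.
inputs = tokenizer("Hello, Transformers!", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# last_hidden_state holds one contextual vector per token:
# shape (batch_size, sequence_length, hidden_size).
print(outputs.last_hidden_state.shape)
```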

