HuggingFace
The pipeline() function from the transformers library can be used to run inference with models from the Hugging Face Hub.

Deep learning is currently undergoing a period of rapid progress across a wide variety of domains, including:
📖 Natural language processing
👀 Computer vision
🔊 Audio
and many more!
The main driver of these breakthroughs is the Transformer – a novel neural network architecture introduced by Google researchers in 2017.
💻 They can generate code as in products like GitHub Copilot, which is based on OpenAI’s family of GPT models.
❓ They can be used to improve search engines, like Google did with a Transformer called BERT.
🗣️ They can process speech in multiple languages to perform speech recognition, speech translation, and language identification. For example, Facebook's XLS-R model can automatically transcribe speech and even translate it from one language into another!
Training Transformer models from scratch requires a lot of resources: compute, data, and days of training time.
With transfer learning, it is possible to adapt a model that has been trained from scratch (usually called a pretrained model) to a new but similar task.
Fine-tuning is a special case of transfer learning where you use new data to continue training the model on the new task.
The models that we'll be looking at in this tutorial are all examples of fine-tuned models.
You can learn more about the transfer learning process in the video below:
The Hugging Face Transformers library provides a unified API across dozens of Transformer architectures, as well as the means to train models and run inference with them.
The fastest way to learn what Transformers can do is via the pipeline() function.
This function loads a model from the Hugging Face Hub and takes care of all the preprocessing and postprocessing steps that are needed to convert inputs into predictions:

Import the pipeline:
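The import is a single line (the code snippets in this section are minimal sketches based on the standard transformers API):

```python
from transformers import pipeline
```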
text = """Dear Amazon, last week I ordered an Optimus Prime action figure \
from your online store in Germany. Unfortunately, when I opened the package, \
I discovered to my horror that I had been sent an action figure of Megatron \
instead! As a lifelong enemy of the Decepticons, I hope you can understand my \
dilemma. To resolve the issue, I demand an exchange of Megatron for the \
Optimus Prime figure I ordered. Enclosed are copies of my records concerning \
this purchase. I expect to hear from you soon. Sincerely, Bumblebee."""Let’s start with one of the most common tasks in NLP: text classification
Now suppose that we’d like to predict the sentiment of this text, i.e. whether the feedback is positive or negative.
This is a special type of text classification that is often used in industry to aggregate customer feedback across products or services.

We can do this with the pipeline() function as shown below. When you run this code, you'll see a message about which Hub model is being used by default.
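A minimal sketch of this step, assuming the default model since we pass no model name:

```python
# Build a text-classification pipeline; without a model argument,
# the default Hub model for this task is downloaded and cached
classifier = pipeline("text-classification")
outputs = classifier(text)
print(outputs)
```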
In this case, the pipeline() function loads the distilbert-base-uncased-finetuned-sst-2-english model, a distilled BERT variant fine-tuned on SST-2, a sentiment analysis dataset.
Note
💡 The first time you execute the code, the model will be automatically downloaded from the Hub and cached for later use!
Output: [{'label': 'NEGATIVE', 'score': 0.9015464186668396}]
The model predicts negative sentiment with high confidence, which makes sense given that we have a disgruntled customer.
You can also see that the pipeline returns a list of Python dictionaries with the predictions.
We can also pass several texts at the same time, in which case we get one dictionary in the list for each text.
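For example (a sketch with made-up inputs, reusing the classifier from above):

```python
# A list of inputs returns one dictionary per text
classifier(["I love this product!", "This was a terrible experience."])
# -> [{'label': ..., 'score': ...}, {'label': ..., 'score': ...}]
```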
Instead of just finding the overall sentiment, let’s see if we can extract entities such as organizations, locations, or individuals from the text.
This task is called named entity recognition, or NER for short.

We just load a pipeline for NER without specifying a model.
This will load a default BERT model that has been trained on the CoNLL-2003 dataset:
When we pass our text through the model, we now get a long list of Python dictionaries, where each dictionary corresponds to one detected entity.
Since multiple tokens can correspond to a single entity, we can apply an aggregation strategy that merges entities if the same class appears in consecutive tokens:
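A sketch of this step, using the pipeline's aggregation_strategy argument:

```python
# NER pipeline with the default model; "simple" merges consecutive tokens
# of the same entity class into a single entity group
ner_tagger = pipeline("ner", aggregation_strategy="simple")
outputs = ner_tagger(text)
print(outputs)
```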
Output: [{'entity_group': 'ORG', 'score': 0.87900954, 'word': 'Amazon', 'start': 5, 'end': 11}, {'entity_group': 'MISC', 'score': 0.9908588, 'word': 'Optimus Prime', 'start': 36, 'end': 49}, {'entity_group': 'LOC', 'score': 0.9997547, 'word': 'Germany', 'start': 90, 'end': 97}, {'entity_group': 'MISC', 'score': 0.55656713, 'word': 'Mega', 'start': 208, 'end': 212}, {'entity_group': 'PER', 'score': 0.5902563, 'word': '##tron', 'start': 212, 'end': 216}, {'entity_group': 'ORG', 'score': 0.6696913, 'word': 'Decept', 'start': 253, 'end': 259}, {'entity_group': 'MISC', 'score': 0.4983487, 'word': '##icons', 'start': 259, 'end': 264}, {'entity_group': 'MISC', 'score': 0.77536064, 'word': 'Megatron', 'start': 350, 'end': 358}, {'entity_group': 'MISC', 'score': 0.987854, 'word': 'Optimus Prime', 'start': 367, 'end': 380}, {'entity_group': 'PER', 'score': 0.81209683, 'word': 'Bumblebee', 'start': 502, 'end': 511}]

It seems that the model found most of the named entities but was confused about "Megatron" and "Decepticons", which are characters in the Transformers franchise.
This is no surprise, since the original dataset probably did not contain many Transformers characters. For this reason it makes sense to further fine-tune a model on your own dataset!
In question answering, the model is given a question and a context and needs to find the answer to the question within the context.
This problem can be rephrased as a classification problem: For each token the model needs to predict whether it is the start or the end of the answer.
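A minimal sketch, assuming the default question-answering model and a made-up question:

```python
# Extractive question answering: the model returns the answer span it finds in the context
reader = pipeline("question-answering")
question = "What does the customer want?"
outputs = reader(question=question, context=text)
print(outputs)
```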

Again, this is achieved with the pipeline() function, as sketched above. Next, let's try to summarize the text. Generation is much more computationally demanding, since we usually generate one token at a time and need to run the model several times.
An example of how this process works is shown below:
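A sketch with the default summarization model (the generation settings are illustrative choices):

```python
# Summarize the complaint; generation parameters are passed directly to the pipeline
summarizer = pipeline("summarization")
outputs = summarizer(text, max_length=60, clean_up_tokenization_spaces=True)
print(outputs[0]["summary_text"])
```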

Output: Bumblebee ordered an Optimus Prime action figure from your online store in Germany. Unfortunately, when I opened the package, I discovered to my horror that I had been sent an action figure of Megatron instead. As a lifelong enemy of the Decepticons, I hope you can understand my dilemma.

But what if there is no model in the language of my data?
You can still try to translate the text.
The Helsinki NLP team has provided over 1,000 language pair models for translation.
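A sketch of an English-to-German translation, assuming the opus-mt checkpoint for this language pair:

```python
# Translate the complaint from English to German with a Helsinki-NLP model;
# min_length and clean_up_tokenization_spaces are illustrative settings
translator = pipeline("translation_en_to_de", model="Helsinki-NLP/opus-mt-en-de")
outputs = translator(text, clean_up_tokenization_spaces=True, min_length=100)
print(outputs[0]["translation_text"])
```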
Output: Sehr geehrter Amazon, letzte Woche habe ich eine Optimus Prime Action Figur aus Ihrem Online-Shop in Deutschland bestellt. Leider, als ich das Paket öffnete, entdeckte ich zu meinem Entsetzen, dass ich stattdessen eine Action Figur von Megatron geschickt worden war! Als lebenslanger Feind der Decepticons, Ich hoffe, Sie können mein Dilemma verstehen. Um das Problem zu lösen, Ich fordere einen Austausch von Megatron für die Optimus Prime Figur habe ich bestellt. Eingeschlossen sind Kopien meiner Aufzeichnungen über diesen Kauf. Ich erwarte, von Ihnen bald zu hören. Aufrichtig, Bumblebee.

We can see that the text is clearly not perfectly translated, but the core meaning stays the same.
Another application of translation models is data augmentation via backtranslation.
In zero-shot classification the model receives a text and a list of candidate labels and determines which labels are compatible with the text.
Instead of having fixed classes, this allows for flexible classification without any labelled data!
Usually this is a good first baseline!
Let’s have a look at an example:
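A sketch of a call that could produce the output below; the multilingual NLI checkpoint is an assumption, chosen because the example text and labels are German:

```python
# Zero-shot classification: score each candidate label independently (multi_label=True)
zero_shot = pipeline("zero-shot-classification", model="joeddav/xlm-roberta-large-xnli")
text_de = "Dieses Tutorial ist großartig! Ich hoffe, dass jemand von Hugging Face meine Hochschule besuchen wird :)"
classes = ["Treffen", "Arbeit", "Digital", "Reisen"]
zero_shot(text_de, classes, multi_label=True)
```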
Output:

{'sequence': 'Dieses Tutorial ist großartig! Ich hoffe, dass jemand von Hugging Face meine Hochschule besuchen wird :)',
 'labels': ['Digital', 'Arbeit', 'Treffen', 'Reisen'],
 'scores': [0.7426563501358032,
  0.6590237021446228,
  0.517701268196106,
  0.011237525381147861]}

Transformers can also be used for domains other than NLP!
There are many more pipelines that you can experiment with:
audio-classification
automatic-speech-recognition
feature-extraction
text-classification
token-classification
question-answering
table-question-answering
visual-question-answering
document-question-answering
fill-mask
summarization
translation
text2text-generation
text-generation
zero-shot-classification
zero-shot-image-classification
zero-shot-audio-classification
conversational
image-classification
image-segmentation
image-to-text
object-detection
zero-shot-object-detection
depth-estimation
video-classification
mask-generation
Another promising area is audio processing (especially speech-to-text).
See for example the wav2vec2 model:
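A sketch of automatic speech recognition with a wav2vec2 checkpoint (the audio file path is a placeholder):

```python
# Transcribe an audio file; the pipeline handles audio loading and feature extraction
asr = pipeline("automatic-speech-recognition", model="facebook/wav2vec2-base-960h")
outputs = asr("sample.flac")
print(outputs["text"])
```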

Finally, a lot of real-world data is still in the form of tables.
Being able to query tables is very useful and with TAPAS you can do tabular question-answering:
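A sketch of table question answering with a TAPAS checkpoint; the table contents are made up for illustration (TAPAS expects cell values as strings):

```python
import pandas as pd

# A small example table of repositories and their star counts (values as strings)
table = pd.DataFrame({
    "Repository": ["transformers", "datasets", "tokenizers"],
    "Stars": ["36542", "4512", "3934"],
})

tqa = pipeline("table-question-answering", model="google/tapas-base-finetuned-wtq")
outputs = tqa(table=table, query="Which repository has the most stars?")
print(outputs)
```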

Congratulations! You have completed this tutorial 👍
Next, you may want to go back to the lab's website.
The slides are mainly based on a toolkit provided by Hugging Face’s Lewis Tunstall and the book Natural Language Processing with Transformers.
Jan Kirenz