Using Hugging Face Transformers for Emotion Detection in Text
Image by juicy_fish on Freepik

Hugging Face hosts a variety of transformer-based language models (LMs) specialized in language understanding and language generation tasks, including (but not limited to):

  • Text classification
  • Named entity recognition (NER)
  • Text generation
  • Question answering
  • Summarization
  • Translation

A specific (and quite common) case of the text classification task is sentiment analysis, where the goal is to identify the sentiment expressed in a given text. The “simplest” sentiment analysis LMs are trained to determine the polarity of an input text, such as a customer review of a product, classifying it as positive versus negative, or positive versus negative versus neutral. These two problems are formulated as binary and multi-class classification tasks, respectively.

There are also LMs that, while still recognizable as sentiment analysis models, are trained to categorize texts into different emotions, such as anger, happiness, sadness, and so on.

This Python-based tutorial focuses on loading and illustrating the use of a Hugging Face pre-trained model for classifying the main emotion associated with an input text. We use the emotions dataset, publicly available on the Hugging Face hub, which contains thousands of Twitter posts written in English.

Loading the dataset

We start by loading the training data from the emotions dataset by executing the following instructions:

!pip install datasets
from datasets import load_dataset

# Load the emotions dataset from the Hugging Face hub and keep its training split
all_data = load_dataset("jeffnyman/emotions")
train_data = all_data["train"]

Below is a summary of what the training subset in the train_data variable contains:

Dataset({
    features: ['text', 'label'],
    num_rows: 16000
})

The training fold in the emotions dataset contains 16,000 instances associated with Twitter messages. For each instance, there are two features: one input feature with the actual message text, and one output feature or label with the associated emotion as a numerical identifier:

  • 0: sadness
  • 1: joy
  • 2: love
  • 3: anger
  • 4: fear
  • 5: surprise
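
This id-to-name mapping can also be retrieved programmatically as a quick sanity check. The snippet below is a minimal sketch that assumes the dataset's 'label' column is typed as a ClassLabel feature (which exposes a names attribute); if the column is stored as a plain integer, this attribute will not be available:

# Sketch: only works if 'label' is a ClassLabel feature in this dataset
label_feature = train_data.features["label"]
print(label_feature.names)  # expected: ['sadness', 'joy', 'love', 'anger', 'fear', 'surprise']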

Thus, the first labeled case in the training fold is classified with the emotion ‘sadness’.
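
We can verify this by inspecting that first example directly (assuming train_data has been loaded as shown above):

# Print the first training example: its text and its numeric emotion label
print(train_data[0])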

Output:

{'text': 'i didnt feel humiliated', 'label': 0}

Loading the language model

Once we have loaded the data, the next step is to load a suitable pre-trained LM from Hugging Face for our target emotion detection task. There are two main approaches to loading and using LMs with the Hugging Face Transformers library:

  1. Pipelines provide a very high level of abstraction, letting you load an LM and perform inference on it with just a few lines of code, at the expense of limited configurability.
  2. Auto classes provide a lower level of abstraction, requiring more programming skill, but offering more flexibility to adjust model parameters and customize text preprocessing steps such as tokenization (see the sketch after this list).
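
For reference, here is a rough sketch of what the Auto-classes approach looks like for the same emotion model used later in this tutorial. This is an illustrative sketch rather than the tutorial's main path, and it assumes PyTorch is installed alongside Transformers:

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the tokenizer and classification model for the same emotion checkpoint
model_name = "j-hartmann/emotion-english-distilroberta-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Tokenize an input text and run a forward pass without gradient tracking
inputs = tokenizer("I love hugging face transformers!", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Map the highest-scoring class id back to its emotion name
predicted_id = logits.argmax(dim=-1).item()
print(model.config.id2label[predicted_id])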

This tutorial will give you an easy start by focusing on loading models as pipelines. Pipelines require you to specify at least the type of language task, and optionally a model name to load. Since emotion detection is a very specific form of text classification, the task argument to use when loading the model should be “text-classification”:

from transformers import pipeline
classifier = pipeline("text-classification", model="j-hartmann/emotion-english-distilroberta-base")

It is highly recommended to use the ‘model’ argument to specify the name of a specific model in the Hugging Face hub capable of tackling our emotion detection task. Otherwise, the pipeline defaults to a generic text classification model that was not trained on data for this specific 6-class classification problem.

You might be wondering, “How do I know which model name to use?” The answer is simple: do a little research on the Hugging Face website to find suitable models, for instance models trained on a specific dataset such as the emotions data.

The next step is to start making predictions. Pipelines make this inference process incredibly simple: we just call our newly instantiated pipeline variable and pass an input text to classify as an argument:

example_tweet = "I love hugging face transformers!"
prediction = classifier(example_tweet)
print(prediction)

The result is a predicted label and a confidence score: the closer this score is to 1, the more ‘confident’ the prediction is.

[{'label': 'joy', 'score': 0.9825918674468994}]

So our input example “I love hugging face transformers!” confidently conveys a feeling of joy.

You can send multiple input texts to the pipeline to perform multiple predictions at once, as follows:

example_tweets = ["I love hugging face transformers!", "I really like coffee but it's too bitter..."]
prediction = classifier(example_tweets)
print(prediction)

The second input in this example seems much more challenging for the model to classify reliably:

[{'label': 'joy', 'score': 0.9825918674468994}, {'label': 'sadness', 'score': 0.38266682624816895}]

Finally, we can also pass a batch of instances from a dataset, such as our previously loaded ‘emotions’ data. This example passes the first 10 training inputs to our LM pipeline to classify their emotions, and then prints a list of the predicted labels, disregarding their confidence scores:

# Classify the text of the first 10 training examples in one batch
train_batch = train_data[:10]["text"]
predictions = classifier(train_batch)

# Keep only the predicted label names, discarding the confidence scores
labels = [x["label"] for x in predictions]
print(labels)

Output:

['sadness', 'sadness', 'anger', 'joy', 'anger', 'sadness', 'surprise', 'fear', 'joy', 'joy']

For comparison, here are the original labels given to these 10 training instances:

print(train_data[:10]["label"])

Output:

[0, 0, 3, 2, 3, 0, 5, 4, 1, 2]

When we look at the emotions associated with each numerical identifier, we see that 8 out of 10 predictions match the true labels given to these 10 cases (the two mismatches are instances labeled ‘love’ that the model predicted as ‘joy’).
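
As a quick check, the following sketch maps the numeric identifiers back to emotion names and counts the matches. It reuses the labels list computed in the previous snippet and the id-to-emotion mapping listed earlier:

# Map the numeric identifiers of the true labels back to emotion names
id2emotion = {0: "sadness", 1: "joy", 2: "love", 3: "anger", 4: "fear", 5: "surprise"}
true_labels = [id2emotion[i] for i in train_data[:10]["label"]]

# Count how many predictions agree with the true labels
matches = sum(p == t for p, t in zip(labels, true_labels))
print(true_labels)
print(f"{matches} out of {len(true_labels)} predictions match the true labels")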

Now that you know how to use Hugging Face transformer models to detect textual emotions, you can explore other use cases and language tasks where pre-trained LMs can help.

Ivan Palomares Carrascosa is a thought leader, author, speaker and advisor in AI, machine learning, deep learning and LLMs. He trains and mentors others in leveraging AI in the real world.