Natural Language Processing – Extracting Sentiment from the Text Data
Jul 16, 2019 • 7 Minute Read
Introduction
Natural Language Processing (or NLP) is ubiquitous, and has multiple applications across sectors. One of the most common applications is to analyse the sentiment or polarity of textual data - in the form of customer reviews, social media feeds, employee feedback, surveys, etc.
Sentiment analysis is the process of determining the attitude or emotion expressed in a text, i.e., whether it is positive, negative, or neutral. In this guide, you will learn how to extract sentiment from text using the TextBlob library for Python. We will start by importing the libraries used in this guide.
Loading the Required Libraries and Modules
# Adding needed libraries and reading data
import pandas as pd
import numpy as np
import warnings
warnings.filterwarnings('ignore')
# for text
import nltk
nltk.download('stopwords')
TextBlob
TextBlob is a Python library that offers a simple API for common NLP tasks. The two commented commands below, run from a terminal, install the TextBlob library and download the necessary NLTK corpora. The last line imports the classes into the session.
# $ pip install -U textblob
# $ python -m textblob.download_corpora
from textblob import TextBlob, Word, Blobber
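Before running the import, it can help to confirm that the packages are actually installed. The following is a minimal sketch using only the standard library; the helper name `missing_packages` is hypothetical and not part of TextBlob or NLTK.

```python
import importlib.util

def missing_packages(names):
    """Return the names from `names` that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]

# Check the two packages this guide relies on.
missing = missing_packages(["textblob", "nltk"])
if missing:
    print("Install before continuing:", ", ".join(missing))
else:
    print("All required packages are available.")
```

If anything is reported as missing, the pip command shown above should resolve it.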
Let us examine how the TextBlob library works with an example. The first line of code below creates a TextBlob object from the example text, while the second line prints the text. The third line accesses the sentiment property, which returns two values - polarity and subjectivity.
text = TextBlob("Pluralsight is a great place for learning amazing technology courses")
print(text)
text.sentiment
Output:
Pluralsight is a great place for learning amazing technology courses
Sentiment(polarity=0.7000000000000001, subjectivity=0.825)
The above output shows that the polarity of the sentence is 0.7, indicating that the sentiment is positive. Polarity is of 'float' type and lies in the range of [-1,1], where 1 means a high positive sentiment, and -1 means a high negative sentiment.
The output also shows the subjectivity of the text, which is 0.825 in our example. Subjectivity is also of 'float' type and lies in the range of [0,1]. A value closer to 1 indicates that the sentence is mostly personal opinion rather than factual information, and a value closer to 0 indicates the opposite. We now have an understanding of how the TextBlob library works. Let us now run this exercise on a dataset.
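Before moving on, the two values can be read by name as well as by position, and the polarity can be mapped to the positive, negative, or neutral labels mentioned in the introduction. The sketch below uses a stand-in named tuple with the same field names as the printed output, so it runs without TextBlob. The helper `polarity_label` and its cut-off of 0.05 are assumptions chosen for illustration and are not part of the library.

```python
from collections import namedtuple

# Stand-in with the same field names as the tuple printed in the output above.
Sentiment = namedtuple("Sentiment", ["polarity", "subjectivity"])

def polarity_label(polarity, cutoff=0.05):
    """Map a polarity score in [-1, 1] to a coarse label (cutoff is illustrative)."""
    if polarity > cutoff:
        return "positive"
    if polarity < -cutoff:
        return "negative"
    return "neutral"

# Values copied from the example output.
example = Sentiment(polarity=0.7000000000000001, subjectivity=0.825)

# Access by field name and by index gives the same value.
print(example.polarity == example[0])
print(polarity_label(example.polarity))
```

With a real TextBlob object, the same helper could be called as `polarity_label(text.sentiment.polarity)`.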
Problem Statement
In this guide, we will take up the task of understanding the sentiment of tweets about the company Apple. The dataset contains 1181 observations and 2 variables, as described below:
- Tweet: Consists of the Twitter comments by the users. The Twitter data is publicly available.
- Avg: Average sentiment of the tweet (-2 means most negative while +2 means most positive). This classification was done using Amazon Mechanical Turk. However, for the purpose of this guide, we will not use this variable.
Loading the Data and Performing Basic Data Checks
The first line of code below reads in the data as a pandas dataframe, while the second line prints the shape - 1,181 observations of 2 variables. The third line prints the first five observations.
dat = pd.read_csv('tweetsdata.csv')
print(dat.shape)
dat.head()
Output:
(1181, 2)
| | Tweet | Avg |
|--- |--------------------------------------------------- |------ |
| 0 | iphone 5c is ugly as heck what the freak @appl... | -2.0 |
| 1 | freak YOU @APPLE | -2.0 |
| 2 | freak you @apple | -2.0 |
| 3 | @APPLE YOU RUINED MY LIFE | -2.0 |
| 4 | @apple I hate apple!!!!! | -2.0 |
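A few further checks are often worth running at this stage, such as missing values, the range of the Avg column, and near-duplicate tweets. The sketch below builds a small dataframe from four of the rows shown above (the truncated first tweet is left out) so that it runs without the CSV file. The helper name `basic_checks` is hypothetical.

```python
import pandas as pd

# Illustrative rows copied from the preview above; the truncated first tweet is omitted.
sample = pd.DataFrame({
    "Tweet": [
        "freak YOU @APPLE",
        "freak you @apple",
        "@APPLE YOU RUINED MY LIFE",
        "@apple I hate apple!!!!!",
    ],
    "Avg": [-2.0, -2.0, -2.0, -2.0],
})

def basic_checks(df):
    """Return a few sanity checks on the tweet dataframe as a plain dictionary."""
    return {
        "rows": len(df),
        "missing_tweets": int(df["Tweet"].isna().sum()),
        "missing_avg": int(df["Avg"].isna().sum()),
        # Avg is documented as lying between -2 and +2.
        "avg_in_range": bool(df["Avg"].dropna().between(-2, 2).all()),
        # Tweets that differ only in letter case are counted as duplicates.
        "case_insensitive_duplicates": int(df["Tweet"].str.lower().duplicated().sum()),
    }

print(basic_checks(sample))
```

The same function could be applied to the full data with `basic_checks(dat)`.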
The objective is to detect the sentiment of each tweet. We will begin by checking the sentiment of the first ten tweets, which is done in the line of code below.
dat['Tweet'][:10].apply(lambda x: TextBlob(x).sentiment)
Output:
0 (-0.7, 1.0)
1 (0.0, 0.0)
2 (0.0, 0.0)
3 (0.0, 0.0)
4 (-1.0, 0.9)
5 (-1.0, 1.0)
6 (-0.13333333333333333, 0.16666666666666666)
7 (-0.13221153846153846, 0.3846153846153846)
8 (0.0, 0.0)
9 (-0.47500000000000003, 0.7000000000000001)
Name: Tweet, dtype: object
Each row of the output above is a tuple holding the polarity and subjectivity of a tweet. Since we are interested in the sentiment, we will extract only the polarity and do so for all the observations. The first line of code below extracts the polarity for every tweet and stores it in a new variable 'sentiment'. The second line prints the first five observations.
dat['sentiment'] = dat['Tweet'].apply(lambda x: TextBlob(x).sentiment[0])
dat.head()
Output:
| | Tweet | Avg | sentiment |
|--- |--------------------------------------------------- |------ |----------- |
| 0 | iphone 5c is ugly as heck what the freak @appl... | -2.0 | -0.7 |
| 1 | freak YOU @APPLE | -2.0 | 0.0 |
| 2 | freak you @apple | -2.0 | 0.0 |
| 3 | @APPLE YOU RUINED MY LIFE | -2.0 | 0.0 |
| 4 | @apple I hate apple!!!!! | -2.0 | -1.0 |
The above output shows that each observation now has a sentiment polarity score, with 1 representing the most positive sentiment and -1 representing the most negative sentiment. The first and the fifth records are clearly negative, while the remaining three have a polarity value of zero even though their Avg label is -2.0.
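The scores can be summarised by counting how many tweets fall on each side of zero. The sketch below does this for the five polarity values shown in the table, using only the standard library. The helper `sign_label` is hypothetical, and treating exactly zero as neutral is an assumption made for illustration.

```python
from collections import Counter

# Polarity scores of the first five tweets, copied from the table above.
polarities = [-0.7, 0.0, 0.0, 0.0, -1.0]

def sign_label(score):
    """Label a polarity score by its sign; exactly zero counts as neutral."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

# Tally the labels across the sample.
counts = Counter(sign_label(p) for p in polarities)
print(counts)
```

On the full dataframe, a similar tally could be obtained with `dat['sentiment'].apply(sign_label).value_counts()`.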
Conclusion
In this guide, you have learned how to extract sentiment from text data using the Python library TextBlob. To learn more about Natural Language Processing with Python, please refer to the following guides: