Teaching Apps to Read with Azure Cognitive Services
This guide explains three features that the Text Analytics service can help you with: sentiment analysis, named entity recognition, and language detection.
Nov 3, 2020 • 8 Minute Read
Introduction
As users of software applications, we can interact with technology in many ways. We can now talk to a computer and have it understand what we are telling it. We can use computer vision to have a machine recognize facial expressions. But by and large, we still communicate with computers through plain old text, and the incredible amount of text data that exists is evidence of this. How can we extract meaningful and actionable insights from that text data?
Azure Cognitive Services includes the Text Analytics service. It gives developers an easy-to-use REST API with client libraries in several mainstream languages, including C#, JavaScript, and Python. The Text Analytics service offers several features that are commonly needed when working with text in software applications today. In this guide, we will discuss three: sentiment analysis, named entity recognition, and language detection. When using Azure Cognitive Services, you as a developer don't have to know anything about machine learning or natural language processing. If you can call a REST API or use a client library in a supported language, you can integrate text analytics with your next project! This guide will demonstrate how to use the Text Analytics service with the C# client library.
Setup
The Azure Text Analytics cognitive service has a common setup regardless of the features you need. In the Azure portal, search for the Text Analytics service and create a new resource. To experiment without incurring charges, select the Free pricing tier. When the resource is provisioned, click the Keys and Endpoint link on the left. This will show you the API keys and endpoint needed to access the service. Treat the API keys like passwords!
First, you'll need to add the Azure.AI.TextAnalytics NuGet package to the project dependencies. The latest version as of the publication of this guide is 5.0.0.
Add variables for your API key and endpoint.
static string API_KEY = "your-api-key";
static string ENDPOINT = "your-endpoint";
The API key is used to create an instance of AzureKeyCredential, which authenticates the account owner. The endpoint is used to create an instance of Uri, which is the base address for requests to the service.
// Requires: using System; using Azure; using Azure.AI.TextAnalytics;
var azureCredentials = new AzureKeyCredential(API_KEY);
var endpoint = new Uri(ENDPOINT);
All features of Text Analytics will use methods on the TextAnalyticsClient class, so I'll create an instance:
var client = new TextAnalyticsClient(endpoint, azureCredentials);
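Since the API keys should be treated like passwords, one option is to keep them out of source code altogether. The following is a minimal sketch of that idea, assuming the key and endpoint are stored in environment variables. The variable names TEXT_ANALYTICS_KEY and TEXT_ANALYTICS_ENDPOINT and the TextAnalyticsSetup class are hypothetical names chosen for illustration, not something the service requires.

```csharp
using System;
using Azure;
using Azure.AI.TextAnalytics;

public static class TextAnalyticsSetup
{
    // Hypothetical variable names; use whatever fits your environment.
    private const string KeyVariable = "TEXT_ANALYTICS_KEY";
    private const string EndpointVariable = "TEXT_ANALYTICS_ENDPOINT";

    // Builds a client from settings stored outside the source code.
    public static TextAnalyticsClient CreateClient()
    {
        string apiKey = ReadRequired(KeyVariable);
        string endpoint = ReadRequired(EndpointVariable);
        return new TextAnalyticsClient(new Uri(endpoint), new AzureKeyCredential(apiKey));
    }

    // Fails fast with a clear message when a setting is missing or blank.
    public static string ReadRequired(string name)
    {
        string value = Environment.GetEnvironmentVariable(name);
        if (string.IsNullOrWhiteSpace(value))
        {
            throw new InvalidOperationException($"Environment variable '{name}' is not set.");
        }
        return value;
    }
}
```

Calling TextAnalyticsSetup.CreateClient() then yields the same kind of client as the hard-coded version, without the key ever appearing in the project.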
Sentiment Analysis
The first feature of the Text Analytics service this guide will discuss is sentiment analysis. The name is self-explanatory. The service analyzes a piece of text and returns a prediction of whether the sentiment is positive, negative, or neutral. To analyze the sentiment of a piece of text, call the AnalyzeSentiment method.
DocumentSentiment sentimentAnalysisResults = client.AnalyzeSentiment("Azure Cognitive Service is fantastic when you need to add AI to an application quickly");
Notice that where I have been using type inference to declare instances of the client and other classes, here I explicitly used DocumentSentiment. This is because the actual return type of AnalyzeSentiment is Response<DocumentSentiment>, and declaring the variable as DocumentSentiment triggers an implicit conversion that unwraps the value. The DocumentSentiment has a SentenceSentiment for each sentence in the analyzed text, stored in the Sentences property. The SentenceSentiment has three properties of interest:
- Text - the sentence itself
- Sentiment - the prediction of 'Positive', 'Negative', or 'Neutral'
- ConfidenceScores - the confidence scores for each sentiment
The ConfidenceScores property has three values, one for each sentiment, each between 0.0 and 1.0 inclusive:
- Positive
- Negative
- Neutral
For this text, the predicted Sentiment is Positive, with a score of 1.0 for Positive and 0.0 for both Negative and Neutral.
If I change the text to "It's not so great if you have specialized needs," the predicted Sentiment is Negative, and the score for Negative is 1.0.
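To inspect these values yourself, you can walk the Sentences collection and print each sentence with its scores. This is a minimal sketch, assuming a client created as in the Setup section. The SentimentExample class and PrintSentiment method are names made up for illustration.

```csharp
using System;
using Azure.AI.TextAnalytics;

public static class SentimentExample
{
    // Prints the overall sentiment, then each sentence with its confidence scores.
    public static void PrintSentiment(TextAnalyticsClient client, string text)
    {
        // Response<DocumentSentiment> converts implicitly to DocumentSentiment.
        DocumentSentiment document = client.AnalyzeSentiment(text);
        Console.WriteLine($"Document sentiment: {document.Sentiment}");

        foreach (SentenceSentiment sentence in document.Sentences)
        {
            SentimentConfidenceScores scores = sentence.ConfidenceScores;
            Console.WriteLine($"\"{sentence.Text}\" -> {sentence.Sentiment}");
            Console.WriteLine($"  Positive={scores.Positive:F2} Neutral={scores.Neutral:F2} Negative={scores.Negative:F2}");
        }
    }
}
```

Passing both example sentences in a single string should produce one SentenceSentiment for each, which makes it easy to compare their scores side by side.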
Named Entity Recognition
The Text Analytics service can also recognize fourteen different categories of entities in text. These include the names of people, geographic locations, email addresses, and phone numbers in the United States and European Union. This is called named entity recognition (NER). Using NER is as simple as using sentiment analysis. You just call a different method on the TextAnalyticsClient instance. Simply provide the RecognizeEntities method with the text to analyze.
var recognizeEntitiesResult = client.RecognizeEntities("Microsoft Azure is used all over the world, from Australia to Zimbabwe.");
The return value of RecognizeEntities has a Value property that is a collection of CategorizedEntity for each detected entity in the text. The CategorizedEntity has three properties of interest:
- Text - the entity itself
- Category - the predicted category of the entity from the list of fourteen
- ConfidenceScore - a value between 0.0 and 1.0 inclusive, with 1.0 being the most certain in the predicted category
If you look at the list of categories in the Cognitive Services docs, you will see that some of the categories have subcategories. These are exposed through the SubCategory property.
If the Text Analytics service is asked to find entities in the string "Microsoft Azure is used all over the world, from Australia to Zimbabwe," it will find three entities: "Microsoft Azure", "Australia", and "Zimbabwe". It recognizes "Australia" and "Zimbabwe" as geographic locations with high certainty, 0.91 and 0.87 respectively. However, it predicts that "Microsoft Azure" is an "Organization." This seems odd as Azure is a software product. And Azure isn't that certain about the label, either, with just a 0.51 score. If I modify the text to "Microsoft sells software all over the world from Australia to Zimbabwe," it predicts "Microsoft" is an organization with a score of 0.79. The scores for Australia and Zimbabwe are still quite high. And it also predicts "software" to be a skill with a score of 0.8.
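Because some labels come back with low scores, like the 0.51 for "Microsoft Azure", it can be useful to list every entity with its score and optionally skip the uncertain ones. The sketch below assumes a client created as in the Setup section. The EntityExample class, the PrintEntities method, and the minimumScore parameter are hypothetical names introduced for illustration.

```csharp
using System;
using Azure.AI.TextAnalytics;

public static class EntityExample
{
    // Prints each detected entity, skipping those scored below minimumScore.
    public static void PrintEntities(TextAnalyticsClient client, string text, double minimumScore = 0.0)
    {
        // Value holds the collection of entities detected in the text.
        CategorizedEntityCollection entities = client.RecognizeEntities(text).Value;

        foreach (CategorizedEntity entity in entities)
        {
            if (entity.ConfidenceScore < minimumScore)
            {
                continue; // Too uncertain for this caller's purposes.
            }

            // SubCategory is null when the category has no subcategory.
            string subCategory = entity.SubCategory ?? "(none)";
            Console.WriteLine($"{entity.Text}: {entity.Category} / {subCategory} ({entity.ConfidenceScore:F2})");
        }
    }
}
```

A suitable threshold depends on the application. A minimumScore of 0.6, for example, would drop the "Microsoft Azure" result described above while keeping the two locations.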
Language Detection
While this guide is written in English, it is only one of many languages that the Text Analytics service recognizes. Calling the DetectLanguage method on a TextAnalyticsClient instance will return a value with a predicted language. I've used Microsoft Translator to translate the string "Microsoft sells software all over the world" into Spanish, Russian, and Japanese.
var spanish = "Microsoft vende software en todo el mundo.";
var russian = "Microsoft продает программное обеспечение по всему миру.";
var japanese = "マイクロソフトは世界中でソフトウェアを販売しています。";
As with sentiment analysis, I explicitly declare the type of the return value so that the Response<DetectedLanguage> is converted to a DetectedLanguage.
DetectedLanguage detectSpanish = client.DetectLanguage(spanish);
DetectedLanguage detectRussian = client.DetectLanguage(russian);
DetectedLanguage detectJapanese = client.DetectLanguage(japanese);
The DetectedLanguage has a Name property, which is the predicted language for the text.
If you check the results of analyzing the three translated sentences, you will find that Azure correctly predicts them to be Spanish, Russian, and Japanese. So Azure can work with languages using the Latin alphabet, the Cyrillic alphabet used in Russia, and languages like Japanese in which the differences between characters are subtler.
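The three calls can also be folded into a loop that prints each prediction. This is a minimal sketch, assuming a client created as in the Setup section. The LanguageExample class and PrintLanguages method are names made up for illustration.

```csharp
using System;
using Azure.AI.TextAnalytics;

public static class LanguageExample
{
    // Detects and prints the language of each supplied text.
    public static void PrintLanguages(TextAnalyticsClient client, params string[] texts)
    {
        foreach (string text in texts)
        {
            // Response<DetectedLanguage> converts implicitly to DetectedLanguage.
            DetectedLanguage language = client.DetectLanguage(text);
            Console.WriteLine($"{language.Name} ({language.Iso6391Name}), confidence {language.ConfidenceScore:F2}: {text}");
        }
    }
}
```

Besides Name, the DetectedLanguage exposes the two-letter ISO 639-1 code and a confidence score, which can help when a short or ambiguous string needs a second look.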
Conclusion
Applications can use Azure Cognitive Services to add sentiment analysis, entity recognition, and language detection. The Text Analytics service requires no knowledge of machine learning and costs a fraction of a penny per transaction. Using the REST API or client libraries, developers can integrate the service into web and mobile applications in almost any language. The hard part, natural language processing, is handled by Microsoft. It's cheaper and more reliable than trying to train machine learning models yourself. These problems have been solved before, so you get to focus on what makes your applications great! Thanks for reading!