Using Text Analytics API to extract keywords, sentiment and more


In this tutorial we are going to see how to use the Text Analytics API of Azure Cognitive Services to extract the language, key phrases, sentiment and entities from your text. You can call the Text Analytics REST APIs directly, but the Microsoft.Azure.CognitiveServices.Language.TextAnalytics SDK makes it easier.
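Calling the REST API directly comes down to POSTing a JSON batch of documents with your key in the Ocp-Apim-Subscription-Key header. The sketch below builds such a request but does not send it.

It makes a few assumptions:
- It uses the v2.1 language-detection path /text/analytics/v2.1/languages. Verify the path and version against your resource.
- It needs .NET Core 3.0 or later for System.Text.Json.
- RawTextAnalytics and BuildDetectLanguageRequest are hypothetical names used only for illustration.

```csharp
using System;
using System.Linq;
using System.Net.Http;
using System.Text;
using System.Text.Json;

public static class RawTextAnalytics
{
    // Hypothetical helper: builds (but does not send) a POST request for language detection.
    // The "/text/analytics/v2.1/languages" path is an assumption based on the v2.1 REST API.
    public static HttpRequestMessage BuildDetectLanguageRequest(
        string endpoint, string subscriptionKey, params (string Id, string Text)[] documents)
    {
        // Request body shape: {"documents":[{"id":"...","text":"..."}, ...]}
        var body = new
        {
            documents = documents.Select(d => new { id = d.Id, text = d.Text }).ToArray()
        };

        // A relative path starting with "/" replaces any path already present in the endpoint.
        var request = new HttpRequestMessage(
            HttpMethod.Post, new Uri(new Uri(endpoint), "/text/analytics/v2.1/languages"));

        // The same header that the ApiKeyServiceClientCredentials class in this tutorial adds.
        request.Headers.Add("Ocp-Apim-Subscription-Key", subscriptionKey);
        request.Content = new StringContent(
            JsonSerializer.Serialize(body), Encoding.UTF8, "application/json");
        return request;
    }
}
```

To send it, you would call `await new HttpClient().SendAsync(request)`. The SDK used in the rest of this tutorial does the request building and response parsing for you.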

Prerequisites

  1. To run the sample code you need Visual Studio installed (any edition).
  2. You will need the Microsoft.Azure.CognitiveServices.Language.TextAnalytics NuGet package.
  3. You will need an Azure Cognitive Services subscription key. Follow this tutorial to get one. If you don’t have an Azure account, you can use the free trial to get a subscription key.

Create the Project

To create an application follow the steps below:

  1. Create a .NET Core Console Application in Visual Studio 2017.
  2. Add the Microsoft.Azure.CognitiveServices.Language.TextAnalytics NuGet package by using the NuGet Package Manager Console. If you install it through the NuGet Package Manager GUI instead, make sure the Include prerelease checkbox is checked, because the package is a preview release.
    Install-Package Microsoft.Azure.CognitiveServices.Language.TextAnalytics -Version 2.8.0-preview
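  3. Inside the Program class, add a constant for your subscription key and a ServiceClientCredentials subclass that attaches the key to every request:
    // Assumes these using directives at the top of Program.cs:
    // using System; using System.Collections.Generic; using System.Net.Http;
    // using System.Threading; using System.Threading.Tasks;
    // using Microsoft.Azure.CognitiveServices.Language.TextAnalytics;
    // using Microsoft.Azure.CognitiveServices.Language.TextAnalytics.Models;
    // using Microsoft.Rest; // defines ServiceClientCredentials
    private const string SubscriptionKey = ""; // Insert your Text Analytics subscription key

    class ApiKeyServiceClientCredentials : ServiceClientCredentials
    {
        public override Task ProcessHttpRequestAsync(HttpRequestMessage request, CancellationToken cancellationToken)
        {
            // Cognitive Services authenticates each call through this header
            request.Headers.Add("Ocp-Apim-Subscription-Key", SubscriptionKey);
            return base.ProcessHttpRequestAsync(request, cancellationToken);
        }
    }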
  4. Next, build the client by adding the following code to the Main method. Replace the region in Endpoint with the one you signed up for. You can find the endpoint on your resource's page in the Azure portal. It typically has the form “https://[region].api.cognitive.microsoft.com”; include only the protocol and hostname here.
    ITextAnalyticsClient client = new TextAnalyticsClient(new ApiKeyServiceClientCredentials())
    {
        Endpoint = "https://westeurope.api.cognitive.microsoft.com/" // Replace with the region of your Text Analytics subscription
    };
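Because Endpoint should hold only the protocol and hostname, a small helper can trim a URL copied from the portal down to that form. This is a minimal sketch.
- EndpointHelper.ToBaseEndpoint is a hypothetical name.
- Rejecting plain http is a choice made in this sketch, not an SDK requirement.

```csharp
using System;

public static class EndpointHelper
{
    // Hypothetical helper: reduces a pasted URL to "scheme://host[:port]".
    public static string ToBaseEndpoint(string endpoint)
    {
        var uri = new Uri(endpoint.Trim(), UriKind.Absolute);

        // Design choice of this sketch: refuse plain http so the key is never sent unencrypted.
        if (uri.Scheme != Uri.UriSchemeHttps)
            throw new ArgumentException("Expected an https:// endpoint.", nameof(endpoint));

        // GetLeftPart(UriPartial.Authority) keeps the scheme, host and any non-default port,
        // and drops the path, query string and fragment.
        return uri.GetLeftPart(UriPartial.Authority);
    }
}
```

For example, `Endpoint = EndpointHelper.ToBaseEndpoint("https://westeurope.api.cognitive.microsoft.com/text/analytics/v2.1")` sets the endpoint to `https://westeurope.api.cognitive.microsoft.com`.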

Detect Language

Continue in Main and add the following code for language detection. A BatchInput lets you send multiple documents in one call, and you can iterate through the results using the result.Documents collection.

var result = client.DetectLanguageAsync(new BatchInput(
    new List<Input>()
    {
        new Input("1", "This is a document written in English."),
        new Input("2", "Este es un documento escrito en Español."),
        new Input("3", "这是一个用中文写的文件")
    })).Result;

// Printing language results
foreach (var document in result.Documents)
{
    Console.WriteLine($"Document ID: {document.Id} , Language: {document.DetectedLanguages[0].Name}");
}
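The loop above reads DetectedLanguages[0], which assumes the service returned at least one candidate. The sketch below is more defensive: it picks the highest-scoring candidate and falls back to a default ISO 639-1 code. That code is the kind of language hint MultiLanguageInput takes as its first argument in the later sections.

Assumptions:
- DetectedLanguageInfo is a hypothetical stand-in, modeled loosely on a detected-language entry. It is not the SDK's type.
- The "en" fallback is an arbitrary choice.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for one detected-language candidate.
public sealed class DetectedLanguageInfo
{
    public DetectedLanguageInfo(string name, string iso6391Name, double score)
    {
        Name = name;
        Iso6391Name = iso6391Name;
        Score = score;
    }

    public string Name { get; }
    public string Iso6391Name { get; }
    public double Score { get; }
}

public static class LanguagePicker
{
    // Returns the ISO code of the highest-scoring candidate, or the fallback when there is none.
    public static string PickLanguage(IEnumerable<DetectedLanguageInfo> candidates, string fallback = "en")
    {
        var best = candidates?.OrderByDescending(c => c.Score).FirstOrDefault();
        return best?.Iso6391Name ?? fallback;
    }
}
```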

Every request is subject to a few limits. All of the Text Analytics API endpoints accept raw text. A single document can be at most 5,120 characters, so split larger documents into smaller ones before sending them. The rate limit is 100 calls per minute, but a single call can carry up to 1,000 documents, so batch your documents rather than sending one per call.

Limit                                       Value
Maximum size of a single document           5,120 characters
Maximum size of an entire request           1 MB
Maximum number of documents in a request    1,000 documents
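Staying within these limits means splitting long texts and grouping documents into batches before calling the API. This is a minimal sketch that takes its limit values from the table above.

Assumptions:
- The class and method names are hypothetical.
- Lengths are measured in UTF-16 chars (C#'s string.Length), which may not be exactly how the service counts characters.
- The 900,000-byte text budget is an arbitrary margin below 1 MB, because JSON syntax and ids are not counted.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class TextAnalyticsLimits
{
    // Values from the limits table; check the current service documentation before relying on them.
    public const int MaxCharsPerDocument = 5120;
    public const int MaxDocumentsPerRequest = 1000;

    // Splits text into pieces of at most maxChars UTF-16 chars, breaking after the last
    // whitespace in each window when there is one, and never splitting a surrogate pair.
    public static List<string> SplitDocument(string text, int maxChars = MaxCharsPerDocument)
    {
        if (maxChars < 2) throw new ArgumentOutOfRangeException(nameof(maxChars));
        var pieces = new List<string>();
        int start = 0;
        while (text.Length - start > maxChars)
        {
            int cut = start + maxChars;

            // Walk back to just after the last whitespace character in the window.
            int ws = cut;
            while (ws > start && !char.IsWhiteSpace(text[ws - 1])) ws--;

            if (ws > start) cut = ws;
            else if (char.IsHighSurrogate(text[cut - 1])) cut--; // keep the surrogate pair together

            pieces.Add(text.Substring(start, cut - start));
            start = cut;
        }
        if (start < text.Length) pieces.Add(text.Substring(start));
        return pieces;
    }

    // Groups documents into batches of at most maxDocs documents whose combined UTF-8 text
    // stays within maxTextBytes (JSON syntax and ids are not counted, hence the headroom).
    public static List<List<string>> Batch(
        IEnumerable<string> documents, int maxDocs = MaxDocumentsPerRequest, int maxTextBytes = 900_000)
    {
        var batches = new List<List<string>>();
        var current = new List<string>();
        int currentBytes = 0;
        foreach (var doc in documents)
        {
            int bytes = Encoding.UTF8.GetByteCount(doc);
            bool full = current.Count == maxDocs || currentBytes + bytes > maxTextBytes;
            if (full && current.Count > 0)
            {
                batches.Add(current);
                current = new List<string>();
                currentBytes = 0;
            }
            current.Add(doc);
            currentBytes += bytes;
        }
        if (current.Count > 0) batches.Add(current);
        return batches;
    }
}
```

Each piece returned by SplitDocument becomes its own document with its own Id, and each list returned by Batch fits in one request.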

Detect Key-phrases

To detect key phrases, add the following code. You can iterate through the results using the result2.Documents collection.

KeyPhraseBatchResult result2 = client.KeyPhrasesAsync(new MultiLanguageBatchInput(
    new List<MultiLanguageInput>()
    {
        new MultiLanguageInput("ja", "1", "猫は幸せ"),
        new MultiLanguageInput("de", "2", "Fahrt nach Stuttgart und dann zum Hotel zu Fuß."),
        new MultiLanguageInput("en", "3", "My cat is stiff as a rock."),
        new MultiLanguageInput("es", "4", "A mi me encanta el fútbol!")
    })).Result;

// Printing keyphrases
foreach (var document in result2.Documents)
{
    Console.WriteLine($"Document ID: {document.Id} ");

    Console.WriteLine("\t Key phrases:");

    foreach (string keyphrase in document.KeyPhrases)
    {
        Console.WriteLine($"\t\t{keyphrase}");
    }
}
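Each result document carries the Id you gave its input. In the REST API, documents that cannot be processed are reported in a separate errors list rather than in the documents list. Pairing inputs with results by position can therefore misalign, so the sketch below matches them by Id.

Assumptions:
- InputDoc and KeyPhraseDoc are hypothetical stand-ins for the SDK's input and result classes.
- Ids are unique within a request.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for an input document.
public sealed class InputDoc
{
    public InputDoc(string id, string text) { Id = id; Text = text; }
    public string Id { get; }
    public string Text { get; }
}

// Hypothetical stand-in for a per-document key-phrase result.
public sealed class KeyPhraseDoc
{
    public KeyPhraseDoc(string id, IList<string> keyPhrases) { Id = id; KeyPhrases = keyPhrases; }
    public string Id { get; }
    public IList<string> KeyPhrases { get; }
}

public static class ResultMatcher
{
    // Pairs every input with its result by Id; inputs without a result (for example,
    // documents reported as errors) get null key phrases. Duplicate result Ids make ToDictionary throw.
    public static List<(InputDoc Input, IList<string> KeyPhrases)> MatchById(
        IEnumerable<InputDoc> inputs, IEnumerable<KeyPhraseDoc> results)
    {
        var resultsById = results.ToDictionary(r => r.Id);
        return inputs
            .Select(input => (Input: input,
                              KeyPhrases: resultsById.TryGetValue(input.Id, out var r) ? r.KeyPhrases : null))
            .ToList();
    }
}
```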

Extract Sentiment

To detect sentiment, add the following code. You can iterate through the results using the result3.Documents collection. Each document's Score is a value between 0 and 1: the closer it is to 1, the more positive the text.

SentimentBatchResult result3 = client.SentimentAsync(
    new MultiLanguageBatchInput(
        new List<MultiLanguageInput>()
        {
            new MultiLanguageInput("en", "0", "I had the best day of my life."),
            new MultiLanguageInput("en", "1", "This was a waste of my time. The speaker put me to sleep."),
            new MultiLanguageInput("es", "2", "No tengo dinero ni nada que dar..."),
            new MultiLanguageInput("it", "3", "L'hotel veneziano era meraviglioso. È un bellissimo pezzo di architettura.")
        })).Result;

// Printing sentiment results
foreach (var document in result3.Documents)
{
    Console.WriteLine($"Document ID: {document.Id} , Sentiment Score: {document.Score:0.00}");
}
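If you need a label rather than a number, you can bucket the score. This is a minimal sketch.
- The 0.4 and 0.6 cut-offs are arbitrary assumptions, not values defined by the service.
- SentimentLabels is a hypothetical helper.

```csharp
using System;

public static class SentimentLabels
{
    // Arbitrary thresholds chosen for this sketch; the API itself only returns the 0-1 score.
    public const double NegativeBelow = 0.4;
    public const double PositiveAbove = 0.6;

    // Maps a score to a coarse label; accepts double? so a missing score can be reported explicitly.
    public static string Label(double? score)
    {
        if (score == null) return "unknown";
        double s = score.Value;
        if (s < 0.0 || s > 1.0)
            throw new ArgumentOutOfRangeException(nameof(score), "Expected a score between 0 and 1.");
        if (s < NegativeBelow) return "negative";
        if (s > PositiveAbove) return "positive";
        return "neutral";
    }
}
```

In the printing loop you could then write `{SentimentLabels.Label(document.Score)}` next to the numeric score.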

Identify Entities

To find entities, add the following code. You can iterate through the results using the result4.Documents collection; for each entity the code prints its name, Wikipedia URL, type and subtype.

EntitiesBatchResultV2dot1 result4 = client.EntitiesAsync(
    new MultiLanguageBatchInput(
        new List<MultiLanguageInput>()
        {
            new MultiLanguageInput("en", "0", "The Great Depression began in 1929. By 1933, the GDP in America fell by 25%.")
        })).Result;

// Printing entities results
foreach (var document in result4.Documents)
{
    Console.WriteLine($"Document ID: {document.Id} ");

    Console.WriteLine("\t Entities:");

    foreach (EntityRecordV2dot1 entity in document.Entities)
    {
        Console.WriteLine($"\t\t{entity.Name}\t\t{entity.WikipediaUrl}\t\t{entity.Type}\t\t{entity.SubType}");
    }
}

You can find the complete source code in this repository on my GitHub.
