Have you ever marveled at Google’s ability to find the right answers to even poorly formulated search queries? Or maybe you’ve been astonished by the ever-increasing accuracy of its machine translation service? 

In this article, we’d like to cover some of the innovations that Google has contributed to the field of natural language processing (NLP). We’ll introduce you to its cloud-based NLP services and show you how to use them in your own projects.

Why NLP?

The term natural language processing describes a discipline at the interface between machine learning and computational linguistics. Its primary goal is to make human language understandable to computers. This is a very difficult problem, but one that Google is heavily invested in solving. 

Nowadays, the search engine can deal with many tricky aspects of written language. It recognizes whether your query concerns a famous person and, if so, offers you a list with basic information about that individual. It can distinguish between homonyms and even takes word order into account: “dog bites man” returns very different results than “man bites dog.”

Google’s Research in NLP

Search arXiv for the most influential NLP papers, and you’ll find that many were written at big tech companies such as LinkedIn, Facebook, or Google. This is no coincidence: the rise of neural networks in NLP has favored companies with large computational resources.

One of Google’s most successful inventions is BERT, a model that produces word embeddings. Short for Bidirectional Encoder Representations from Transformers, BERT is built on the Transformer, a neural network architecture based on a technique called attention. The Transformer was developed in the context of neural machine translation (at, you guessed it, Google).
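To give a feel for what attention computes, here is a minimal single-head sketch of scaled dot-product attention, the building block of the Transformer. It is a simplification, not BERT’s implementation: it assumes the query, key, and value vectors are given directly and omits the learned projection matrices, multiple heads, and stacked layers that real models use. Each output is a weighted average of the value vectors, with weights softmax(q · k / √d).

```python
import math

def softmax(xs):
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(queries, keys, values):
    """queries, keys, values: lists of equal-length float vectors.
    Returns one output vector per query."""
    d = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        out = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        outputs.append(out)
    return outputs

# A query that matches both keys equally averages their values.
print(scaled_dot_product_attention([[1, 0]], [[1, 0], [1, 0]], [[2, 0], [4, 2]]))
```

Because the weights depend on how well each query matches each key, a word can “attend” mostly to the few other words in a sentence that matter for its meaning.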

Word embeddings are high-dimensional vectors that encode a great deal of information about a word. It takes days to train BERT, and the resulting model is so rich that Google began using it in its search engine in autumn 2019, marking the biggest change to its search algorithms in nearly five years.
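One practical consequence of representing words as vectors is that semantic similarity becomes geometry. The sketch below compares hypothetical 4-dimensional vectors with cosine similarity; the values are invented for illustration (BERT-base actually produces 768-dimensional vectors, and its embeddings are contextual, so the same word gets a different vector in different sentences).

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical toy embeddings, not taken from any real model.
embeddings = {
    "pan": [0.9, 0.1, 0.3, 0.0],
    "skillet": [0.8, 0.2, 0.35, 0.05],
    "husband": [0.0, 0.9, 0.1, 0.7],
}

print(cosine_similarity(embeddings["pan"], embeddings["skillet"]))  # close to 1
print(cosine_similarity(embeddings["pan"], embeddings["husband"]))  # much lower
```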

What NLP Tools Does Google Offer?

Google’s cloud-based NLP tools allow you to benefit from state-of-the-art language models as well as from vast computational resources. BERT forms the foundation for the pretrained models used in the Natural Language API and the AutoML API. In addition, AutoML uses transfer learning and hyperparameter tuning to help you train your own models. Let’s look at both of them in detail.

Natural Language API

Say your company produces frying pans and you want to know how people feel about your product. Luckily, many customers have left comments on your website describing their experiences with their new frying pan. You can now use the Natural Language API to analyze these comments and get a summary of your customers’ emotions in relation to your product, all without reading a single review!

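This type of inspection is called sentiment analysis. For each document, it returns a score between -1.0 (negative) and 1.0 (positive), telling you whether the overall sentiment is negative or positive, along with a magnitude (0 or greater) that reflects how much emotional content the text contains, regardless of its polarity.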
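To turn many per-review scores into the kind of summary described above, you could bucket them into coarse labels and count them. This is a minimal sketch; the 0.1 threshold is an arbitrary assumption, not a value defined by the API, and you would tune it on your own data.

```python
def label_sentiment(score, threshold=0.1):
    """Map a document sentiment score in [-1.0, 1.0] to a coarse label.
    The default threshold is an assumption, not a Google default."""
    if not -1.0 <= score <= 1.0:
        raise ValueError("score must lie between -1.0 and 1.0")
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    return "neutral"

def summarize(scores, threshold=0.1):
    """Count how many reviews fall into each label."""
    counts = {"positive": 0, "neutral": 0, "negative": 0}
    for s in scores:
        counts[label_sentiment(s, threshold)] += 1
    return counts

print(summarize([0.5, -0.2, 0.0]))
```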

The API’s other features are named entity recognition (which persons, places, and products does your text mention?) and syntax analysis, which splits text into sentences and tokens and annotates each token with its part of speech and its grammatical dependencies; this output can serve as input to other applications. Finally, the Natural Language API offers content classification, which assigns your text to predefined categories. Not all of the features support the same languages, but most of them are available for several European and Asian languages.
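Each feature corresponds to its own method of the API’s REST interface. The sketch below only builds the URL and JSON body for a plain-text request to the v1 endpoints; it assumes you would send the request yourself with an API key or OAuth token, which is left out here. Note that `classifyText` takes no `encodingType` field, unlike the other three methods.

```python
import json

# Feature name (our own labels) -> Natural Language API v1 REST method.
ENDPOINTS = {
    "sentiment": "analyzeSentiment",
    "entities": "analyzeEntities",
    "syntax": "analyzeSyntax",
    "classification": "classifyText",
}

BASE_URL = "https://language.googleapis.com/v1/documents:"

def build_request(feature, text):
    """Return (url, json_body) for a plain-text request.
    Authentication and sending the HTTP POST are not shown."""
    method = ENDPOINTS[feature]
    body = {"document": {"type": "PLAIN_TEXT", "content": text}}
    if method != "classifyText":
        # Character offsets in the response are computed with this encoding.
        body["encodingType"] = "UTF8"
    return BASE_URL + method, json.dumps(body)

url, body = build_request("entities", "I bought this frying pan in Cyprus.")
print(url)
print(body)
```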

AutoML Natural Language

In machine learning, the term AutoML describes a family of algorithms that identify the best settings (called hyperparameters) for your model. With increasingly complex model architectures, finding the best hyperparameters is crucial and quickly becomes impractical to do by hand. Google’s AutoML even designs its own neural network architectures (an approach known as neural architecture search), which makes it resemble a self-programming machine.

To use AutoML, you don’t need to be a programmer, but you do need labeled data for your task. As we all know, high-quality data is hard to come by, especially in the quantities a neural model needs to perform well. Therefore, AutoML starts from Google’s pretrained models and fine-tunes them on your data. This smart, resource-saving technique is known as transfer learning.
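The simplest form of automated hyperparameter tuning is random search: sample settings from a predefined search space, evaluate each, and keep the best. The sketch below is only that baseline, not what Google’s AutoML does (it uses more sophisticated strategies). The search space and the toy objective, which stands in for validation accuracy, are hypothetical.

```python
import random

def random_search(objective, space, n_trials=20, seed=0):
    """Sample hyperparameter settings from `space` and return the
    setting with the highest objective value."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        params = {name: rng.choice(options) for name, options in space.items()}
        score = objective(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Hypothetical search space.
space = {"learning_rate": [1e-4, 3e-4, 1e-3, 3e-3], "num_layers": [1, 2, 3, 4]}

def toy_objective(p):
    # Invented stand-in for validation accuracy; peaks at lr=1e-3, 2 layers.
    return -abs(p["learning_rate"] - 1e-3) * 1000 - abs(p["num_layers"] - 2)

print(random_search(toy_objective, space, n_trials=200))
```

In a real setting, each call to the objective would train and validate a model, which is why smarter search strategies that need fewer trials matter so much.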

Other Google NLP Tools

In addition to the services that we’ve described so far, Google also offers a Speech-to-Text API for transcription of audio speech files, and a Translation API, which lets you integrate Google’s machine translation into your own application. Both technologies use neural networks.

How Can I Use Google’s NLP Tools?

If you want to try these tools yourself, you’ll need a Google Cloud account with billing enabled, which means keeping a credit card on file. This is because only an initial volume of requests is free each month. Beyond a certain threshold (about 5,000 units per month, depending on the service, where a unit is roughly one request of up to 1,000 characters), you pay a fixed amount for every additional 1,000 units, or by the hour when training models with AutoML.
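To estimate costs before sending a large batch of reviews, you can count billable units yourself. This sketch assumes the Natural Language API pricing rule that each document counts as at least one unit, plus one unit per started block of 1,000 characters. The free-tier size and the price per 1,000 units are deliberately parameters: take them from the current pricing page, since they differ between features and can change.

```python
import math

def billable_units(documents, chars_per_unit=1000):
    """Assumed rule: each document costs ceil(len / 1000) units, at least 1."""
    return sum(max(1, math.ceil(len(doc) / chars_per_unit)) for doc in documents)

def estimate_cost(units, free_units, price_per_1000):
    """Cost of the units above the free tier. Both parameters are
    placeholders to be filled in from the current pricing page."""
    paid = max(0, units - free_units)
    return paid / 1000 * price_per_1000

reviews = ["a" * 999, "b" * 1000, "c" * 1001]
print(billable_units(reviews))  # the 1,001-character review counts twice
```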

Google’s Natural Language API in Action

We wanted to test the API ourselves, and used Python to analyze the sentiments expressed in our frying pan reviews. The code is adapted from this example. We added a line for getting the API credentials, and adapted the script to take input from the command line.

# Written for google-cloud-language 1.x (the enums module was removed in version 2.0)
from google.cloud.language_v1 import enums, LanguageServiceClient

import sys

def sample_analyze_sentiment(text_content):

    # initialize client with credentials
    client = LanguageServiceClient.from_service_account_json("path/to/file")

    # specify the type of document (PLAIN_TEXT or HTML)
    type_ = enums.Document.Type.PLAIN_TEXT

    document = {"content": text_content, "type": type_}

    # specify the encoding (NONE, UTF8, UTF16 or UTF32)
    encoding_type = enums.EncodingType.UTF8

    response = client.analyze_sentiment(document, encoding_type=encoding_type)

    # get overall sentiment of the input document
    print("Document sentiment score: {:.2f}".format(response.document_sentiment.score))

    # print language of the text
    print("Language of the text: {}".format(response.language))

if __name__ == "__main__":
    # get the input from the command line (the last argument is the review)
    review = sys.argv[-1]
    sample_analyze_sentiment(review)

We called our script g_sent.py and ran it from the command line. Because the script reads its input from sys.argv[-1], the text to be analyzed (which we partly borrowed from this article about bad frying pans) is simply the last argument of our command.

python g_sent.py "After only two years, my frying pan is already past its prime. The non-stick coating has flaked off into a lovely decorative pattern"
# Output:
Document sentiment score: -0.20
Language of the text: en

python g_sent.py "This pan is ultra tough and non-stick"
# Output:
Document sentiment score: 0.50
Language of the text: en

Judging from these results, we may not even need to train our own model. Google’s pretrained model can distinguish between positive and negative reviews, and it even seems to see through the sarcastic “lovely decorative pattern” in the first review.

But would it perform just as well with other languages? We tried a German-language review, which translates to “The frying pan is so heavy that you can only carry it with two hands”:

python g_sent.py "Die Bratpfanne ist so schwer, dass man sie nur mit zwei Händen tragen kann"
# Output:
Document sentiment score: -0.20
Language of the text: de

The model recognized that a frying pan that has to be held with both hands might not be ideal. Given these satisfying results, we were curious to test the out-of-the-box named entity recognition. Again, we slightly adapted the template code to suit our own needs. We called the script g_ent.py. Here’s what we got:

python g_ent.py "I bought this sturdy frying pan while on vacation in Cyprus. I met my husband Alessandro on that same holiday. We have been married for twenty years, and we are still using it!"
# Output (selection):
Representative name for the entity: frying pan
Entity type: CONSUMER_GOOD
Salience score: 0.28
Mention text: frying pan
Mention type: COMMON

Representative name for the entity: Alessandro
Entity type: PERSON
Salience score: 0.22
Mention text: husband
Mention type: COMMON

Representative name for the entity: Cyprus
Entity type: LOCATION
Salience score: 0.12
wikipedia_url: https://en.wikipedia.org/wiki/Cyprus
Mention text: Cyprus
Mention type: PROPER

As we can see, not only does the model output a Wikipedia page link for the recognized entities (provided one exists), it also computes a “salience score” that rates the prominence of a word relative to the rest of the text. There are numerous articles on the web (such as this one) on how to exploit this score for search engine optimization (SEO).
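A simple way to use salience is to keep only the most prominent entities of each text, for example to tag reviews by what they are mainly about. The sketch below works on plain dictionaries filled with the values from the g_ent.py output above; in a real script you would build them from the response’s entities (each has `name` and `salience` attributes). The `top_entities` helper and its cut-off parameters are our own hypothetical choices.

```python
# Values copied from the entity output shown above.
entities = [
    {"name": "frying pan", "type": "CONSUMER_GOOD", "salience": 0.28},
    {"name": "Alessandro", "type": "PERSON", "salience": 0.22},
    {"name": "Cyprus", "type": "LOCATION", "salience": 0.12},
]

def top_entities(entities, k=2, min_salience=0.0):
    """Return the names of the k most salient entities above a cut-off."""
    ranked = sorted(entities, key=lambda e: e["salience"], reverse=True)
    return [e["name"] for e in ranked if e["salience"] >= min_salience][:k]

print(top_entities(entities))
```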

What Alternatives Are There to Google’s NLP Tools?

Google certainly has vast resources and state-of-the-art trained language models, but the services in its Natural Language API are standard NLP applications. While you probably wouldn’t want to train a BERT language model yourself, you can use pretrained models to build your own applications. And even if you lack labeled data for training, there are plenty of ready-made tools: sentiment analyzers like VADER, and the named entity recognition offered by spaCy, NLTK, and the Stanford NER tagger.
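Ready-made analyzers like VADER need no training because they rely on a sentiment lexicon. The toy sketch below illustrates that idea only; it is not VADER. The six-word lexicon and its values are invented, and real VADER uses a large human-rated lexicon plus rules for negation, intensifiers, capitalization, and punctuation. The one piece borrowed from VADER is its normalization x / √(x² + α) with α = 15, which squashes the summed word scores into the range (-1, 1).

```python
import math

# Hypothetical mini-lexicon: positive values = positive sentiment.
LEXICON = {"tough": 1.5, "sturdy": 1.8, "love": 3.2,
           "flaked": -1.5, "heavy": -0.8, "broken": -2.5}

def normalize(score, alpha=15):
    # VADER-style normalization into (-1, 1).
    return score / math.sqrt(score * score + alpha)

def lexicon_sentiment(text):
    """Sum the lexicon values of the words in `text`, then normalize."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    total = sum(LEXICON.get(w, 0.0) for w in words)
    return normalize(total)

print(lexicon_sentiment("This pan is ultra tough and non-stick"))
print(lexicon_sentiment("The coating flaked off"))
```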

If you do have enough labeled data to train a neural network, you could try AutoKeras for finding the optimal architecture. But if your data is scarce, you might look into how to adapt pretrained models with transfer learning. While Google’s AutoML service is impressive, it is also pretty expensive, and probably overkill for most applications.
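The core idea of transfer learning can be shown in a few lines: reuse a pretrained component unchanged and train only a small classifier on top of it. In this sketch, a hand-written table of hypothetical word vectors stands in for a frozen pretrained encoder (in practice this would be a large model such as BERT), and only a logistic-regression head is trained with stochastic gradient descent. This is the “feature extraction” flavor of transfer learning; fine-tuning would also update the encoder.

```python
import math

# Hypothetical "pretrained" word vectors standing in for a frozen encoder.
PRETRAINED = {
    "great": [1.0, 0.2], "sturdy": [0.8, 0.1], "love": [0.9, 0.3],
    "flaked": [-0.9, 0.2], "broken": [-1.0, 0.1], "heavy": [-0.6, 0.4],
}

def encode(text):
    """Frozen feature extractor: average the vectors of known words."""
    vecs = [PRETRAINED[w] for w in text.lower().split() if w in PRETRAINED]
    if not vecs:
        return [0.0, 0.0]
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(2)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_head(examples, epochs=200, lr=0.5):
    """Train only the head (weights w, bias b) on frozen features,
    using stochastic gradient descent on the log loss."""
    w, b = [0.0, 0.0], 0.0
    feats = [(encode(text), label) for text, label in examples]
    for _ in range(epochs):
        for x, y in feats:
            p = sigmoid(w[0] * x[0] + w[1] * x[1] + b)
            grad = p - y  # derivative of log loss w.r.t. the logit
            w = [w[0] - lr * grad * x[0], w[1] - lr * grad * x[1]]
            b -= lr * grad
    return w, b

def predict(w, b, text):
    x = encode(text)
    return sigmoid(w[0] * x[0] + w[1] * x[1] + b)

# A tiny labeled set: 1 = positive review, 0 = negative review.
examples = [("great sturdy pan", 1), ("coating flaked", 0),
            ("love it", 1), ("handle broken", 0)]
w, b = train_head(examples)
print(predict(w, b, "heavy pan"))  # "heavy" never appeared in training
```

Because the encoder already places “heavy” near other negative words, the head can classify a review containing it even though that word never appeared in the labeled examples.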

Conclusion

In this post, we learned about Google’s role in NLP and the tools it provides for doing NLP at home. We saw how to perform sentiment analysis and named entity recognition using the Natural Language API, and we looked at some open-source alternatives to Google’s NLP tools.

Want to learn more about NLP? Enroll in our Nanodegree program to master the skills you’ll need to work in the field of natural language processing.
