What is Natural Language Processing?

When talking to clients, we often get asked how Natural Language Generation (NLG) relates more generally to Natural Language Processing (NLP).  The short answer is that NLG is a specific kind of NLP that’s focused on taking data as input and producing language as output.  But to really see where NLG fits into the picture, it’s useful to understand the other kinds of software applications that fall under the heading of Natural Language Processing.  That’s the aim of this blog post.

NLP has been around for a long time, going back to at least 1954, when a collaboration between IBM and Georgetown University resulted in the first public demonstration of a machine translation system.  Machine Translation (MT for short) is one of the most publicly visible uses of NLP technology, as manifested in the translation capabilities present in major search engines like Google and Bing.  But between the 1950s and now, a whole host of other NLP applications have been explored, and a number of those are commercial successes today.  Here’s a breakdown into four distinct categories that can help in trying to make sense of today’s NLP landscape.

1 Sentiment Analysis

Sentiment analysis is probably the most talked-about use of NLP technology today. Sometimes also called opinion mining, the key goal here is to determine whether a given text expresses a positive or negative sentiment towards some person or thing. This is particularly useful for assessing online reviews, where every manufacturer or vendor cares about how their products or services are perceived. And in a world where social media is so prevalent, every blog post, online customer review, tweet, or Facebook Like can have a massive impact on that perception.

Automated sentiment analysis makes it possible to get a reading of how your product is being received.  Doing this well is a lot harder than you might think, when you consider things like the use of sarcasm, and the problem that a product can have both good and bad aspects at the same time.
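To make the basic idea concrete, here’s a deliberately minimal lexicon-based scorer.  The word lists are invented for illustration, and a toy like this ignores exactly the hard cases mentioned above – sarcasm, negation, and mixed sentiment – which is why real systems use much more sophisticated methods.

```python
# A minimal lexicon-based sentiment scorer.  The word lists below are
# illustrative assumptions, not a real sentiment lexicon.
POSITIVE = {"great", "excellent", "love", "good", "amazing"}
NEGATIVE = {"terrible", "awful", "hate", "bad", "disappointing"}

def sentiment(text: str) -> str:
    """Count positive and negative words and return an overall label."""
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

Given the review ‘I love this great phone’, this returns ‘positive’ – but hand it ‘Oh great, another dead battery’ and the sarcasm sails straight past it, which is precisely the point.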

2 Grammar Checking

Another really visible use of NLP technology – so widespread, in fact, that it’s often not even recognised as sophisticated NLP at all – is the grammar checker in your word processing software.

For all the criticisms this technology receives, it’s important to realise just how difficult a task grammar checking is.  Most NLP applications make some pretty strong assumptions about the ‘well-formedness’ of the text they deal with, taking for granted that spelling or grammatical errors will be uncommon.  That’s reasonable for many kinds of text, but grammar checkers have to operate at the real coal-face of text production, where you can’t make any assumptions about the quality of the text in front of you, and you need to be able to deal with anything your users throw at you.
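One tiny rule hints at why this is hard.  The sketch below flags ‘a’ before a vowel-initial word – a crude approximation of the a/an rule.  It’s an illustration only: ‘a university’ and ‘an hour’ show that the real rule depends on pronunciation, not spelling, so even this single check needs phonetic knowledge to get right.

```python
import re

def check_articles(text: str) -> list[str]:
    """Flag 'a' before a vowel-initial word (a crude spelling-based
    approximation of the a/an rule; real checkers need phonetics)."""
    issues = []
    for m in re.finditer(r"\ba (\w+)", text, re.IGNORECASE):
        word = m.group(1)
        if word[0].lower() in "aeiou":
            issues.append(f"'a {word}' -- did you mean 'an {word}'?")
    return issues
```

Running it on ‘She ate a apple’ flags the error, but it would also wrongly complain about ‘a university’ – multiply that fragility across every rule of English grammar and the scale of the problem becomes clear.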

3 Information Extraction

A third major category of NLP application is called Information Extraction (IE), also known as Text Mining.  This is about distilling a predetermined set of key information values from a text, and ignoring everything else.

So, for example, I might just want to know what news items contain a mention of a particular individual or company, in which case we’re looking at a kind of ‘semantically aware’ search that goes beyond simple string search.

Or I might want to mine a collection of business news stories or press releases to identify mergers and acquisitions, and to automatically extract the names of the companies involved and the details of the financial aspects of the deals.

Again, there are many vendors of tools that can do this kind of thing, and in fact IE is often also a component subtask in other NLP applications – for example, accurate identification of named entities is an important element of good sentiment analysis.
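For a flavour of the mergers-and-acquisitions example, here’s a single-pattern sketch.  The sentence template and field names are assumptions for illustration; production IE systems use trained named-entity and relation-extraction models rather than one hand-written regular expression, precisely because real news text phrases deals in countless ways.

```python
import re

# Toy deal extractor: matches one hypothetical phrasing,
# "<Buyer> acquires/buys <Target> for $<amount> million/billion".
PATTERN = re.compile(
    r"(?P<buyer>[A-Z][\w&]*(?: [A-Z][\w&]*)*) (?:acquires|buys) "
    r"(?P<target>[A-Z][\w&]*(?: [A-Z][\w&]*)*) "
    r"for \$(?P<amount>[\d.]+ ?(?:million|billion))"
)

def extract_deal(sentence: str):
    """Return the buyer, target, and amount if the sentence matches."""
    m = PATTERN.search(sentence)
    return m.groupdict() if m else None
```

Fed ‘Acme Corp acquires Widget Industries for $2.5 billion’, it pulls out the two company names and the deal value; fed anything phrased differently, it finds nothing – which is why the field has moved to statistical and neural approaches.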

4 Question-Answering

Finally, a number of NLP applications fall into a broad category we might call question-answering (QA).  You can think of apps like Siri, Google Now and Cortana as question-answering apps wrapped up in speech recognition and speech synthesis:  given a question, the system has to find an answer to it.

In the early days of NLP, this kind of application was used in the context of database query, so that managers with no knowledge of query languages like SQL could just type natural language questions like ‘Who’s been our best performing salesperson on the East Coast over the last year?’ Today the technology has achieved considerable visibility thanks to IBM’s Watson, a question-answering system that beat the best human contestants on the Jeopardy! game show.

The technologies described above focus on taking text as input and doing something with it; but NLG goes in the other direction, taking data, or some representation of information, as input, and producing text as output.  Technically, the applications described above are all instances of Natural Language Understanding (NLU).  Natural Language Processing is really a super-category that covers both NLU and NLG.


What’s the difference between NLU and NLG?

Here’s a table that summarises the difference between the kinds of applications described above and Natural Language Generation in terms of inputs and outputs:

Technology                                            Input    Output
NLU (sentiment analysis, grammar checking, IE, QA)    text     data
NLG (Natural Language Generation)                     data     text

NLG technology makes it possible for a computer to produce natural language output that replicates the way in which humans produce quality narrative reports. Just as a human takes input from a variety of sources to produce carefully crafted output that tells the full story, NLG software chooses the best methods to communicate the input and analysis, be it text, visualisations or annotated graphs that mix modalities to their best advantage.

That’s what we do here at Arria NLG.  We help humans understand their business inputs by automating their analysis and reporting, enabling them to be more agile and make informed decisions more quickly.  Take a look at our use cases to find out how we’ve helped our clients do just that.



Dr. Robert Dale is Arria NLG’s Chief Technology Officer.  Prior to joining Arria in 2012, he was Professor in Computational Linguistics at Macquarie University in Sydney, Australia, and from 2003 to 2012 he was editor-in-chief of Computational Linguistics, the premier international journal for Natural Language Processing.

