Author: Dr. Robert Dale
Last December, I participated in a panel on Natural Language Generation at LT-Accelerate in Brussels. The organizers very kindly took our suggestion for naming the panel session, calling it 'The New Science of Information Delivery'. But since then, quite a few people have asked me just what we meant by that title. So, to kick off the new Arria NLG blog, we thought it might be helpful to provide an answer to that question.
Every sophisticated technology has a scientific basis. Engineering disciplines rely on physics; food technologies and materials production rely on chemistry; and medicine relies on what science can tell us about human physiology. So what science is NLG based on?
The most obvious candidate is linguistics. Linguistics and its many sub-fields give us the theories and concepts we need to build software applications that use natural language appropriately and correctly. Research in discourse analysis tells us how we can organize information to tell a coherent story. Research in semantics tells us how to represent information so that it can be communicated via language. Research in pragmatics alerts us to how language might be perceived in context, so that we can avoid unintended interpretations. Research in syntax helps us determine how to express particular meanings using the resources offered by a particular language. Research in morphology gives us the knowledge we need to find the correct form of a word to use, such as a noun's plural or a verb's past tense.
So it's rather hard to build a natural language generation application without at least some awareness of linguistics, and NLG technology needs to have linguistic knowledge baked in if you want it to do the right thing. Arria's NLG Engine is structured the way it is for precisely this reason.
But linguistics is not the only science that NLG draws on. Suppose I've come to your house for dinner on a winter's evening, you've left the window open, and I say through chattering teeth, 'Gosh, we've been having rather cold weather lately, don't you think?' Maybe what I'm really saying, albeit with a good measure of British politeness, is 'Can we close that bloody window?' The point is that we need to take account not just of the literal meaning of what we say, but also of how it is received by the listener. This is where pragmatics, the study of how language is interpreted in context, comes in, and it very quickly leads us to psychology, and in particular to its language-oriented daughter, psycholinguistics. That's the science that, amongst other things, helps us understand when what we say might be ambiguous or misinterpreted. And since we don't want the output of our NLG applications to be ambiguous or misinterpreted, we'd best add psycholinguistics to our shopping basket of sciences to rely on. It turns out that this is a remarkably rich source of insights about what to do, and what not to do, if you want to build a really good NLG application.
But so far this is still rather a narrow view, perhaps encouraged by thinking about NLG from a technology perspective rather than as a solution to a problem.
The problem we're using NLG to solve is that of delivering information. More specifically, it is delivering the right information to the right people at the right time, and in the right manner. It's that last bit that is the game changer here. When you accept that the goal is to deliver information efficiently and effectively (NLG can have other goals, such as entertaining, persuading, or even deceiving, but we'll leave those for another time), the focus shifts away from NLG itself to the outcome of using NLG. And once you take that shift on board, you realize that many more sciences are relevant to the task.
And just what are those other contributing sciences? See Part 2 to find out!
Author: Dr. Robert Dale, Chief Technology Officer and Chief Strategy Scientist at Arria NLG. Dr. Dale is recognized as one of the world’s foremost experts in Natural Language Generation (NLG) research and development, having authored or edited seven books and 160 papers on computational linguistics. He was a Professor in the Department of Computing and Director of the Centre for Language Technology at Macquarie University. He co-authored the seminal textbook “Building Natural Language Generation Systems” with Arria NLG Chief Scientist and co-founder Prof. Ehud Reiter.