The Challenges of Explaining Conversational AI

“Multimodal Conversational AI is one of those things that when you experience it, you immediately get it,” says our Director of Marketing, Megan Swiatkowski. “But words often fail to bring the same level of understanding and excitement that a demo brings. I liken it to a rollercoaster - it’s way more fun to experience than to describe.”

Andrei Papancea


The definition of AI is relatively simple.

According to Merriam-Webster, the definitions are:
1. A branch of computer science dealing with the simulation of intelligent behavior in computers
2. The capability of a machine to imitate intelligent human behavior.

While the definition of AI is simple, it’s also expansive. This field of computer science has limitless applications in almost every industry, from augmented reality to email spam filters, robotic warehouses, and more. Because AI is so vast, the word triggers different mental images, from sci-fi movies to robots. Adding the word “Conversational” in front of AI helps twist the mental kaleidoscope to focus on computers engaging in a human-like conversation, but it’s still a fairly broad term.

These pre-determined mental images are extremely important when it comes to marketing and advertising your Conversational AI business because they impact how people search. I’ll give an example of a finding we came across earlier this year. We ran a Google Advertising Campaign, and one of our keywords was “Virtual Assistant”, a digital guide that helps a person through a text, voice, or other conversation flow. It's a fairly standard term in our industry, but we quickly discovered that people searching for “virtual assistants” weren’t looking for Conversational AI products; they were looking for remote jobs as an assistant. Even with negative keywords, our click-through rate wasn't even half that of some of the other terms we used.

Pre-determined mental images, in addition to the broad range of AI use cases, can make it difficult to truly explain what your Conversational AI technology does. But we've seen success in focusing more on how - through which medium - Conversational AI technology can help automate a customer service use case. The main categories of Conversational AI (CAI) include Interactive Voice Response (IVR), Chatbot, Visual IVR, and Multimodal.

Let's break those terms down:

  • Interactive Voice Response (IVR) refers to the “technology that allows humans to interact with a computer-operated phone system through the use of voice and dual-tone multi-frequency signaling (DTMF) input via a keypad,” per Wikipedia.
  • A Chatbot is a computer program that (usually) leverages natural language processing (NLP) technology to understand what a user is saying and constructs a response that simulates human-like conversation.
  • Visual IVR “uses web applications to instantly create an app-like experience for users on smartphones during contact center interactions without the need to download any app,” per Wikipedia.
  • Multimodal is the combination of two or more modalities - such as voice, text, images, and video - into a single, rich interactive experience.

NLX's platform can help businesses build self-service use cases using any of these technologies, but the one we're most excited about right now is Multimodal. Multimodal is one of the newest CAI categories, with significantly fewer examples in the marketplace. It’s often confused with Visual IVR, where you may be texted or emailed a link, and the smart assistant drops.

“Multimodal Conversational AI is one of those things that when you experience it, you immediately get it,” says our Director of Marketing, Megan Swiatkowski. “But words often fail to bring the same level of understanding and excitement that a demo brings. Describing Multimodal CAI is like describing a rollercoaster - way more fun to experience, way less fun to describe.”

As you can see, when words fail, marketers often resort to metaphors and similes. Meg’s giving that a try in our messaging. 

“The closest thing I can liken Multimodal Conversational AI to is CLEAR at the airport, but instead of a kiosk, you use your phone. And instead of an agent standing by, it’s a smart virtual assistant. It's similar because just like CLEAR, Multimodal CAI users are interacting with two or more modalities (images, text, videos, etc.) while a smart virtual assistant (the CLEAR agent) is guiding you through the process in the context of what you’re doing. Not to mention, Multimodal Conversational AI has personalization features (like the CLEAR eye scan) which make resolving customer inquiries efficient and effective!”

The race is on for marketers and companies in the CAI space to make what they do more accessible, NLX included. We have a few exciting projects in the works that we believe will help further unlock the market for us, and I’ll be sure to share once they’re on our website. If you see any cool explainers in the meantime, please feel free to send them my way (and Meg’s way - she loves nerding out on Marketing!).

Andrei Papancea

Andrei is our CEO and Swiss Army knife for all things natural language-related.

He built the Natural Language Understanding platform for American Express, processing millions of conversations across AmEx’s main servicing channels.

As Director of Engineering, he deployed AWS across the business units of Argo Group, a publicly traded US company, and successfully passed the implementation through a technical audit (30+ AWS accounts managed).

He teaches graduate lectures on Cloud Computing and Big Data at Columbia University.

He holds an M.S. in Computer Science from Columbia University.