Beyond the GUI: It's Time for a Conversational User Interface

Captain Picard ordered "Tea. Earl Grey. Hot" from the replicator using just his voice.

It wasn't just cost and Moore's law. The graphical user interface -- now known as the GUI ("gooey") -- is what really made computing widespread, personal and ubiquitous. Its friendly icons and point-and-clickability made computers approachable, enabling ordinary people to do extraordinary things on devices previously available only to the military and high-powered experts.

But the GUI, though it's served us well for a long time, is beginning to fray around the edges. We're now grappling with an unintended side effect of ubiquitous computing: a surge in complexity that overwhelms the graphical-only interface. It can take as many as 18 clicks on 10 different screens to make one simple airline reservation, as we wade through an unwieldy array of buttons, ads, drop-downs, text boxes, hierarchical menus and more.

What makes the problem worse is that we're forcing the GUI into a mobile-interface world even as the information and tasks available to us continue to increase. Whether it's because of available real estate or the desire for invisible design, interface screens are increasingly smaller, narrower or simply nonexistent.

What we need now is to be able to simply talk with our devices. That's why I believe it's finally time for the conversational user interface, or "CUI."

This is the interface of the future, made even more necessary as computing propagates beyond laptops, tablets and smartphones to cars, thermostats, home appliances and now even watches ... and glasses.

#### Ron Kaplan

##### About

Ron Kaplan leads Nuance Communications' NLU R&D Lab in Silicon Valley. Prior to that, he was at Microsoft Bing, which he joined upon the acquisition of Powerset, where he served as chief technology officer. Kaplan is also a consulting professor of linguistics at Stanford University, an ACM Fellow and a former Research Fellow at Xerox PARC. Kaplan earned his bachelor's in mathematics and language behavior from U.C. Berkeley and his Ph.D. in social psychology from Harvard University.

The CUI is more than just speech recognition and synthesized speech; it's an intelligent interface.

It's "intelligent" because it combines these voice technologies with natural-language understanding of the intention behind those spoken words, not just recognizing the words as a text transcription. The rest of the intelligence comes from contextual awareness (who said what, when and where), perceptive listening (automatically waking up when you speak) and artificial intelligence reasoning.

Instead of pulling up an app like OpenTable, searching for restaurants, tapping to select time, and typing in party size, we can say, "Book me a table for three at 6 tonight at Luigi's."
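To make that contrast concrete, here's a minimal sketch in Python of the structured request a CUI must extract from that single utterance. The intent name, slot names and hand-rolled patterns are illustrative assumptions, not any particular product's API; a real system would use a trained natural-language-understanding model rather than regular expressions.

```python
import re

# Hypothetical word-to-number table for this one utterance shape.
NUMBER_WORDS = {"two": 2, "three": 3, "four": 4, "five": 5, "six": 6}

def parse_reservation(utterance: str) -> dict:
    """Map a spoken booking request onto the fields a reservation API needs."""
    party = re.search(r"table for (\w+)", utterance)
    time = re.search(r"at (\d{1,2})(?::(\d{2}))?", utterance)
    venue = re.search(r"at ([A-Z][\w']*)\s*$", utterance)
    return {
        "intent": "book_table",
        "party_size": NUMBER_WORDS.get(party.group(1)) if party else None,
        "time": f"{time.group(1)}:{time.group(2) or '00'}" if time else None,
        "venue": venue.group(1) if venue else None,
    }

print(parse_reservation("Book me a table for three at 6 tonight at Luigi's"))
# {'intent': 'book_table', 'party_size': 3, 'time': '6:00', 'venue': "Luigi's"}
```

One spoken sentence carries the same payload as several GUI screens' worth of taps and text boxes.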

This type of "conversational assistant" capability is already reaching mainstream consumers due to mobile device features and applications like Apple's Siri, Samsung's S-Voice and Nuance's Dragon Mobile Assistant.

But this is just the first generation: It showcases what's possible and only hints at what's to come. As language and reasoning frameworks combine with machine learning and big data, conversational interfaces will understand our intent, and they'll anticipate our wants and needs as they learn more about us and our surroundings.

To "book a table at Luigi's for me, John and Bill, about an hour after my last meeting," the next-generation CUI will know from our calendars when our last meeting ends, calculate that we need a reservation for three, and even send invitations to John and Bill based on our contacts list.


Why should we have to talk machine-speak, issuing direct commands like, "Change to channel 11" with unnatural phrasing constraints? Why can't we just naturally say, "Can I see that movie with the actress who tripped at the Oscars?"

Here's how: The CUI will be able to understand that expressed interest and break it down into the following sequence: "Who tripped at the Oscars?" --> "Jennifer Lawrence movies?" --> "Silver Linings Playbook times/channel" ... to finally "Change to channel 11."
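One way to picture that decomposition is as a pipeline in which each stage's answer becomes the next stage's query. This sketch stubs out the linguistic parse itself and uses hypothetical lookup tables in place of live knowledge-base and program-guide services:

```python
# Hypothetical lookups standing in for live knowledge and listings services.
KNOWLEDGE = {"who tripped at the Oscars": "Jennifer Lawrence"}
FILMOGRAPHY = {"Jennifer Lawrence": "Silver Linings Playbook"}
LISTINGS = {"Silver Linings Playbook": 11}

def resolve(question: str) -> str:
    """Chain the sub-queries the CUI derives from one natural question."""
    actress = KNOWLEDGE["who tripped at the Oscars"]  # step 1: resolve the reference
    movie = FILMOGRAPHY[actress]                      # step 2: find her recent movie
    channel = LISTINGS[movie]                         # step 3: find where it's showing
    return f"Change to channel {channel}"             # step 4: issue the machine command

print(resolve("Can I see that movie with the actress who tripped at the Oscars?"))
# Change to channel 11
```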

And as these conversational interface systems become increasingly intelligent and attuned to our preferences, interactions will become even more human over time. Conversations will become seamless. People and machine systems will be able to have meaningful exchanges, working together to satisfy a goal ("That movie isn't on now. Should I put on the LeBron James game instead?"). Ultimately, people will get direct access to the content they want and immediate responses from their devices.

But the CUI has another huge advantage over a GUI: It allows people to talk about hypothetical objects or future events that have no graphical representation.

We might say, "Move $500 to my savings account when my paycheck comes in" or, "Let me know when I'm near a café -- but not a major chain." A CUI is much more flexible, able to monitor for abstract events such as an upcoming payday or a distant GPS location.
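Here's a sketch of those standing triggers: rather than a screen the user revisits, each request becomes a rule the assistant keeps evaluating as events arrive. The event shapes and rule helpers are illustrative assumptions:

```python
# Hypothetical standing rules the assistant evaluates as events arrive.
rules = []

def when(condition, action):
    """Register a standing rule: run the action whenever the condition holds."""
    rules.append((condition, action))

def on_event(event):
    for condition, action in rules:
        if condition(event):
            action(event)

# "Move $500 to my savings account when my paycheck comes in."
when(lambda e: e.get("type") == "deposit" and e.get("source") == "payroll",
     lambda e: print("Transferring $500 to savings"))

# "Let me know when I'm near a cafe -- but not a major chain."
when(lambda e: e.get("type") == "nearby" and e.get("category") == "cafe"
               and not e.get("is_chain"),
     lambda e: print(f"You're near {e['name']}"))

on_event({"type": "deposit", "source": "payroll", "amount": 2400})
on_event({"type": "nearby", "category": "cafe", "is_chain": False,
          "name": "Luigi's Espresso"})
```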

When the creators of Star Trek imagined the conversational interface of the 24th century, Captain Picard had to tell the replicator, "Tea. Earl Grey. Hot" -- his expression was constrained by the awkward dialect of a 20th-century keyword search engine.

Here, in the 21st century, we will be able to conversationally say, "How 'bout some tea?" ... and actually get that Earl Grey tea, hot. That's because a CUI will know who we are and understand what we mean.


Many of these capabilities are already appearing in our devices today. Voice recognition accuracy has improved dramatically, and language and reasoning programs have reached a useful level of sophistication. We still need better models of cooperation and collaboration, but those are also coming along. Putting it all together, we'll soon have intent-driven, fully conversational interfaces adaptable to just about anyone.

So ordering tea this way isn't a distant sci-fi scenario or a far-off vision. It's very real, and it's almost here.

The replicator, on the other hand, may take more work.

Editor: Sonal Chokshi @smc90