A closer look at Google Duplex

Google's appointment booking AI wowed the crowd and raised concern at I/O

A month and change after I/O, Google convenes a meeting of a few small groups of journalists at an upscale Thai restaurant on Manhattan’s Upper East Side. It’s an unusual locale for one of the world’s largest companies.

The tables are cleared out to make room for nine chairs, in three rows of three, facing a large, brightly lit display. To the side, four Google employees sit behind a desk at a makeshift control center. The company is finally ready to offer a little more insight into Duplex, the most widely discussed and most controversial announcement of a rapid-fire keynote.

It’s a 180-degree shift from that sun-drenched day at Mountain View’s Shoreline Amphitheatre. All by design, of course. The cozy New York restaurant makes as much sense as any for such an event, as the company pulls back the curtain on the AI-based reservation service. Thep Thai’s owner insists that such a service would be something of a godsend for the 100-plus reservations the restaurant fields on a daily basis.

For Google, it was clearly time to offer some more transparency into both the purpose of such a system and the workings behind it. The brief demo presented by CEO Sundar Pichai raised far more questions than it answered. The think pieces began to flow, exploring the ethical ramifications of a system that appeared to be designed to fool a business into believing it was talking to a fellow human being.

Duplex represents a rare early look into an ongoing project from a company notorious for playing it close to the vest. But disclosure is key. As with self-driving cars, rigorous real-world testing is required to iron out all of the kinks in the system.

“While we’re not widely launching this feature yet, we’re sharing more information about this technology to provide transparency and encourage feedback,” the company writes in a blog post today. “It’s important that we get the experience right both for people and for businesses, and we’re taking a slow and measured approach as we incorporate learnings and feedback from our tests.”

The nature of Google’s process was likely to get out some way or another, so announcing it at I/O served the dual purpose of getting in front of that narrative and offering an early look at an ambitious project on one of the company’s largest stages.

“What you’re going to hear is the Google Assistant scheduling an appointment at a real hair salon,” Pichai said to tentative applause during the keynote:

Hi, I’m calling to book a women’s haircut for a client. Um, I’m looking for something on May 3. – Google Assistant

Sure, give me one second. – Receptionist

Mm-hm. – Google Assistant

It was here the audience laughed, unbelieving. Then applause. Sure, the audience was in on the joke, but it was still hard to believe that what we were hearing was a purely automated version of Google’s AI assistant. The “mm-hm” was icing on the cake: a subtle vocal tic included to further the conversation, all while leaving the other party none the wiser that she was speaking to a bot.

Those vocal breaks, known as “speech disfluencies” in linguistics, are a normal and frequent part of speech, and a key part of the secret sauce that makes Duplex such a remarkable product. Among other things, they’re a polite workaround for the system.

If Duplex is confronted with an uncertain response after requesting a reservation for a party of five, for example, it will reiterate with the slight variation, “um, for five.” That, hopefully, will resolve potential confusion on the part of the receptionist, while including a subtle linguistic tic that lends a further sense of reality to the conversation.

These elements are a very real part of the way Duplex works. I can confirm this, having stood in for the role of the receptionist during a demo at the Thai restaurant. As for the two demos played over the big screen at I/O, they were, in fact, real. Even more interestingly, the company says it informed the businesses after the calls were placed, seemingly to lend an extra level of authenticity to the process.

Duplex was — and still is — very much a work in progress. Among other things, the system didn’t provide a disclosure in the early days, a fact that could potentially violate the “two-party consent” required to record phone calls and conversations in states like Connecticut, Florida, Illinois, Maryland, Massachusetts, Montana, New Hampshire, Pennsylvania, Washington and Google’s own home base of California.

“The consent-to-record issues here go beyond just Duplex to the broader legal implications of machine speech,” said Gabe Rottman, director of the Technology and Press Freedom Project at the Reporters Committee for Freedom of the Press. “If the service extends to all-party consent states or globally, you could see questions pop up like whether consent is valid if you don’t know the caller is a machine. Curveballs like that are just going to multiply the more we get into the uncanny valley where automated speech can pass as human.”

Going forward, the system will be confined to those states where the laws make it feasible. That also applies to interstate calls, so long as both sides are covered. “We want to make sure it operates in a way that’s governed by whatever the laws are that are appropriate for that call,” says Nick Fox, Google Assistant’s VP of Product and Design.

While the disclosures weren’t there in the earliest stage, the company has said since the beginning that it intended to add them. The motivation, however, was less a fear of legal repercussions than a matter of common robot/human etiquette.

“The Google Duplex technology is built to sound natural, to make the conversation experience comfortable,” the company wrote in a blog post tied to the announcement. “It’s important to us that users and businesses have a good experience with this service, and transparency is a key part of that. We want to be clear about the intent of the call so businesses understand the context. We’ll be experimenting with the right approach over the coming months.”

Pressed by the media about what form such “transparency” would ultimately take, a spokesperson for the company added later, “We understand and value the discussion around Google Duplex — as we’ve said from the beginning, transparency in the technology is important. We are designing this feature with disclosure built-in, and we’ll make sure the system is appropriately identified. What we showed at I/O was an early technology demo, and we look forward to incorporating feedback as we develop this into a product.”

In its current form, that plays out as follows:

Hi, I’m the Google Assistant calling to make a reservation for a client. This automated call will be recorded.

Duplex doesn’t let on that it’s an AI, but if you have some familiarity with Google Assistant, you can probably put that part together yourself. It does, however, let you know that the call is being recorded. Google records these conversations for both voice-to-text processing and quality assurance purposes, so the company can continue to revise and refine the system.

In my test call, I attempt to get Google Assistant to repeat that bit — it’s easy enough to not hear that opening line, particularly when you’ve got the phone up to your ear inside a crowded restaurant. But the AI just barrels on with the reservation. If you miss the disclosure, you’re out of luck — for now, at least. At present, the only way to opt out of being recorded is to just hang up the phone — not the best way to get repeat visitors.

“We do have a mechanism that will say ‘okay, I won’t record you,’ ” Google Assistant VP of Engineering Scott Huffman explains. “I think we’re still figuring out what’s the right thing to do there. Is the right thing bow out? To basically throw away the recording?”

Like just about everyone else getting a demo that day, I try my best to throw the system off. Assistant asks for a booking at 6PM. I tell it we’re not open until 11 — this is Manhattan, after all, the best/most exclusive places keep the most insane hours, right? Assistant politely ends the call — or “bows out,” as Google puts it.

The Holy Grail here is attempting to Turing test the shit out of Duplex. If you succeed, one of Google’s human operators will take the controls and land the plane. These human operators are an integral part of testing for Duplex, and Google says it plans to keep them around in some form going forward, to ensure that things never get too out of control. How large a group that will ultimately take remains to be seen.

No one in our small group succeeds in invoking a real-life human during our brief chats, though we gain some important insights into the system’s limitations. For instance, asked to “repeat the last four numbers,” it restates the phone number in its entirety. It’s not a flaw, exactly, but it does show a simple place where the system is pushed to its limits in understanding the subtle nuances of human conversation.

Asked for the user’s email address, on the other hand, the system simply says it doesn’t have the permission of its “client” to disclose such information, maintaining the whole “assistant” relationship. Google says that, in testing, the system has also gotten tripped up encountering another machine by way of a phone tree. Listening closely because our menu options have changed doesn’t appear to compute just yet.

At present, Google says Duplex is able to complete four out of five tasks in a fully automated fashion. Eighty percent is pretty good, but Google is pushing to make things better. “We want to make sure that we’re not wasting the business’s time,” Fox says. “We want to make sure throughout everything we do here, that this is good experience for the business and that they’re not getting frustrated talking to an assistant while they’re trying to run their business.”

As announced at I/O, more testing will commence this summer. Over the “coming weeks,” the next round will find Assistant inquiring about business hours. And in the next few months, it will expand to restaurant reservations and hair salon appointments. Unlike those I/O demos, these will occur with “a limited set of trusted testers and select businesses,” who will be in on it.

Companies thus far seem eager to get on board. As Google notes, according to a customer survey it conducted back in April, “60 percent of small businesses who rely on customer bookings do not have an online booking system set up.”

For users who simply don’t want to pick up the phone, Duplex provides a compelling alternative. For businesses, it means adding more potential customers. Those who’d rather not get on board for any number of reasons, on the other hand, will be able to opt out through their Google Business listings (assuming they have one).

The box reads:

Let customers use the Google Assistant to book with you. Also, quickly update your listings by getting occasional calls to confirm your detail.

The system has come a long way since it began life as a jury-rigged demo with an office phone placed gingerly atop a MacBook. Duplex operates through a complex combination of speech-to-text, text-to-speech and Google’s own WaveNet audio processing deep neural network. The early demos weren’t live, as some had speculated, but they were, in fact, real, and things are only getting more impressive from there.

Like it or not, Duplex is coming soon. And the only way to stop it is to hang up the phone.