To begin with, Alexa already exhibits common sense in a number of areas. For example, if you say to Alexa, “Set a reminder for the Super Bowl”, Alexa not only identifies the Super Bowl date and time but converts it into the customer’s time zone and reminds the customer 10 minutes before the start of the game, so they can wrap up what they are doing and get ready to watch the game.
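As a rough illustration of the time-zone step, here is a minimal sketch, not Alexa's actual implementation. The function name `reminder_time`, the kickoff time, and both UTC offsets are made-up placeholders. Fixed offsets stand in for real time zones to keep the sketch dependency-free; a production system would more likely use named IANA zones.

```python
from datetime import datetime, timedelta, timezone

def reminder_time(event_start, customer_tz, lead=timedelta(minutes=10)):
    """Return when to remind the customer, expressed in the customer's time zone."""
    if event_start.tzinfo is None:
        raise ValueError("event_start must be timezone-aware")
    # Convert the event start into the customer's local time, then back off by the lead.
    return event_start.astimezone(customer_tz) - lead

# Hypothetical values for illustration only (not a real event schedule).
EVENT_TZ = timezone(timedelta(hours=-7))      # where the game is played
CUSTOMER_TZ = timezone(timedelta(hours=-5))   # where the customer lives
kickoff = datetime(2030, 2, 3, 16, 30, tzinfo=EVENT_TZ)

reminder = reminder_time(kickoff, CUSTOMER_TZ)
print(reminder.isoformat())
```

The conversion happens before the lead time is subtracted, so the reminder is always expressed in the customer's local clock.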
Another example is suggested Routines, where Alexa detects frequent customer interaction patterns and proactively suggests automating them via a Routine. So if someone frequently asks Alexa to turn on the lights and turn up the heat at 7:00 a.m., Alexa might suggest a Routine that does that automatically.
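One simple way to picture this kind of pattern detection is to count how often the same group of actions occurs at the same time of day. The sketch below is only an illustration under that assumption; `suggest_routines`, the log format, and the `min_days` threshold are hypothetical and do not describe how Alexa actually detects patterns.

```python
from collections import Counter

def suggest_routines(daily_logs, min_days=5):
    """Suggest routines for action groups that recur at the same time on many days.

    daily_logs: one list per day, each holding (time "HH:MM", action) tuples.
    """
    counts = Counter()
    for day in daily_logs:
        # Group the day's actions by the time slot in which they occurred.
        by_time = {}
        for time_slot, action in day:
            by_time.setdefault(time_slot, set()).add(action)
        # Only groups of two or more actions are worth automating together.
        for time_slot, actions in by_time.items():
            if len(actions) >= 2:
                counts[(time_slot, frozenset(actions))] += 1
    return [
        {"time": time_slot, "actions": sorted(actions)}
        for (time_slot, actions), n in counts.items()
        if n >= min_days
    ]

# Hypothetical interaction history: six mornings with the same two requests.
logs = [[("07:00", "lights on"), ("07:00", "heat up")] for _ in range(6)]
logs.append([("21:00", "lights off")])
print(suggest_routines(logs))
```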
Even if the customer didn’t set up a Routine, Alexa can detect anomalies as part of its Hunches feature. For example, Alexa can alert you about the garage door being left open at 9:00 p.m., if it’s usually closed at that time.
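The garage door example can be sketched as a comparison between the current state and the state usually observed at that time. This is a toy heuristic, not the Hunches algorithm; the function name `is_anomalous` and the thresholds `min_share` and `min_samples` are assumptions chosen for illustration.

```python
from collections import Counter

def is_anomalous(history, current_state, min_share=0.9, min_samples=10):
    """Flag the current state if it differs from a strongly dominant usual state.

    history: past states observed at the same time of day, e.g. at 9:00 p.m.
    """
    if len(history) < min_samples:
        return False  # too little evidence to call anything "usual"
    usual_state, count = Counter(history).most_common(1)[0]
    has_clear_habit = count / len(history) >= min_share
    return has_clear_habit and current_state != usual_state

# Hypothetical observations: the door was closed on 19 of the last 20 evenings.
history = ["closed"] * 19 + ["open"]
print(is_anomalous(history, "open"))    # True
print(is_anomalous(history, "closed"))  # False
```

Requiring both a minimum number of samples and a dominant state keeps the sketch from alerting when there is no clear habit to deviate from.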
Moving forward, we are aspiring to take automated reasoning to a whole new level. Our first goal is the pervasive use of commonsense knowledge in conversational AI. As part of that effort, we have collected and publicly released the largest dataset for social common sense in an interactive setting.
We have also invented a generative approach that we call think-before-you-speak. In this approach, the AI learns to first externalize implicit commonsense knowledge — that is, “think” — using a large language model combined with a commonsense knowledge graph such as ConceptNet. Then it uses this knowledge to generate responses — that is, to “speak”.
For example, if during a social conversation on Valentine’s Day a customer says, “Alexa, I want to buy flowers for my wife”, Alexa can leverage world knowledge and temporal context to respond with “Perhaps you should get her red roses”.
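The two stages can be pictured with a toy sketch. It is heavily simplified: a handful of hand-written triples stands in for a commonsense knowledge graph, and a template stands in for the language model. The names `TOY_GRAPH`, `think`, and `speak`, along with the relation labels, are hypothetical and are not ConceptNet's schema or Alexa's implementation.

```python
# Hand-written (subject, relation, object) triples; a stand-in for a real knowledge graph.
TOY_GRAPH = [
    ("Valentine's Day", "associated_with", "red roses"),
    ("red roses", "is_a", "flowers"),
    ("Mother's Day", "associated_with", "carnations"),
    ("carnations", "is_a", "flowers"),
]

def think(utterance, occasion, graph):
    """Externalize implicit knowledge: items tied to the occasion that match the request."""
    associated = {o for s, r, o in graph if s == occasion and r == "associated_with"}
    text = utterance.lower()
    return [s for s, r, o in graph if r == "is_a" and s in associated and o in text]

def speak(knowledge):
    """Generate a response conditioned on the externalized knowledge."""
    if not knowledge:
        return "What kind would you like?"
    # The pronoun is hard-coded for this example; a real generator would infer it.
    return f"Perhaps you should get her {knowledge[0]}."

knowledge = think("I want to buy flowers for my wife", "Valentine's Day", TOY_GRAPH)
print(speak(knowledge))
```

Keeping the "think" output as an explicit intermediate value means the knowledge behind a response can be inspected separately from the response itself.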
We’re also working to enable Alexa to answer complex queries that require multiple inference steps. For example, if a customer asks, “Has Austria won more skiing medals than Norway?”, Alexa needs to combine the mention of skiing medals with temporal context to infer that the customer is asking about the Winter Olympics. Then Alexa needs to resolve “skiing” to the set of Winter Olympics events that involve skiing, which is not trivial, since those events can have names like “Nordic combined” and “biathlon”. Next, Alexa needs to retrieve and aggregate medal counts for each country and, finally, compare results.
A key requirement for responding to such questions is explainability. Alexa shouldn’t just reply “yes” or “no” but should provide a response that summarizes its inference steps, such as “Norway has won X medals in skiing events in the Winter Olympics, which is Y more than Austria”.
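The inference chain described above (resolve the sport to a set of events, aggregate per country, compare, then explain) can be sketched in a few lines. Everything in this example is a placeholder: the countries are anonymous, the medal counts are invented, and the event set is an illustrative assumption rather than an official classification.

```python
# Assumed resolution of "skiing" to event names (illustrative, not an official list).
SKIING_EVENTS = {"alpine skiing", "biathlon", "Nordic combined"}

# Invented medal records: (country, event, medal count). Not real data.
MEDALS = [
    ("Country A", "alpine skiing", 5),
    ("Country A", "curling", 2),
    ("Country B", "biathlon", 4),
    ("Country B", "Nordic combined", 3),
    ("Country B", "curling", 1),
]

def compare_medals(first, second, events, medals):
    """Answer "has `first` won more than `second`?" and explain the steps taken."""
    # Retrieve and aggregate counts, keeping only the resolved events.
    totals = {first: 0, second: 0}
    for country, event, count in medals:
        if country in totals and event in events:
            totals[country] += count
    # Compare the totals and build an explanation from the intermediate results.
    answer = totals[first] > totals[second]
    leader, trailer = sorted(totals, key=totals.get, reverse=True)
    diff = totals[leader] - totals[trailer]
    if diff == 0:
        explanation = (f"{first} and {second} have each won "
                       f"{totals[first]} medals in skiing events.")
    else:
        explanation = (f"{leader} has won {totals[leader]} medals in skiing events, "
                       f"which is {diff} more than {trailer}.")
    return answer, explanation

answer, explanation = compare_medals("Country A", "Country B", SKIING_EVENTS, MEDALS)
print(answer, explanation)
```

Returning the explanation alongside the boolean answer mirrors the requirement that the response summarize how the conclusion was reached.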