The mission of the OVAL lab is to create open, advanced technology so that developers can build robust, friendly, multilingual voice assistants cost-effectively. Our goal is to enable every organization to add voice-based assistance to its services as easily as it creates a website. We envision an open World Wide Voice Web where voice agents are created once and are accessible in every language and on every device (smart speakers, smart and feature phones, cars).
Commercial chatbots today are fragile, requiring developers to hardcode possible conversational paths in anticipation of what the user may say. Large language models, such as GPT-3, while versatile and general, are not grounded and cannot be controlled to provide correct and reliable information or to perform actions with side effects. Our approach is to develop effective voice assistant authoring tools that combine novel programming-language concepts with deep learning and large language models.
Specifically, we are creating GenieScript, a voice assistant language that lets developers create task-oriented agents by simply providing a high-level script of the conversational flow and access to their existing databases containing domain knowledge. The GenieScript system follows the script while automatically responding to unanticipated user inputs and helping users navigate the knowledge bases. More importantly, GenieScript is sample-efficient: only a small amount of annotated data is needed for few-shot learning.
The voice agents we create are the first to use contextual neural semantic parsing, in which a formal representation of the conversation so far is fed into a neural network to determine the semantics of the incoming sentences. The semantics is represented formally in an executable programming language called ThingTalk. ThingTalk grounds the conversation by mapping the user's request to precise database queries and API executions, a representation significantly richer than the intents typically used in dialogue flow. The GenieScript system automatically synthesizes large quantities of conversational data to fine-tune large language models. By including a small sample of hand-annotated data, the developer can expect to produce a first basic agent that understands users’ statements with about 60-70% accuracy in just a few days.
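To illustrate why a query representation carries more than a flat intent, the sketch below contrasts the two for a request such as "show me restaurants rated at least 4 stars with at least 100 reviews". It is plain Python, not ThingTalk syntax; the `Restaurant` record, the `Filter` class, and the sample rows are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
import operator


# Hypothetical record type standing in for a row of a developer's database.
@dataclass(frozen=True)
class Restaurant:
    name: str
    rating: float
    review_count: int


# A flat intent can only name the task; it cannot carry composable constraints.
flat_intent = {"intent": "find_restaurant"}


# A query-style representation: each filter is (field, comparison, value).
@dataclass(frozen=True)
class Filter:
    field: str
    op: str
    value: float


OPS = {">=": operator.ge, "<=": operator.le, "==": operator.eq}


def run_query(rows, filters):
    """Return the rows that satisfy every filter."""
    return [
        row
        for row in rows
        if all(OPS[f.op](getattr(row, f.field), f.value) for f in filters)
    ]


# Illustrative data only (assumption, not from any real knowledge base).
ROWS = [
    Restaurant("A", 4.5, 250),
    Restaurant("B", 4.2, 40),
    Restaurant("C", 3.8, 900),
]

# "rated at least 4 stars with at least 100 reviews"
QUERY = [Filter("rating", ">=", 4.0), Filter("review_count", ">=", 100)]
MATCHES = [r.name for r in run_query(ROWS, QUERY)]
print(MATCHES)
```

Because the filters are data rather than hardcoded intents, adding a further constraint (for example a cuisine or a distance) only means appending another `Filter`, which is the kind of compositionality an intent label alone does not provide.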
Our overall goal is to create tools that enable developers to easily create deployable agents. We aim to understand the gap between current academic research approaches and practice by experimenting with real-life use cases, and to devise methodologies and tools that can be used in practice.
Starting with queries for databases one command at a time, our current research is taking on these challenges:
We are collaborating with the following groups on these projects.
AI Workshop, Stanford Computer Forum 2022, April 4-6, 2022
We have now created a GenieScript system that can be used for experimental prototypes. Our experience shows that the GenieScript system has made it possible for small teams to create useful prototypes quickly.
We have developed a Genie Assistant, which can perform the ten most popular skills. Genie can play songs, podcasts, radio stations, and news; help find restaurants; answer questions; forecast the weather; set timers and reminders; and control IoT devices. It can control 8 kinds of appliances (thermostats, switches, lights, fans, doors, locks, window covers, and vacuum cleaners) and 7 kinds of sensors (temperature, motion, illuminance, humidity, flood, ultra-light, and battery).
The First Workshop on the World Wide Voice Web, Nov 20, 2021.
State of the Open Home Workshop, Home Assistant, Dec 11, 2021.
Genie is used to create a voice interface for 3D Building Information Management software. This enables blue-collar workers to use voice to easily access digital information on the job.
AI Workshop, Stanford Computer Forum 2022, April 4-6, 2022
Virtual Assistant 2.0 improves the quality & lowers the cost of dialogue agents with automatic synthesis of high-quality training data
|Examples of Access Control|
|Allow my daughter to watch Netflix only before 8pm.|
|Allow my son to purchase any household item under $10 on Amazon.|
|My dad can access my security camera, only when I am not home.|
|Whenever I am out of town, let my secretary read email messages, whose subject is marked ‘urgent’.|
|Allow colleagues to add GitHub issues to my to-do list.|
|Authors can only read those files with their names in the title.|
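One way to read the access-control examples above is as a triple of requester, action, and condition. The following is a minimal sketch of that reading for the first example, assuming a deny-by-default model; the `Request` and `Policy` classes and the action name `netflix.watch` are hypothetical and are not taken from Genie or ThingTalk.

```python
from dataclasses import dataclass
from datetime import time
from typing import Callable


@dataclass(frozen=True)
class Request:
    requester: str   # who is asking
    action: str      # what they want to do (hypothetical action name)
    at: time         # local time of the request


@dataclass(frozen=True)
class Policy:
    requester: str
    action: str
    condition: Callable[[Request], bool]

    def allows(self, request: Request) -> bool:
        # A policy applies only to its own requester and action,
        # and only while its condition holds.
        return (
            request.requester == self.requester
            and request.action == self.action
            and self.condition(request)
        )


# "Allow my daughter to watch Netflix only before 8pm."
NETFLIX_POLICY = Policy(
    requester="daughter",
    action="netflix.watch",
    condition=lambda req: req.at < time(20, 0),
)


def is_allowed(request, policies):
    """Deny by default; allow only if some policy matches the request."""
    return any(policy.allows(request) for policy in policies)


print(is_allowed(Request("daughter", "netflix.watch", time(19, 30)), [NETFLIX_POLICY]))
```

The other examples fit the same shape with different conditions, such as a price limit, the owner's location, or a predicate on an email subject.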
|Examples of Long-Tail Questions||Alexa||Siri||Genie|
|Show me restaurants rated at least 4 stars with at least 100 reviews.||✔|
|Show restaurants in San Francisco rated higher than 4.5.||✔||✔|
|What is the highest rated Chinese restaurant near Stanford?||✔||✔|
|How far is the closest 4 star restaurant?||✔|
|Find a W3C employee that went to Oxford||✔|
|Who worked for Google and lives in Palo Alto?||✔|
|Who graduated from Stanford and won a Nobel prize?||✔||✔||✔|
|Who worked for at least 3 companies?||✔|
|Show me hotels with checkout time later than 12PM||✔|
|Which hotel has a pool in this area? ||✔||✔||✔|
|Model||Joint Accuracy on MultiWOZ 2.1 (%)|
|TRADE (Wu et al., 2019)||45.6|
|SUMBT (Lee et al., 2019)||46.7|
|DSTQA (Zhou and Small, 2019)||51.2|
|DST-Picklist (Zhang et al., 2019)||53.3|
|SST (Chen et al., 2020)||55.2|
|TripPy (Heck et al., 2020)||55.3|
|SimpleTOD (Hosseini-Asl et al., 2020)||55.7|
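Joint accuracy in dialogue state tracking is commonly computed as the percentage of turns whose predicted dialogue state matches the annotated state exactly, on every slot. The sketch below shows that computation; the slot names and values are made up for illustration and are not MultiWOZ data, and real evaluations typically also normalize slot values before comparing.

```python
def joint_accuracy(predicted_states, gold_states):
    """Percentage of turns whose predicted state equals the gold state exactly."""
    if len(predicted_states) != len(gold_states):
        raise ValueError("predicted and gold must have the same number of turns")
    if not gold_states:
        raise ValueError("no turns to score")
    exact = sum(1 for pred, gold in zip(predicted_states, gold_states) if pred == gold)
    return 100.0 * exact / len(gold_states)


# Each dict is the accumulated dialogue state after one turn (illustrative only).
GOLD = [
    {"food": "chinese"},
    {"food": "chinese", "area": "centre"},
    {"food": "chinese", "area": "centre", "price": "cheap"},
    {"food": "chinese", "area": "centre", "price": "cheap", "people": "2"},
]
PRED = [
    {"food": "chinese"},                                      # exact match
    {"food": "chinese", "area": "north"},                     # one wrong slot
    {"food": "chinese", "area": "centre", "price": "cheap"},  # exact match
    {"food": "chinese", "area": "centre", "price": "cheap"},  # missing slot
]

SCORE = joint_accuracy(PRED, GOLD)
print(SCORE)
```

A single wrong or missing slot makes the whole turn count as incorrect, which is why joint accuracy figures are much lower than per-slot accuracy.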
|Turn-By-Turn Accuracy (Cleaned Test Set)|
|Lang||Restaurant Queries with Localized Entities|
|English||look for 5 star restaurants that serve burgers|
|Arabic||ابحث عن مطاعم 5 نجوم التي تقدم الشاورما|
|German||suchen sie nach 5 sterne restaurants, die maultaschen servieren|
|Spanish||busque restaurantes de 5 estrellas que sirvan paella valenciana|
|Persian||به دنبال رستورانهای 5 ستاره باشید که جوجه کباب سرو میکنند|
|Finnish||etsi 5 tähden ravintoloita, joissa tarjoillaan karjalanpiirakkaa|
|Italian||cerca ristoranti a 5 stelle che servono bruschette|
|Polish||poszukaj 5 gwiazdkowych restauracji, które serwują kotlet|
|Turkish||köfte servis eden 5 yıldızlı restoranları arayın|
|Areas||Examples of Event-Driven Commands|
|Weather||Remind me to bring an umbrella, when rain is forecast tomorrow|
|Finance||When the Microsoft stock drops to $200, and my checking balance is greater than $2000, buy 5 shares|
|Home Automation||Email me if my car is not plugged in, when parked at home.|
|Social Media||Whenever I post my profile to Twitter, post it to Facebook|
|Security||Send images from my security camera to Dad, if motion is detected when I am not home|
|Work||Forward emails to my secretary if they are marked urgent|
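The event-driven commands above share a trigger, condition, and action shape. The sketch below expresses the Finance example in that shape. It is a simplified illustration, not Genie's implementation: the `Rule` class, the event field names, and the reading of "drops to $200" as "price at or below 200" (with no edge detection across events) are all assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# An event is a snapshot of monitored values (hypothetical field names).
Event = Dict[str, float]


@dataclass(frozen=True)
class Rule:
    trigger: Callable[[Event], bool]    # "when ..."
    condition: Callable[[Event], bool]  # "... and ..."
    action: Callable[[Event], str]      # "... do ..."


def process(event: Event, rules: List[Rule]) -> List[str]:
    """Return the actions of every rule whose trigger and condition both hold."""
    return [
        rule.action(event)
        for rule in rules
        if rule.trigger(event) and rule.condition(event)
    ]


# "When the Microsoft stock drops to $200, and my checking balance
#  is greater than $2000, buy 5 shares"
BUY_RULE = Rule(
    trigger=lambda e: e["msft_price"] <= 200.0,
    condition=lambda e: e["checking_balance"] > 2000.0,
    action=lambda e: "buy 5 shares of MSFT",
)

print(process({"msft_price": 199.5, "checking_balance": 2500.0}, [BUY_RULE]))
```

Keeping the trigger separate from the condition mirrors the wording of the commands: the trigger is the event being monitored, while the condition is checked only at the moment the trigger fires.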
See here for the paper abstracts.
We thank them for their valuable contribution.
OVAL is supported by the National Science Foundation under Grant No. 1900638, the Alfred P. Sloan Foundation under Grant No. G-2020-13938, and the Verdant Foundation. We also want to thank our partners Alpha Vantage, Baidu, Picovoice, Smartnews, and Yelp for their support.