The mission of the OVAL lab is to create advanced open technology so developers can build robust, friendly, multilingual voice assistants cost-effectively. Our goal is to enable every organization to add voice-based assistance to its services as easily as creating a website. We envision an open World Wide Voice Web where voice agents are created once and are accessible in every language and on every device (smart speakers, smart and feature phones, cars).
Commercial chatbots today are fragile: developers must hardcode the possible conversational paths in anticipation of what the user may say. Large language models, such as GPT-3, while versatile and general, are not grounded; they cannot be relied upon to provide correct information or to safely perform actions with side effects. Our approach is to develop effective voice assistant authoring tools that combine novel programming-language concepts with deep learning and large language models.
Specifically, we are creating GenieScript, a voice assistant language that lets developers create task-oriented agents simply by providing a high-level script of the conversational flow and access to their existing databases of domain knowledge. The GenieScript system follows the script while automatically responding to unanticipated user inputs and helping users navigate the knowledge bases. More importantly, GenieScript is sample-efficient: only a small amount of annotated data is needed for few-shot learning.
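To make the authoring model concrete, here is a minimal, self-contained Python sketch of the idea: a developer supplies only a high-level flow plus a pointer to an existing database. The names below (Agent, the step decorator, the table schema) are hypothetical illustrations, not actual GenieScript syntax.

```python
# A sketch of script-plus-database authoring. The Agent class and the
# @agent.step decorator are hypothetical names for illustration only;
# they are NOT the actual GenieScript syntax.
import sqlite3

class Agent:
    def __init__(self, db_path: str):
        self.db = sqlite3.connect(db_path)  # the developer's existing database
        self.steps = []

    def step(self, fn):
        # Register one stage of the scripted conversational flow; the real
        # system would follow these stages while also handling unanticipated
        # user inputs in between.
        self.steps.append(fn)
        return fn

agent = Agent("restaurants.db")

@agent.step
def recommend(state):
    # `state` stands in for the slots a semantic parser would extract from
    # the user's utterance, e.g. {"cuisine": "italian"}.
    rows = agent.db.execute(
        "SELECT name FROM restaurants WHERE cuisine = ?",
        (state["cuisine"],),
    ).fetchall()
    return f"I found {len(rows)} {state['cuisine']} restaurants."
```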
The voice agents we create are the first to use contextual neural semantic parsing, in which a formal representation of the conversation so far is fed into a neural network to determine the semantics of the incoming sentences. The semantics are represented formally in an executable programming language called ThingTalk. ThingTalk grounds the conversation by mapping the user's request to precise database queries and API executions, a representation significantly richer than the intents typically used in dialogue flows. The GenieScript system automatically synthesizes large quantities of conversational data to fine-tune large language models. By including a small sample of hand-annotated data, a developer can expect to produce a first basic agent that understands users' statements with about 60-70% accuracy in just a few days.
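The Python sketch below illustrates the contextual part of this pipeline: the formal representation of the dialogue so far is combined with the new utterance into a single input for a seq2seq parser. The separator tokens and the ThingTalk-like strings are assumptions for illustration, not the exact formats Genie uses.

```python
# A minimal sketch of contextual neural semantic parsing: the parser sees
# both the formal dialogue context and the new utterance, so elliptical
# follow-ups can be resolved against the previous query. The <context> /
# <utterance> tokens and the example ThingTalk are illustrative assumptions.
from dataclasses import dataclass, field

@dataclass
class DialogueContext:
    # ThingTalk statements executed so far, most recent last.
    history: list[str] = field(default_factory=list)

    def encode(self) -> str:
        return " ; ".join(self.history) if self.history else "null"

def build_parser_input(ctx: DialogueContext, utterance: str) -> str:
    # A fine-tuned seq2seq model would take this string and emit the
    # ThingTalk for the new turn.
    return f"<context> {ctx.encode()} <utterance> {utterance}"

ctx = DialogueContext()
ctx.history.append('@com.yelp.restaurant() filter cuisine == "mexican"')
print(build_parser_input(ctx, "how about italian instead?"))
# The parser can then produce something like:
#   @com.yelp.restaurant() filter cuisine == "italian"
# which executes directly against the restaurant database.
```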
Our overall goal is to create tools that enable developers to easily create deployable agents. We aim to understand the gap between current academic research and practice by experimenting with real-life use cases, and to devise methodologies and tools that are useful in practice.
Starting from database queries issued one command at a time, our current research is taking on these challenges:
We are collaborating with the following groups on these projects.
Monica Lam
We have now created a GenieScript system that can be used for experimental prototypes. Our experience shows that it enables small teams to build useful prototypes quickly.
We have developed the Genie Assistant, which can perform the ten most popular skills: it can play songs, podcasts, radio, and news; help find restaurants; answer questions; forecast the weather; set timers and reminders; and control IoT devices. It can control 8 kinds of appliances (thermostats, switches, lights, fans, doors, locks, window covers, and vacuum cleaners) and read 7 kinds of sensors (temperature, motion, illuminance, humidity, flood, ultraviolet light, and battery).
The First Workshop on the World Wide Voice Web, Nov 20, 2021.
State of the Open Home Workshop, Home Assistant, Dec 11, 2021.
Genie is used to create a voice interface for 3D Building Information Management software. This enables blue-collar workers to easily access digital information on the job by voice.
AI Workshop, Stanford Computer Forum 2022, April 4-6, 2022
See here for the paper abstracts.
We thank them for their valuable contributions.
OVAL is supported by the National Science Foundation under Grant No. 1900638, the Alfred P. Sloan Foundation under Grant No. G-2020-13938, and the Verdant Foundation. We also want to thank our partners Alpha Vantage, Baidu, Picovoice, Smartnews, and Yelp for their support.