We are hiring! We are looking for a couple of talented engineers to join the team. Job announcement

Interested in getting involved with the project? Let us know here!

Mission of the OVAL Lab

Voice is the next frontier in computer interfaces.

Knowledge in Voice

All knowledge in the world will be available by voice, and in all languages. This represents the next big leap in the evolution of search engines.

Dialogues as Intuitive Software Interfaces

All consumer-facing software will have a dialogue interface. To achieve this goal, we need to reduce the cost of dialogue agent development by two orders of magnitude. The answer is tools: tools that empower millions of natural-language interface developers.

End-User Task Automation

Professionals and consumers will be able to automate their digital tasks by using a multimodal interface combining voice with graphical user interfaces.

Human-Centered AI

As we aim to fulfill the above vision, we ensure that our technology is accessible to all and that user privacy is protected.

The Open Virtual Assistant Initiative

We launched an initiative in July to create an open-source virtual assistant infrastructure that supports experimentation in research and provides a basis for collaboration in industry. This is made possible by a grant from the Alfred P. Sloan Foundation, with the goals of protecting open access to knowledge and protecting privacy. An oligopoly is emerging in virtual assistants today. With Alexa, for example, boasting 60,000 compatible IoT devices and 100,000 third-party skills, we are witnessing the creation of proprietary voice-based webs with the assistants as gatekeepers. Because virtual assistants are typically given access to many personal accounts, they will accumulate massive amounts of personal data. Having a handful of companies own detailed private information worldwide, intermediate access to most digital information and services, and wield outsized power to shape individual behavior is profoundly troubling.

We will be making beta releases throughout this year, with the goals of delivering in one year:

  1. Genie, an open-source, well-documented toolkit to support Virtual Assistant 2.0 technology. The toolkit supports natural language programming by synthesizing training data from high-level specifications, avoiding the need for massive manual annotation of training data.
  2. Thingpedia, a non-proprietary skill repository open to all assistants. It collects natural-language interfaces to the web and the Internet of Things. Thingpedia is open and crowdsourced like Wikipedia, and can potentially grow larger than any proprietary database.
  3. An open-source, privacy-preserving assistant covering the 10 most popular domains. The goal is to eventually create an alternative to Alexa and Google Assistant, much as Unix/Linux is an alternative to Windows and Firefox is an alternative to Chrome. Here is a stable version of our research prototype, Almond.

Please see our roadmap. This endeavor needs the support and contributions from funding agencies, companies, researchers, developers, and individuals. Please contact us.


Our current partners include:

  • Alpha Vantage Open Stock API provides programmatic access to global financial market and currency data.
  • Home Assistant provides a local gateway to over 1000 different IoT devices. Almond is bundled as a voice assistant interface.
  • Smartnews, a news aggregator, is collaborating in creating a news skill.
  • Yelp is providing access to APIs to answer questions about restaurants.

Presentations & Interviews

Accomplishments to Date

System Structure

Genie Toolset

Natural Language Programming

The 1st conversational agent that learns from open-ended human feedback

The 1st context-aware assistant that knows where you are

Upcoming Events & News

More

An Open Virtual Assistant 2.0 Platform

Affordability:

Virtual Assistant 2.0 improves the quality & lowers the cost of dialogue agents with automatic synthesis of high-quality training data

Key Technology and Available Software

  • Thingpedia: an open crowdsourced repository of skills with over 150 skills and 1000+ IoT devices
  • Genie Semantic Parser Generator: Automatic generation of contextual neural semantic parsers from Thingpedia entries, trained with synthesized data at 1% of the traditional manual annotation cost
  • ThingTalk: The first (executable) virtual assistant programming language. A language with formal semantics that enables worldwide collaboration, with extensibility, common libraries, neural models, datasets, and tools
  • Almond Assistant: the first assistant that protects privacy

Publication

The 1st Federated Virtual Assistant that Protects Privacy

Privacy

  • Almond protects privacy by allowing execution on user devices
  • A federated architecture offers interoperability & choice
  • Users can share digital assets with each other privately
  • Distributed with Home Assistant to 100,000+ users

Technology

  • Versatile access control: Natural language specification with formal ThingTalk semantics.
  • Communication protocol: Remote ThingTalk programs allow sharing of all assets accessible to the virtual assistant, with privacy and security.
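As an illustration, a natural-language policy such as "my dad can access my security camera, only when I am not home" could, once parsed, be represented as a formal rule over a principal, a capability, and a context predicate. The sketch below is hypothetical Python, not ThingTalk; all rule fields and function names are illustrative.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Policy:
    principal: str                     # who the rule applies to
    capability: str                    # which function they may invoke
    condition: Callable[[dict], bool]  # predicate over the current context

# Two of the example policies from the list above, as formal rules.
policies = [
    Policy("dad", "security_camera.view", lambda ctx: not ctx["owner_home"]),
    Policy("daughter", "netflix.watch", lambda ctx: ctx["hour"] < 20),
]

def allowed(principal: str, capability: str, ctx: dict) -> bool:
    """A request is allowed if any policy matches and its condition holds."""
    return any(p.principal == principal and p.capability == capability
               and p.condition(ctx) for p in policies)
```

Encoding the condition as an executable predicate is what lets one access-control mechanism cover the wide variety of natural-language policies shown below.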

Publication

Examples of Access Control
Allow my daughter to watch Netflix only before 8pm.
Allow my son to purchase any household item under $10 on Amazon.
My dad can access my security camera, only when I am not home.
Whenever I am out of town, let my secretary read email messages whose subject is marked ‘urgent’.
Allow colleagues to add GitHub issues to my to-do list.
Authors can only read those files with their names in the title.

Virtual Assistant 2.0 Methodology

Prior State of the Art

  • Today’s assistants rely heavily on annotating real user utterances with a formal representation.
    • Problems: expensive, poor coverage, error prone, privacy invading

Our Approach Reduces Data Acquisition by 2 Orders of Magnitude

  • Train question-answering and dialogue agents with
    • Mostly synthesized data from database schemas and API signatures
      • Teaches neural networks compositionally with high coverage of complex queries
    • A few-shot sample of real data to teach the network natural language
  • Engineers can refine performance by improving
    • Domain-independent questions and dialogue models to support reuse
    • Domain-specific annotations
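The synthesis recipe above can be sketched in a few lines: domain-independent templates pair a natural-language pattern with a formal-representation pattern, and filling them with values from a database schema yields aligned training pairs. Everything here (the schema fields, templates, and formal syntax) is hypothetical, chosen only to illustrate the idea.

```python
from itertools import product

# Hypothetical schema: fields of a "restaurant" API with example values.
schema = {
    "cuisine": ["chinese", "italian"],
    "rating": ["4", "4.5"],
}

# Templates pair a natural-language pattern with a formal pattern.
templates = [
    ("show me {cuisine} restaurants rated at least {rating} stars",
     "filter(restaurant, cuisine == '{cuisine}' && rating >= {rating})"),
    ("what is the highest rated {cuisine} restaurant",
     "sort(filter(restaurant, cuisine == '{cuisine}'), rating, desc)[1]"),
]

def synthesize(schema, templates):
    """Expand every template with every combination of field values."""
    data = []
    keys = list(schema)
    for values in product(*(schema[k] for k in keys)):
        binding = dict(zip(keys, values))
        for nl, formal in templates:
            data.append((nl.format(**binding), formal.format(**binding)))
    return data

pairs = synthesize(schema, templates)  # 2 cuisines x 2 ratings x 2 templates
```

Because the templates compose over the whole schema, the synthesized set covers complex queries that manually annotated data would rarely include.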

Publication

High-Quality & Low-Cost Question Answering Agents

Accuracy

  • 12% better than commercial assistants on crowdsourced long-tail complex restaurant questions

Affordability

  • 1% of the original manual annotation cost, for validation

Key technology

  • Generic domain-agnostic grammar templates
  • Pre-trained networks (BERT and BART)
  • A novel BERT-LSTM neural semantic parser

Available Software

  • Genie tool set, datasets for schemas in Schema.org and Wikidata.

Publications


Examples of Long-Tail Questions (compared across Alexa, Google, Siri, and Genie)
Show me restaurants rated at least 4 stars with at least 100 reviews.
Show restaurants in San Francisco rated higher than 4.5.
What is the highest rated Chinese restaurant near Stanford?
How far is the closest 4-star restaurant?
Find a W3C employee that went to Oxford.
Who worked for Google and lives in Palo Alto?
Who graduated from Stanford and won a Nobel prize?
Who worked for at least 3 companies?
Show me hotels with checkout time later than 12PM.
Which hotel has a pool in this area?

The First Contextual Neural Dialogue Agent

Accuracy

  • Multi-domain MultiWoz dataset: first system to demonstrate 70% turn-by-turn accuracy with just 2% of real training data.

Affordability

  • Needs only 2% of the original annotated training data

Key technology

  • Training data synthesis with an abstract dialogue state machine
  • A unified contextual dialogue-state-tracking neural network that is more robust than intent classification
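To make the state-machine idea concrete, the sketch below enumerates synthetic dialogues by walking every path through a toy transition table. The states, transitions, and utterances are hypothetical stand-ins; a real abstract dialogue model is far richer.

```python
# Toy abstract dialogue state machine: each state maps to a list of
# (utterance, next_state) transitions. Enumerating all paths from the
# start state to "end" yields synthetic training dialogues.
TRANSITIONS = {
    "start":     [("user: book a restaurant", "ask_slot")],
    "ask_slot":  [("agent: which cuisine?", "fill_slot")],
    "fill_slot": [("user: italian", "confirm"), ("user: chinese", "confirm")],
    "confirm":   [("agent: booked!", "end")],
}

def synthesize_dialogues(state="start"):
    """Recursively enumerate every utterance path through the machine."""
    if state == "end":
        return [[]]
    dialogues = []
    for utterance, nxt in TRANSITIONS[state]:
        for rest in synthesize_dialogues(nxt):
            dialogues.append([utterance] + rest)
    return dialogues

dialogues = synthesize_dialogues()
```

Every branching user turn multiplies the number of synthesized dialogues, which is how a compact state machine can stand in for large amounts of annotated real data.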

Available Software

  • Genie parser generator, Almond assistant.

Publications

Model Accuracy

Joint Accuracy (MultiWOZ 2.1):
  • TRADE (Wu et al., 2019): 45.6
  • SUMBT (Lee et al., 2019): 46.7
  • DSTQA (Zhou and Small, 2019): 51.2
  • DST-Picklist (Zhang et al., 2019): 53.3
  • SST (Chen et al., 2020): 55.2
  • TripPy (Heck et al., 2020): 55.3
  • SimpleTOD (Hosseini-Asl et al., 2020): 55.7

Turn-by-Turn Accuracy (Cleaned Test Set):
  • Genie: 71.1

Localize QA Agents for Other Languages in a Day

Accuracy

  • 75-82% accuracy on long-tail restaurant questions.

Affordability

  • Requires no manually annotated data, only human translation for test utterances.

Key technology

  • Train with translations of synthesized English data with named entities in target language
  • New alignment-based translation method using pre-trained Marian models
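The entity-preserving translation step can be illustrated with a toy pipeline: mask the named entity with a placeholder, translate the masked sentence, then substitute a localized entity back in. The phrase dictionary below is a stand-in for a real translation model (such as a pre-trained Marian model); the placeholder mechanism is the point.

```python
# Stand-in for machine translation, English -> German. A real system would
# call a neural translation model here instead of a lookup table.
STUB_TRANSLATIONS = {
    "look for 5 star restaurants that serve FOOD":
        "suchen sie nach 5 sterne restaurants, die FOOD servieren",
}

def localize(english: str, entity: dict, placeholder: str = "FOOD") -> str:
    # 1. Mask the entity so translation cannot mangle it.
    masked = english.replace(entity["en"], placeholder)
    # 2. Translate the masked sentence.
    translated = STUB_TRANSLATIONS[masked]
    # 3. Substitute the localized entity back in.
    return translated.replace(placeholder, entity["de"])

query = localize("look for 5 star restaurants that serve dumplings",
                 {"en": "dumplings", "de": "maultaschen"})
```

Because only the sentence frame is translated, entities can be swapped for culturally appropriate ones in the target language, as in the examples below.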

Available Software

  • Genie tool set, restaurant training data set in 10 languages.

Publications

Restaurant Queries with Localized Entities, by Language
US: look for 5 star restaurants that serve burgers
SA: ابحث عن مطاعم 5 نجوم التي تقدم الشاورما
DE: suchen sie nach 5 sterne restaurants, die maultaschen servieren
ES: busque restaurantes de 5 estrellas que sirvan paella valenciana
IR: به دنبال رستوران‌های 5 ستاره باشید که جوجه کباب سرو می‌کنند
FI: etsi 5 tähden ravintoloita, joissa tarjoillaan karjalanpiirakkaa
IT: cerca ristoranti a 5 stelle che servono bruschette
JP: 寿司を提供する5つ星レストランを探す
PL: poszukaj 5 gwiazdkowych restauracji, które serwują kotlet
TR: köfte servis eden 5 yıldızlı restoranları arayın
CN: 搜索卖北京烤鸭的5星级餐厅

Event-Driven Commands in Natural Language

Capability

  • The 1st assistant to support event-driven cross-service commands.

Accuracy

  • 68% accuracy on crowdsourced event-driven commands, using no real training data

Affordability

  • Developer supplies API signatures and annotations on parameters
  • Real annotated data needed only for validation

Key technology

  • ThingTalk: a formal language for trigger-action commands
  • Compositionality: Synthesized data teach our neural network to understand unseen combinations.
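The compositionality claim can be sketched directly: crossing a small set of trigger primitives with a small set of action primitives synthesizes when-do programs, including combinations never written by hand. The primitives and the loosely ThingTalk-flavored formal strings below are illustrative, not real Thingpedia entries.

```python
from itertools import product

# Hypothetical primitives: each pairs a natural-language fragment with a
# formal fragment (illustrative syntax, not actual ThingTalk).
triggers = [
    ("when I post a tweet", "monitor twitter.my_tweets()"),
    ("when motion is detected", "monitor camera.motion_event()"),
]
actions = [
    ("post it to Facebook", "facebook.post(status=text)"),
    ("email it to Dad", "gmail.send(to='dad', body=text)"),
]

def compose(triggers, actions):
    """Cross every trigger with every action to synthesize new programs."""
    return [(f"{t_nl}, {a_nl}", f"{t_f} => {a_f}")
            for (t_nl, t_f), (a_nl, a_f) in product(triggers, actions)]

programs = compose(triggers, actions)  # 2 triggers x 2 actions = 4 programs
```

Training on such synthesized cross-products is what lets the parser understand trigger-action combinations it has never seen in real data.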

Available

  • Almond assistant, Genie, Thingpedia skill repository, with 100+ popular web services & 1000 IoT devices.

Publications

Examples of Event-Driven Commands by Area
Weather: Remind me to bring an umbrella when rain is forecast tomorrow.
Finance: When the Microsoft stock drops to $200 and my checking balance is greater than $2000, buy 5 shares.
Home Automation: Email me if my car is not plugged in when parked at home.
Social Media: Whenever I post my profile to Twitter, post it to Facebook.
Security: Send images from my security camera to Dad if motion is detected when I am not home.
Work: Forward emails to my secretary if they are marked urgent.

Multimodal Virtual Assistant Commands on Mobile Devices

Novelty

  • Minimizes context switching on mobile devices with intuitive multimodal interaction.

Features

  • Query: Ask for information verbally and point to the destination of the answer
  • Do: Point to some data on the screen and issue a command on the data
  • Keep: Point to a portion of the screen and ask to keep it on top of another app, like a “post-it note”

Extensibility

  • Built on top of the Almond virtual assistant

Results

  • A user study shows that it reduces cognitive load and task completion time

Publication

VASH: End-User Web Task Automation with Demonstration

Novelty

  • User automates web tasks with voice commands as they browse.

Key Technology

  • Programming by demonstration: Users describe filters and function applications on lists by voice.
  • ThingTalk: a formal language combining web operations with control constructs

Results

  • A user study shows that VASH is easy to learn and useful for crowdsourced tasks

Publication

The First Conversational Agent Able to Learn from Open-Ended Human Feedback

Novelty

  • To improve, conversational agents need to learn from human interaction
  • We introduce the first technique for learning from open-ended conversations, along with an agent that interacted with 236k people online to learn new visual concepts

Key Technology

  • Interaction manifold: identification of a low-dimensional manifold of the larger action space

Available

  • Open source to be released

Publication

  • Socially Situated Artificial Intelligence: Learning to Interact and Interacting to Learn (In preparation)
    Krishna et al.

The First Smart Speaker to Know Where You Are

New Capability

  • Detect head position and orientation with the microphone array in regular smart speakers

Uses

  • Controlling IoT devices, verbal reference inference in meeting room, turn-by-turn indoor guidance

Accuracy

  • Average error: 0.31 m (position), 34° (orientation)

Key technology

  • Scalable data collection with virtual reality software
  • Neural network to predict user’s head position and orientation

Publication

Publications

See here for the paper abstracts.

2020

2019

2018

2017

Team

Senior Members

PhD Students

PhD Alumni

Former Students and Collaborators

We thank them for their valuable contributions.

Acknowledgement

OVAL is supported by the National Science Foundation under Grant No. 1900638, and by the Alfred P. Sloan Foundation under Grant No. G-2020-13938.