The Open Virtual Assistant Lab (OVAL) is organizing the First Open Virtual Assistant Workshop, to be held at Stanford University as part of the Stanford HAI Fall Conference.

The purpose of the OVAL lab is to create an ecosystem of open, federated, state-of-the-art virtual assistants that advances linguistic technology, makes the technology openly available, accelerates adoption, and enables sharing with privacy.

The OVAL lab has created the first open-source virtual assistant that supports sharing with privacy (Almond), an open skill platform (Thingpedia), and an open-source neural model (LUInet). We are creating an industrial affiliates program to accelerate collaboration with industry partners.

We invite interested stakeholders, open-source community members, and researchers to participate in this working meeting. Our goals are:

  1. To publicize the project's efforts to date, including open standard proposals, community building, collaborations with industry, and our research agenda.
  2. To broaden participation in the effort, solicit new contributions, and collect feedback.
  3. To create a plan of action for the next two years.

This is an invitation-only workshop: we must limit attendance as it is a working meeting. The sessions will be streamed live to the public.


Schedule
Prof. Monica Lam & Prof. James Landay, Stanford
Introduction to the Open Virtual Assistant Lab
Prof. Monica Lam, Stanford
Vision and Challenges of Open Virtual Assistants
Michael Harte, Santander Bank (UK) COO
Larry Heck, Samsung Viv Labs CEO
Jayesh Govindarajan, Salesforce AI/ML VP
Prem Natarajan, Amazon Alexa AI/NLP VP
Lightning Introductions (1 min each)
Open-Source Data and Technology
Almond: An Open Source Toolkit for Virtual Assistants
Giovanni Campagna, Stanford PhD Student
Building a virtual assistant requires expertise in many different areas, from voice technology, to natural language processing and dialogue management, to distributed systems and access control. In this talk, I will present Almond, an end-to-end solution that allows developers and researchers to build custom, functional virtual assistants. Almond leverages Thingpedia, the open repository of APIs, and can run on top of existing virtual assistants such as Alexa, as well as messaging platforms like Slack. I will also present Genie, our tool for building new natural language understanding models with minimal cost and no machine learning expertise. Almond takes advantage of Genie to build natural language understanding without expensive manual data annotation. Furthermore, every developer can independently contribute new knowledge to Genie and Thingpedia; as the knowledge grows in the open, all assistants benefit.
Open-Source IoT Platform
Paulus Schoutsen, Home Assistant CEO
Complex Queries on the Structured Web
Silei Xu, Stanford PhD Student
Virtual assistants today require every website to submit skills individually into their proprietary repositories. The skill consists of a fixed set of supported commands and the formal representation of each command. The assistants use the contributed data to create a proprietary linguistic interface, typically using an intent classifier.
We propose an open-source toolkit, called Schema2QA, that leverages the markup found in many websites to automatically build skills. Schema2QA has several advantages: (1) Schema2QA handles compositional queries involving multiple fields automatically, such as “find the Italian restaurant around here with the most reviews”, or “what W3C employees on LinkedIn went to Oxford”; (2) Schema2QA translates natural language into executable queries on the up-to-date data from the website; (3) natural language training can be applied to one domain at a time to handle multiple websites using the same representations. We apply Schema2QA to two different domains, showing that the skills we built can answer useful queries with little manual effort. Our skills achieve an overall accuracy between 74% and 78%, and can answer questions that span three or more properties with 65% accuracy. We also show that a new domain can be supported by transferring knowledge. The open-source Schema2QA lets each website create and own its linguistic interface.
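To make the compositional queries described in the abstract concrete, here is a hedged sketch of what such a query reduces to over a site's structured markup. The data and field names below are hypothetical, loosely modeled on schema.org Restaurant properties; the actual Schema2QA toolkit translates natural language into executable queries automatically.

```python
# Hypothetical structured records, as might be extracted from a website's
# schema.org Restaurant markup (field names are illustrative).
restaurants = [
    {"name": "Trattoria Uno", "servesCuisine": "Italian", "reviewCount": 120},
    {"name": "Pasta Due", "servesCuisine": "Italian", "reviewCount": 340},
    {"name": "Sushi Go", "servesCuisine": "Japanese", "reviewCount": 500},
]

def most_reviewed(cuisine, items):
    """A query like 'find the Italian restaurant with the most reviews'
    reduces to a filter on one field plus an argmax over another."""
    matches = [r for r in items if r["servesCuisine"] == cuisine]
    return max(matches, key=lambda r: r["reviewCount"]) if matches else None
```

The point of the sketch is that a single natural-language question composes multiple fields (cuisine, review count), which is exactly the kind of query the abstract reports handling automatically.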
Programmatically Building & Managing Training Data with Snorkel
Alex Ratner, Stanford / University of Washington
One of the key bottlenecks in building machine learning systems is creating and managing the massive training datasets that today’s models require. In this talk, I will describe our work on Snorkel, an open-source framework for building and managing training datasets, and describe three key operators for letting users build and manipulate training datasets: labeling functions, for labeling unlabeled data; transformation functions, for expressing data augmentation strategies; and slicing functions, for partitioning and structuring training datasets. These operators allow domain expert users to specify machine learning (ML) models entirely via noisy operators over training data, expressed as simple Python functions — or even via higher-level NL or point-and-click interfaces — leading to applications that can be built in hours or days, rather than months or years, and that can be iteratively developed, modified, versioned, and audited. I will describe recent work on modeling the noise and imprecision inherent in these operators, and using these approaches to train ML models that solve real-world problems, including recent state-of-the-art results on benchmark tasks and real-world industry, government, and medical deployments.
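As a minimal, self-contained sketch of the labeling-function idea: each function votes a label (or abstains) on an unlabeled example, and the votes are combined into a training label. This is plain Python, not the actual Snorkel API, and it uses a simple majority vote where Snorkel instead learns a model of the functions' noise and correlations.

```python
# Label values: -1 means the function abstains on this example.
ABSTAIN, NEG, POS = -1, 0, 1

def lf_has_url(text):
    # Heuristic: text containing a link is likely spam-like (POS).
    return POS if "http" in text else ABSTAIN

def lf_short(text):
    # Heuristic: very short messages are likely benign (NEG).
    return NEG if len(text.split()) < 4 else ABSTAIN

LABELING_FUNCTIONS = [lf_has_url, lf_short]

def majority_label(text):
    """Combine non-abstaining votes by simple majority (Snorkel would
    instead model the accuracy of each labeling function)."""
    votes = [v for v in (lf(text) for lf in LABELING_FUNCTIONS) if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    return max(set(votes), key=votes.count)
```

The practical upshot, as in the talk: instead of hand-annotating each example, a domain expert writes a handful of such heuristic functions, and the framework turns their noisy votes into a usable training set.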
TreeHacks 2020: Making the OVAL Voice Assistance Platform Accessible for Hackers
Michelle Bao, Stanford
Generative and Span-Detection Approaches for Multi-Domain Dialogue State Tracking
Dilek Hakkani-Tur, Amazon Senior Principal Scientist
Building Scalable & Privacy-Preserving Conversational AI Systems
Sujith Ravi, Google Senior Manager & Senior Staff Research Scientist
Human-Machine Co-Creation of Explainable AI Models for Data
Lucian Popa, IBM Principal Researcher
While the role of humans is increasingly recognized as important by the machine learning community, the level of abstraction at which AI systems interact with human experts is often too low and far removed from a human's conceptual models. Ongoing work in our team aims to support human-machine co-creation through various forms of learning and human-in-the-loop techniques, targeted at sophisticated tasks in the lifecycle of data and knowledge curation. I will highlight two examples of AI systems that learn explainable representations with a human in the loop: one for sentence classification and extraction, the other for entity resolution. In the first system, HEIDL, the machine-learned model is exposed to an AI engineer through high-level, explainable linguistic expressions. The human's role is elevated from simply evaluating model predictions to interpreting and even updating the model logic directly, via interaction with the rule predicates themselves. The human-machine co-created models generalize better to unseen data, as humans can instill their expertise by extrapolating from what the automated algorithms have learned. Our second example, SystemER, uses active learning to interactively capture domain- and application-specific knowledge from a human into a set of explainable rules for matching entities across datasets.
Digital Guardian Angel
Marc Duranton, CEA Fellow
The General Data Protection Regulation (GDPR) of the European Union introduced stronger rules on data protection. It is intended to give people more control over their personal data while also benefiting businesses. In practice, however, its implementation has often imposed an additional burden on end users when browsing. For instance, a pop-up asks for consent whenever a new website is visited, without clearly explaining how the collected information will be used. As a result, most users simply click the "Accept All" button to reach the destination website.
We propose to assist users with a "digital guardian angel" whose role is to set consent flags automatically according to the user's desired level of protection, whether the interface is visual or vocal. This functionality would automatically adapt predefined user privacy settings to each request for consent by exploiting (1) privacy protection mechanisms developed as part of the USEMP European project and (2) an "international regulatory watch tool" developed in our laboratory that screens 20,000 web pages every day in several languages.
Controlling Fine-Grain Sharing In Natural Language Via A Virtual Assistant
Giovanni Campagna, Stanford PhD Student
Until now, sharing of data between users has required going through a third party who centralizes all data, such as Facebook, or sharing username and password. Users are limited in their ability to share by what the platform provides. We propose instead to use virtual assistants to share data, accounts and devices. The virtual assistant mediates all requests. Users can express fine-grain access control in natural language; the control is translated into a formal language and enforced by the virtual assistant.
We conduct a need-finding study and find that, across 20 realistic use cases, the number of users willing to share doubles when fine-grain control is possible. Furthermore, we collect 220 use cases from MTurk and find that 85% can be represented in our formal access control language. Our work is integrated into the Almond open-source virtual assistant, and we show a demo of how it can help a doctor manage a large number of patients and remind them to record their blood pressure.
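To illustrate the pipeline the abstract describes, here is a hedged sketch of what a natural-language sharing policy might compile into once translated to a formal rule the assistant enforces. All names (requesters, functions, the rule shape) are hypothetical; the actual Almond access control language is richer than this.

```python
from dataclasses import dataclass, field

@dataclass
class Request:
    """A mediated request for another user's data or devices."""
    requester: str
    function: str              # e.g. "read_blood_pressure"
    args: dict = field(default_factory=dict)

# Each formal rule pairs a requester predicate, a permitted function,
# and a predicate on the request's arguments. A policy like
# "my family may see my location during the day" becomes the second rule.
RULES = [
    (lambda who: who == "dr_smith", "read_blood_pressure", lambda a: True),
    (lambda who: who.endswith("@family"), "read_location",
     lambda a: a.get("time") == "daytime"),
]

def allowed(req: Request) -> bool:
    # The assistant mediates every request and grants it only if
    # some rule's predicates all hold.
    return any(who_ok(req.requester) and fn == req.function and args_ok(req.args)
               for who_ok, fn, args_ok in RULES)
```

The key design point from the talk is that these rules live with the data owner's own assistant, so no third party centralizes the shared data.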
Distributed Multi-Layer Ledgers for Privacy-Preserving Data Sharing
Ed Chang, HTC President of AI Research & Stanford
In this data-driven AI era, where big data is a prerequisite for training effective deep learning models, safeguarding user privacy is a critical issue. For medical data sharing, we have developed MedXchange, which is built on distributed multi-layer blockchains with smart contracts to ensure privacy, accountability, and auditability while maintaining low latency and high throughput. This presentation outlines our design for achieving these multifaceted objectives.
User Engagement
Mimic and Rephrase: Reflective Listening in Open-Ended Dialogue
Arun Chaganty, Square AI Lead in Conversations
Reflective listening—demonstrating that you have heard your conversational partner—is key to effective communication. Expert human communicators often mimic and rephrase their conversational partner, e.g., when responding to sentimental stories or to questions they don’t know the answer to. We introduce a new task and an associated dataset wherein dialogue agents similarly mimic and rephrase a user’s request to communicate sympathy (I’m sorry to hear that) or lack of knowledge (I do not know that). We study what makes a rephrasal response good against a set of qualitative metrics. We then evaluate three models for generating responses: a syntax-aware rule-based system, a seq2seq LSTM neural model with attention (S2SA), and the same neural model augmented with a copy mechanism (S2SA+C). In a human evaluation, we find that S2SA+C and the rule-based system are comparable and approach human-generated response quality. In addition, experiences with a live deployment of S2SA+C in a customer support setting suggest that this generation task is a practical contribution to real-world conversational agents.
Open-world interaction between people and conversational agents
Prof. Michael Bernstein, Stanford
Speech & Multimodal Interfaces
Understanding Voice at Mozilla: 2017-2019
Jofish Kaye, Mozilla Principal Research Scientist
Mozilla’s mission is to keep the internet open and accessible to all. In this talk, I’ll describe some interesting research from our exploration into the future of open voice. I won’t get to all of it, but our portfolio includes Common Voice, an open corpus of voice data, now in 36 languages; Mozilla TTS, a competitive open source TTS tool; Mozilla SST, a competitive set of open voices; Firefox Voice, a browser-based voice assistant; and a series of studies, research projects and surveys. Together, these point to three directions for the future of open voice: a set of open, private, trustable voice applications; a set of open, private, trustable developer tools for voice; and an open speech and voice consortium to both ensure the existence of an underlying technology stack as well as provide a way to engage with increasingly likely legislation in the US, EU and elsewhere around voice applications.
Democratizing Speech Technologies
Rob Chambers, Microsoft Speech Engineering Manager
Building natural interactive experiences doesn't have to be difficult. Developer-customers approach speech and language platforms with their own ideas, their own problems, and their own requirements. Yet many speech and language APIs aren't designed with customers in mind; they start off as technical solutions to completely different scenarios rather than solutions to real-world problems. Microsoft's first Speech APIs suffered from these problems. I know… I helped create them 20 years ago, along with every Speech API since. What have we learned? It's all about developer-scenario focus: meeting developer-customers where they are, not where you want them to be.
Multi-modal interaction with smart appliances and smart phones
Jackie Yang, Stanford PhD Student
Conversational Agents
Conversational AI on the Edge
Zornitsa Kozareva, Google Engineering Manager
Research Progress in Task-Oriented Dialogue
Chenguang Zhu, Microsoft Principal Researcher
Task-oriented dialogue systems enable humans to interact with computer systems to accomplish various tasks like booking, retail and customer service. A typical task-oriented dialogue system is made of three components: dialogue state tracking (DST), dialogue management (DM) and natural language generation (NLG). In this talk, we will introduce our work on the first and third subtasks. Our DST model significantly reduces the computational cost without degrading the performance. Our NLG model boosts the diversity in generated responses via a multitask framework.
What makes a good conversation? How controllable attributes affect human judgments
Abigail See, Stanford PhD Student
A good conversation requires balance – between simplicity and detail; staying on topic and changing it; asking questions and answering them. Although dialogue agents are commonly evaluated via human judgments of overall quality, the relationship between quality and these individual factors is less well-studied. In this work, we examine two controllable neural text generation methods, conditional training and weighted decoding, in order to control four important attributes for chit-chat dialogue: repetition, specificity, response-relatedness and question-asking. We conduct a large-scale human evaluation to measure the effect of these control parameters on multi-turn interactive conversations on the PersonaChat task. We provide a detailed analysis of their relationship to high-level aspects of conversation, and show that by controlling combinations of these variables our models obtain clear improvements in human quality judgments.
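As a minimal sketch of the weighted-decoding idea from the abstract (all names and numbers below are illustrative): at each decoding step, every candidate token's log-probability is shifted by a weighted feature score before picking the next token, so, for example, a negative weight on a repetition feature discourages repeated tokens.

```python
def weighted_decode_step(logprobs, history, rep_weight):
    """One greedy decoding step with weighted decoding.

    logprobs:   dict mapping candidate token -> base model log-probability
    history:    set of tokens already generated
    rep_weight: weight on the repetition feature (negative = discourage)
    """
    def score(tok):
        # Single binary feature: is this token a repetition?
        repetition = 1.0 if tok in history else 0.0
        return logprobs[tok] + rep_weight * repetition
    # Pick the candidate with the highest adjusted score.
    return max(logprobs, key=score)
```

In the paper's setting there are several such features (specificity, response-relatedness, question-asking), and tuning their weights is what lets the authors measure how each attribute affects human quality judgments.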
Location: Packard Electrical Engineering, 350 Jane Stanford Way, Stanford


The workshop will take place at

Landau Economics Building Room A
579 Jane Stanford Way
Stanford, CA, 94305

The workshop venue can be reached using the Marguerite Y line from the Palo Alto Caltrain Station, a free shuttle bus that runs approximately every 20 minutes (see the Marguerite schedule).

The HAI Conference has also reserved parking for workshop attendees. Please park at the Galvez Lot in spots 265-414, marked "HAI ONLY"; parking in these spots is free for attendees.

The reception venue is within walking distance of the workshop, and cannot be reached by car. For anyone who needs additional assistance, the Marguerite C line is also available.


The conference organizers recommend the Sheraton Hotel and the Westin Palo Alto.

Workshop Co-Chairs

Prof. Monica Lam
Prof. James Landay