The Open Virtual Assistant Lab seminar is a weekly event where students and researchers present their work in areas related to voice user interfaces, chatbots and virtual assistants. Topics include user interaction with natural language, chatbot-based applications, agent-to-agent distributed systems, question answering, natural language understanding and generation, and more.

The seminar is open to the Stanford community and members of the OVAL affiliate program. If you're interested in giving a talk, please contact the seminar organizers.

Mailing list: oval-seminar@lists.stanford.edu

Archive: Summer 2019


9/27: Organizational Lunch

Time:

Location: Gates 463A (4th floor, B wing)

Organizational lunch. Come enjoy food and sign up to give a talk during the quarter.

10/4: Neural Program Synthesis from Natural Language Specification

Time:

Location: Gates 463A (4th floor, B wing)

Abstract:
With the advancement of modern technologies, programming is becoming ubiquitous not only among professional software developers but also among general computer users. As a result, there has been emerging interest in automatic program development. In this talk, I will mainly focus on my work on synthesizing programs from natural language descriptions, with the aim of making programming systems more user-friendly. I will first discuss our work on translating natural language descriptions into If-Then programs, which have been adopted by several commercial websites including IFTTT, Zapier, and Stringify. In the second part of my talk, I will discuss our recent work on neural-symbolic reasoning for reading comprehension. While reading comprehension has been widely considered solved by large-scale pre-trained language models such as BERT and XLNet, we find that for questions requiring more complex reasoning beyond text pattern matching, the language models themselves are insufficient. By equipping a pre-trained language model with a symbolic reasoning module that synthesizes and executes programs according to the natural language text, our Neural-symbolic Reader (NeRd) surpasses the state of the art on DROP and MathQA, recent benchmarks that require challenging numerical reasoning, while also providing better interpretability.
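For readers unfamiliar with the If-Then setting, the following is a minimal sketch of the trigger-action program representation the abstract refers to. The channel and function names are purely illustrative and are not taken from the talk or from any particular platform's API; the synthesis task is to predict these slots from a natural language description, typically with a learned sequence model.

# Illustrative sketch of an If-Then (trigger-action) program.
# Channel/function names below are hypothetical placeholders.
from dataclasses import dataclass

@dataclass
class IfThenProgram:
    trigger_channel: str   # e.g. "weather"
    trigger_function: str  # e.g. "tomorrow_forecast_calls_for_rain"
    action_channel: str    # e.g. "notifications"
    action_function: str   # e.g. "send_push_notification"

# "If it is going to rain tomorrow, send me a notification."
program = IfThenProgram(
    trigger_channel="weather",
    trigger_function="tomorrow_forecast_calls_for_rain",
    action_channel="notifications",
    action_function="send_push_notification",
)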

Speaker: Xinyun Chen (UC Berkeley)
Xinyun Chen is a Ph.D. student at UC Berkeley, working with Prof. Dawn Song. She is also a student researcher at Google Brain and was a research intern at Facebook AI Research. Her research lies at the intersection of deep learning, programming languages, and security. Her recent work focuses on neural program synthesis and adversarial machine learning, tackling the grand challenges of making programming accessible to general users and enhancing the security and trustworthiness of machine learning models.

Food: Silei

10/11: HUBERT Untangles BERT to Improve Transfer across NLP Tasks

Time:

Location: Gates 463A (4th floor, B wing)

Abstract:
We introduce HUBERT, which combines the structured-representational power of Tensor-Product Representations (TPRs) with BERT, a pre-trained bidirectional Transformer language model. We validate the effectiveness of our model on the GLUE benchmark and the HANS dataset. We also show that there is shared structure between different NLP datasets which HUBERT, but not BERT, is able to learn and leverage. Extensive transfer-learning experiments are conducted to confirm this proposition.
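As background, the core operation behind TPRs is binding a "filler" (content) vector to a "role" (structural position) vector via an outer product and summing the bindings. The sketch below illustrates that operation with made-up dimensions and vectors; it is not HUBERT itself, which learns role/filler decompositions on top of BERT's representations.

# Minimal sketch of tensor-product binding and unbinding.
import numpy as np

d_filler, d_role = 8, 4
rng = np.random.default_rng(0)

# One filler vector per token; roles here are simply orthonormal basis vectors.
fillers = rng.normal(size=(3, d_filler))   # 3 tokens
roles = np.eye(d_role)[:3]                 # roles for positions 0, 1, 2

# Bind each filler to its role with an outer product, then sum the bindings
# into one fixed-size representation of the structured object.
tpr = sum(np.outer(f, r) for f, r in zip(fillers, roles))   # shape (8, 4)

# Unbinding: because the roles are orthonormal, multiplying the TPR by a
# role vector recovers the filler bound to that role.
recovered = tpr @ roles[1]
assert np.allclose(recovered, fillers[1])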

Speaker: Mehrad Moradshahi
Mehrad Moradshahi is a Ph.D. student in the Stanford Computer Science department advised by Prof. Monica Lam. He was also a research intern at Microsoft Research, where he focused on developing Transformer-based neuro-symbolic models for NLP tasks. He has been working mainly on the AI and natural language understanding side of the Almond project since 2018.

Food: Jian

10/18: Symptom Checking for Disease Diagnosis Using Deep Reinforcement Learning

Time:

Location: Gates 463A (4th floor, B wing)

Abstract:
Online symptom checkers have been deployed by sites such as WebMD and Mayo Clinic to identify possible causes and treatments for diseases based on a patient’s symptoms. Symptom checking first assesses a patient by asking a series of questions about their symptoms, then attempts to predict potential diseases. In this talk, we will explain how to formulate symptom checking as a Markov decision process and apply deep reinforcement learning to solve this problem. To improve the performance of symptom checking, we propose REFUEL, a reinforcement learning method with two techniques: reward shaping and feature rebuilding. We have also cooperated with several Taiwan hospitals to launch medical services powered by our symptom checker.
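To make the MDP framing concrete, here is a rough sketch of how symptom checking can be cast as a sequential decision problem: the state records which symptoms have been asked about and their answers, actions either query a symptom or commit to a diagnosis, and a reward is given for a correct diagnosis. The symptom and disease names and the reward values are illustrative only; REFUEL adds reward shaping and feature rebuilding on top of a formulation like this.

# Toy symptom-checking environment (hypothetical symptoms/diseases).
import random

SYMPTOMS = ["fever", "cough", "rash"]
DISEASES = ["flu", "measles"]

class SymptomCheckEnv:
    def __init__(self, patient_symptoms, true_disease):
        self.patient = patient_symptoms        # symptoms the patient actually has
        self.truth = true_disease
        self.state = {s: 0 for s in SYMPTOMS}  # 0 = unasked, +1 = yes, -1 = no

    def step(self, action):
        if action in SYMPTOMS:                 # ask about a symptom
            self.state[action] = 1 if action in self.patient else -1
            return self.state, 0.0, False      # episode continues
        # otherwise the action is a diagnosis: episode ends
        reward = 1.0 if action == self.truth else -1.0
        return self.state, reward, True

# Random-policy rollout; an RL agent would instead learn which symptom to query next.
env = SymptomCheckEnv({"fever", "cough"}, "flu")
done = False
while not done:
    action = random.choice(SYMPTOMS + DISEASES)
    state, reward, done = env.step(action)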

Speaker: Jason Chou
Jason Chou received his Ph.D. from National Taiwan University in 2013. Since 2016, Dr. Chou has been a research manager at HTC Research and Healthcare, where he leads the deep learning team and focuses on applying deep learning approaches to problems in computer vision and medicine. He also currently serves as a visiting researcher at UC Berkeley.

Food: Euirim

10/25: Answering Complex Questions in the Wild

Time:

Location: Gates 463A (4th floor, B wing)

Abstract:
Open-domain question answering (open-domain QA) systems greatly improve our access to the knowledge in large text corpora, but most previous work on this topic lacks the ability to perform multi-hop reasoning, limiting how textual knowledge can actually be used. For instance, to answer "What's the Aquaman actor's next movie?", one needs to reason about the entity "Jason Momoa" instead of just comparing the question to a local context, making the task more challenging.
In this talk, I will present our recent work on enabling text-based multi-hop reasoning in open-domain question answering. First, I will talk about how we collected one of the first datasets on multi-hop QA, making it possible to train and evaluate systems that perform explainable complex reasoning over millions of Wikipedia articles. Then, I will present a QA system we developed on this dataset. Iterating between finding supporting facts and reading the retrieved context, our model outperforms all previously published approaches, many of which are based on powerful pretrained neural networks like BERT. Because our model generates natural language queries at each step of its retrieval, it is also readily explainable to humans and allows for intervention when it veers off course. I will conclude by comparing our model to other recent developments on this dataset and discussing future directions on this problem.
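The alternation between query generation, retrieval, and reading can be summarized schematically as below. The functions `generate_query`, `retrieve`, and `answer_or_none` stand in for learned components and are not the speaker's actual implementation; the point is only to show the control flow that makes the evidence chain inspectable at every hop.

# Schematic iterative retrieve-and-read loop for multi-hop QA.
def multi_hop_qa(question, generate_query, retrieve, answer_or_none, max_hops=3):
    context = []                                     # supporting facts found so far
    for _ in range(max_hops):
        query = generate_query(question, context)    # natural language query for this hop
        context += retrieve(query)                   # add newly retrieved paragraphs
        answer = answer_or_none(question, context)   # reader tries to answer
        if answer is not None:
            return answer, context                   # answer plus explainable evidence chain
    return None, context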

Speaker: Peng Qi
Peng Qi is a PhD student in Computer Science at Stanford University. His research interests revolve around building natural language processing systems that better bridge between humans and the large amount of (textual) information we are engulfed in. Specifically, he is interested in building knowledge representations, (open-domain) question answering, explainable models, and multi-lingual NLP systems. He is also interested in linguistics, and builds tools for linguistic structure analysis applicable to many languages.

11/1: No Seminar

The seminar is suspended due to the Stanford HAI Fall Conference and the First Open Virtual Assistant Workshop.

11/8: Automating Data Visualization for the Masses with Program Synthesis

Time:

Location: Gates 463A (4th floor, B wing)

Abstract:
While data visualizations play a crucial role in gaining insights from data, creating useful visualizations from a complex data set is far from an easy task. In particular, to create an effective visualization, the user needs to (1) query a relational database to pull out the data of interest, (2) understand the functionality provided by existing data visualization libraries and prepare the input data to match the data shape required by the library, and (3) utilize design knowledge to improve the effectiveness of the visualization. All of these tasks assume programming knowledge, and this knowledge barrier prevents non-experts from creating insightful data visualizations for their tasks.
Our work aims to automate the data visualization pipeline by automatically synthesizing programs that can query databases, prepare data, and plot effective visualizations from user demonstrations. In this talk, we will present three synthesis-based tools for data analysis and data visualization. First, we will present Scythe, a SQL query synthesizer that can synthesize SQL queries from small input-output examples provided by the user. Second, we will present Falx, a visualization-by-demonstration tool that automatically infers data preparation and visualization scripts by allowing the user to sketch how to visualize a small subset of the input data. Finally, we will present our visualization reasoning engine Draco, which uses logic rules representing design guidelines to verify and optimize data visualizations. We will demonstrate how these tools can be applied to solve problems that non-expert users posted in Vega-Lite/R/Excel forums and on Stack Overflow.
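The interaction model behind example-driven SQL synthesis can be illustrated with a toy enumerate-and-check loop: the user supplies a small input table and the desired output, and the synthesizer searches for a query consistent with them. The candidate list below is hard-coded for brevity and is not how Scythe actually searches; a real synthesizer enumerates and prunes a vastly larger space.

# Toy "programming by example" check for SQL queries.
import sqlite3

input_rows = [("alice", 3), ("bob", 5), ("alice", 2)]
expected_output = [("alice", 5), ("bob", 5)]

candidates = [
    "SELECT name, amount FROM t",
    "SELECT name, SUM(amount) FROM t GROUP BY name",
    "SELECT name, MAX(amount) FROM t GROUP BY name",
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (name TEXT, amount INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)", input_rows)

for query in candidates:
    if sorted(conn.execute(query).fetchall()) == sorted(expected_output):
        print("consistent query:", query)   # prints the SUM/GROUP BY query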

Speaker: Chenglong Wang
Chenglong is a PhD student from University of Washington working with Ras Bodik and Alvin Cheung. His research interests revolve around using program synthesis to automate data analysis and data visualization.

11/15: Complex Queries on the Structured Web

Time:

Location: Gates 463A (4th floor, B wing)

Abstract:
Virtual assistants today require every website to submit skills individually into their proprietary repositories. Each skill consists of a fixed set of supported commands and the formal representation of each command. The assistants use the contributed data to create a proprietary linguistic interface, typically using an intent classifier.
We propose an open-source toolkit, called Schema2QA, that leverages the Schema.org markup found in many websites to automatically build skills. Schema2QA has several advantages: (1) Schema2QA handles compositional queries involving multiple fields automatically, such as “find the Italian restaurant around here with the most reviews” or “which W3C employees on LinkedIn went to Oxford”; (2) Schema2QA translates natural language into executable queries on the up-to-date data from the website; (3) natural language training can be applied to one domain at a time to handle multiple websites using the same Schema.org representations. We apply Schema2QA to two different domains, showing that the skills we built can answer useful queries with little manual effort. Our skills achieve an overall accuracy between 74% and 78%, and can answer questions that span three or more properties with 65% accuracy. We also show that a new domain can be supported by transferring knowledge. The open-source Schema2QA lets each website create and own its linguistic interface.
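To give a sense of what a compositional query bottoms out in, the sketch below runs the example question over a few simplified Schema.org Restaurant records (JSON-LD-style dicts with standard properties such as servesCuisine and aggregateRating). This is only an illustration of the target semantics; the actual system compiles questions into a formal query language and executes it against the website's live data.

# Illustrative execution of "find the Italian restaurant around here with the most reviews".
restaurants = [
    {"@type": "Restaurant", "name": "Trattoria Roma", "servesCuisine": "Italian",
     "aggregateRating": {"reviewCount": 212, "ratingValue": 4.5}},
    {"@type": "Restaurant", "name": "Pasta Palace", "servesCuisine": "Italian",
     "aggregateRating": {"reviewCount": 87, "ratingValue": 4.7}},
    {"@type": "Restaurant", "name": "Sushi Bar", "servesCuisine": "Japanese",
     "aggregateRating": {"reviewCount": 300, "ratingValue": 4.4}},
]

italian = [r for r in restaurants if r["servesCuisine"] == "Italian"]
answer = max(italian, key=lambda r: r["aggregateRating"]["reviewCount"])
print(answer["name"])  # Trattoria Roma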

Speaker: Silei Xu

11/22: Soundr: Head position and orientation prediction using a microphone array

Time:

Location: Gates 463A (4th floor, B wing)

Abstract:
Although state-of-the-art smart speakers can hear a user’s speech, these devices do not know the user’s location and orientation, unlike a human assistant. Soundr leverages the built-in microphone array in most smart speakers and an end-to-end neural network to infer the user’s spatial location and head orientation using only their voice. The user can select and control IoT devices that don’t have their own speech recognition capability, e.g., by talking towards them to turn them on or off.
To provide training data for our neural network, we collected 751 minutes of data (50x that of the best prior work) from human speakers, leveraging a virtual reality headset to provide accurate head-tracking ground truth. Our system achieves an average positional error of 0.31 m and an orientation angle accuracy of 34.3°. A user study evaluating user preferences for controlling IoT appliances by talking at them found this new approach to be fast and easy to use.
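The input/output contract of such a model can be sketched as below: multi-channel microphone audio in, 3D head position and orientation out. The architecture, tensor shapes, and output parameterization are placeholders chosen for illustration, not the network described in the talk.

# Purely illustrative localization model (not Soundr's actual architecture).
import torch
import torch.nn as nn

class VoiceLocalizer(nn.Module):
    def __init__(self, n_mics=6):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv1d(n_mics, 32, kernel_size=64, stride=16), nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=16, stride=4), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),
        )
        self.head = nn.Linear(64, 3 + 2)   # (x, y, z) plus sin/cos of head yaw

    def forward(self, audio):              # audio: (batch, n_mics, n_samples)
        return self.head(self.encoder(audio))

model = VoiceLocalizer()
pred = model(torch.randn(4, 6, 16000))     # -> shape (4, 5)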

Speaker: Jackie Yang
Jackie Yang is a fourth-year PhD student in the Stanford HCI group, co-advised by James Landay and Monica Lam. Jackie’s work focuses on AR, VR, and multimodal interactions.

11/29: Thanksgiving Recess

12/6: Dialogues with Plato: Train your own Conversational Agents

Time:

Location: Gates 463A (4th floor, B wing)

Abstract:
In this talk I will introduce the Plato Research Dialogue System, a recently released flexible framework that can be used to create, train, and evaluate conversational AI agents. Plato is written in Python and has been designed around two principles: to be understandable by experts and newcomers to the field, and to support any conversational agent architecture (traditional pipeline, jointly optimised models, etc.). Using DSTC2 and MetaLWOZ as examples, I will briefly show how to train components of conversational agents in Plato, and, if time permits, I will show a running demo of the system and discuss some recent work on concurrently training two conversational agents.
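For context, the "traditional pipeline" architecture mentioned in the abstract chains natural language understanding, dialogue state tracking, a dialogue policy, and natural language generation. The sketch below shows that control flow with generic stand-in classes; it is not Plato's actual API.

# Generic pipeline conversational agent (illustrative, not Plato's API).
class PipelineAgent:
    def __init__(self, nlu, tracker, policy, nlg):
        self.nlu, self.tracker, self.policy, self.nlg = nlu, tracker, policy, nlg

    def respond(self, user_utterance):
        acts = self.nlu.parse(user_utterance)         # text -> dialogue acts
        state = self.tracker.update(acts)             # accumulate dialogue state
        system_acts = self.policy.next_action(state)  # decide what to do next
        return self.nlg.generate(system_acts)         # dialogue acts -> text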

Speaker: Alexandros Papangelis (Uber AI)
Alex is a Sr. Research Scientist at Uber AI, on the Conversational AI team; his interests include statistical dialogue management, natural language processing, and human-machine social interactions. Prior to Uber, he was with Toshiba Research Europe, leading the Cambridge Research Lab team on Statistical Spoken Dialogue. Before joining Toshiba, he was a post-doctoral fellow at CMU's Articulab, working with Justine Cassell on designing and developing the next generation of socially-skilled virtual agents. He received his PhD from the University of Texas at Arlington, MSc from University College London, and BSc from the University of Athens.