Research Vision

Computers will transform into effective, personalized, conversational assistants for everybody, including the pre-literate and the non-literate. Commercial chatbots today are notoriously brittle because they are hardcoded to handle only a few possible user inputs. Recently introduced large language models (LLMs), such as GPT-3, are remarkably fluent, but they are often erroneous and prone to hallucinations. Our goal is not just to create chit-chat bots, but assistants with a purpose. We focus on studying the science of conversational agents, and specifically how to:

  • tame LLMs into robust, trustworthy, and effective conversational agents;
  • ensure that LLMs conform to human values;
  • create socially intelligent agents that can, for example, provide companionship to the elderly and emotional support to improve mental health;
  • make the technology easy for non-AI experts to use and deploy.

Join Us

We invite students, researchers, and corporations to join us:
  • to advance the state of the art in conversational agent research;
  • to apply the technology to real-world use cases.

Current Research Projects

  • Grounding LLMs with Wikipedia. LLMs are known to hallucinate. To improve factuality, we proposed WikiChat, a few-shot chatbot that is as conversational as LLMs and far more factual than either LLMs or state-of-the-art retrieve-then-generate chatbots.

    We designed a 7-stage pipeline of subtasks, implemented by few-shot prompting current LLMs. WikiChat curates information both by retrieving and summarizing paragraphs from Wikipedia and by generating from an LLM followed by fact-checking. Finally, it combines the information into a draft and refines it.

    One experiment shows that WikiChat significantly outperforms the LLM it is based on in factuality, by 24.4% on average and by up to 32.7% on tail topics, while matching its conversationality. (paper)
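    The retrieve-summarize-generate-fact-check-draft-refine flow above can be sketched as follows. This is an illustrative sketch only: the stage names follow the description above, but the function signatures and data shapes are assumptions, not WikiChat's actual implementation.

    ```python
    # Hypothetical sketch of a WikiChat-style pipeline. Each stage is passed
    # in as a callable (in the real system, each is a few-shot LLM prompt).
    def wikichat_reply(history, user_turn,
                       retrieve, summarize, llm_generate, fact_check,
                       draft, refine):
        """Combine curated Wikipedia facts with fact-checked LLM output."""
        # Curate information from Wikipedia: retrieve passages, then
        # summarize the parts relevant to the user's turn.
        passages = retrieve(user_turn, history)
        curated = summarize(passages, user_turn)

        # Generate a candidate response from the LLM, then keep only the
        # claims that survive fact-checking against retrieved evidence.
        candidate = llm_generate(history, user_turn)
        verified = fact_check(candidate, retrieve)

        # Combine all verified information into a draft, then refine it
        # for relevance and conversationality.
        first_draft = draft(history, user_turn, curated + verified)
        return refine(first_draft)
    ```

    Passing the stages as callables keeps the sketch testable with stubs; the real pipeline wires each stage to an LLM prompt instead.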

  • Grounding LLMs with Wikidata. We improve the factuality of LLMs by grounding them with Wikidata, the world's largest live knowledge base. We fine-tuned a semantic parser, WikiSP, that translates natural language questions into Wikidata queries.

    We have created the WikiWebQuestions question-answering benchmark by adapting the popular WebQuestions benchmark from Freebase to Wikidata.

    WikiSP achieves 69% and 59% answer accuracy on the dev and test sets, respectively. On the dev set, GPT-3 can answer only 66% of the questions completely. We recommend using WikiSP to provide verifiable answers when possible, and GPT-3 to provide qualified answers otherwise; this combination provides useful answers to 97% of the questions. (paper)
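    To make the target representation concrete, here is the kind of Wikidata SPARQL query a semantic parser in this setting produces for a simple one-hop question. The sketch below is illustrative only: WikiSP is a fine-tuned neural parser, not a template system, and the function here just renders one (entity, property) pair.

    ```python
    # Illustrative only: renders a single-hop question as a Wikidata SPARQL
    # query. Wikidata's wd: prefix names entities (QIDs) and wdt: names
    # truthy property statements (PIDs).
    def one_hop_sparql(entity_qid: str, property_pid: str) -> str:
        """Render an (entity, property) pair as a Wikidata SPARQL query."""
        return (
            "SELECT ?answer WHERE { "
            f"wd:{entity_qid} wdt:{property_pid} ?answer . "
            "}"
        )

    # "What is the capital of France?"
    # France is entity Q142; 'capital' is property P36.
    query = one_hop_sparql("Q142", "P36")
    ```

    Because the query names explicit Wikidata entities and properties, its answers are verifiable against the knowledge base, which is what distinguishes this path from a direct GPT-3 answer.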

  • Multilingual agents with Tianjin University (China), CEA (France), IIIT (Hyderabad, India), Microsoft (India), Hanyang University (Korea).
    Our multi-national team aims to make task-oriented dialogue research applicable to lower-resource languages. To keep the cost low, our approach is to train semantic parsers mainly with automatically machine-translated data. To improve the quality, we manually edit a small percentage of the data for few-shot training, evaluation, and testing. We created a toolset to facilitate the error-prone manual editing process. It improves machine translation with a hybrid entity alignment technique that combines neural and dictionary-based methods, along with many automated and semi-automated validation checks.

    We created a new multilingual benchmark, X-RiSAWOZ, by translating the Chinese RiSAWOZ dataset into four languages (English, French, Hindi, and Korean) plus code-mixed English-Hindi. X-RiSAWOZ has more than 18,000 human-verified dialogue utterances per language and, unlike most prior multilingual work, is an end-to-end dataset for building fully functioning agents.

    Our results suggest that our translation and post-editing methodology and toolset can be used to create new high-quality multilingual dialogue agents cost-effectively. Our dataset, code, and toolkit are released open-source. (paper)
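    The dictionary-based half of the entity alignment step above can be sketched as follows. The data shapes and function name here are assumptions for illustration; the actual toolset also runs a neural aligner and combines both signals.

    ```python
    # Minimal sketch of dictionary-based entity alignment: given a machine-
    # translated utterance and the source utterance's annotated entities,
    # locate each entity's translation in the target text via a bilingual
    # entity dictionary.
    def align_entities(translated: str, entities: list,
                       bilingual_dict: dict) -> dict:
        """Map each source entity to its (start, end) span in `translated`.

        Returns None for an entity when the dictionary translation is
        missing or not found verbatim -- cases the real toolset hands off
        to the neural aligner or to a human editor.
        """
        spans = {}
        for entity in entities:
            target = bilingual_dict.get(entity)
            if target is None or target not in translated:
                spans[entity] = None  # needs neural alignment / review
                continue
            start = translated.index(target)
            spans[entity] = (start, start + len(target))
        return spans
    ```

    Keeping the dictionary pass cheap and exact, and escalating only the failures, is what lets most of the translated data be validated automatically.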

  • Social Skill Training Assistants for Individuals with Autism Spectrum Disorder with Stanford Medicine. We are creating an empathetic agent that helps individuals with autism spectrum disorder or social anxiety improve their social skills. This is a joint research project with Prof. Lynn Koegel.
  • Conversational Agents for Building Information Management (BIM) with Stanford Civil Engineering. Voice interfaces make the massive amount of information in 3D BIM software easily accessible to blue-collar workers on the job. This is a joint research project with Prof. Martin Fischer.

Recent talk

Controlling and Grounding Large Language Models for Conversational Assistants

Monica Lam
Generative AI and Foundation Models Workshop, Stanford Computer Forum, April 12, 2023


We have created the GenieScript system, which can be used to build experimental prototypes. Our experience shows that GenieScript makes it possible for small teams to create useful prototypes quickly.

Genie Assistant

We have developed the Genie Assistant, which supports the ten most popular skills. Genie can play songs, podcasts, radio, and news; help find restaurants; answer questions; forecast the weather; set timers and reminders; and control IoT devices. It can control 8 kinds of appliances (thermostats, switches, lights, fans, doors, locks, window covers, and vacuum cleaners) and 7 kinds of sensors (temperature, motion, illuminance, humidity, flood, ultra-light, and battery).

  1. Genie runs on the Baidu smart speaker, where it runs the full audio stack: acoustic echo cancellation, voice activity detection, wake-word detection (donated by Picovoice), ducking, and speech-to-text and text-to-speech (Microsoft Azure Cognitive Services).

    The First Workshop on the World Wide Voice Web, Nov 20, 2021.

  2. Genie also runs on a Raspberry Pi. It has been distributed as the voice interface to Home Assistant, an open-source home gateway that can connect to over 1000 different IoT devices.

    State of the Open Home Workshop, Home Assistant, Dec 11, 2021.

A Building Management Assistant

Genie is used to create a voice interface for the Autodesk Forge 3D Building Information Management software. This enables blue-collar workers to use voice to easily access digital information on the job.

AI Workshop, Stanford Computer Forum 2022, April 4-6, 2022

Awards and Press

Keynotes and Interviews


Open-Source Software & Datasets

  • Genie: A toolkit to synthesize, train, and build conversational virtual assistants cost-effectively.
  • ThingTalk: An extensible, executable representation language for task-oriented dialogues.
  • Genie NLP: A versatile library for any NLP task.
  • Dialogues: A library that provides a unified interface to several dialogue datasets.
  • Genie Cloud: A multi-user, Kubernetes-enabled Genie runtime, with embedded NLP.
  • Genie Server: A single-user version of Genie for home servers, suitable for running on low-power devices and smart speakers.
  • Genie Devices: A repository of skills created by Genie developers.
  • Chirpy Cardinal: 2nd Prize Winner in Alexa SocialBot Challenge 2020/2021.

Upcoming Events & News



See here for the paper abstracts.

Senior Members

PhD Students

Master & Undergraduate Students

PhD Alumni

Former Students and Collaborators

We thank them for their valuable contributions.


OVAL is supported by the National Science Foundation under Grant No. 1900638, the Alfred P. Sloan Foundation under Grant No. G-2020-13938, Microsoft, and the Verdant Foundation. We also want to thank our partners Alpha Vantage, Baidu, Picovoice, and Yelp for their support.