Computers will transform into effective, personalized, conversational assistants for everybody, including the pre-literate and the non-literate. Commercial chatbots today are notoriously brittle because they are hardcoded to handle only a few possible choices of user input. Recently introduced large language models (LLMs), such as GPT-3, are remarkably fluent, but they are often erroneous and prone to hallucination. Our goal is not just to create chit-chat bots, but assistants with a purpose. We focus on studying the science of conversational agents, specifically how to:
We designed a 7-stage pipeline of subtasks implemented by few-shot prompting current LLMs. Wikichat curates information both by retrieving and summarizing paragraphs from Wikipedia and by generating responses from an LLM and then fact-checking them. Finally, it combines the information into a draft and refines it.
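The information flow described above can be illustrated with a toy sketch. It does not reproduce Wikichat's seven stages or its prompts. Every function below (`retrieve`, `summarize`, `generate`, `fact_check`, `draft`, `refine`, `respond`) is a hypothetical stand-in: a real system would call a Wikipedia retriever or a few-shot-prompted LLM at each step, whereas here "fact-checking" is approximated by substring matching against the retrieved text.

```python
# A toy sketch of a retrieve + generate + fact-check pipeline.
# Every stage is a hypothetical stand-in for a retriever or an LLM prompt.


def retrieve(query: str, corpus: dict[str, str]) -> list[str]:
    # Stand-in for Wikipedia retrieval: keep passages whose title appears in the query.
    return [text for title, text in corpus.items() if title.lower() in query.lower()]


def summarize(passages: list[str]) -> list[str]:
    # Stand-in for LLM summarization: keep the first sentence of each passage.
    return [p.split(". ")[0].rstrip(".") + "." for p in passages]


def generate(query: str, llm_claims: dict[str, list[str]]) -> list[str]:
    # Stand-in for an LLM response that has already been split into claims.
    return llm_claims.get(query, [])


def fact_check(claims: list[str], evidence: list[str]) -> list[str]:
    # Keep only claims supported by the retrieved evidence; "supported" is
    # approximated here by a case-insensitive substring match.
    text = " ".join(evidence).lower()
    return [c for c in claims if c.lower().rstrip(".") in text]


def draft(facts: list[str]) -> str:
    # Combine the verified information, dropping exact duplicates in order.
    return " ".join(dict.fromkeys(facts))


def refine(text: str) -> str:
    # Stand-in for LLM refinement: here it only normalizes whitespace.
    return " ".join(text.split())


def respond(query: str, corpus: dict[str, str], llm_claims: dict[str, list[str]]) -> str:
    passages = retrieve(query, corpus)
    facts = summarize(passages) + fact_check(generate(query, llm_claims), passages)
    return refine(draft(facts))


if __name__ == "__main__":
    corpus = {
        "Mount Everest": "Mount Everest is Earth's highest mountain above sea level. "
                         "It lies in the Himalayas.",
    }
    # The second claim is deliberately false, to show it being filtered out.
    llm_claims = {
        "Tell me about Mount Everest": ["It lies in the Himalayas.", "It is 10,000 m tall."],
    }
    print(respond("Tell me about Mount Everest", corpus, llm_claims))
    # prints: Mount Everest is Earth's highest mountain above sea level. It lies in the Himalayas.
```

In this example the unsupported LLM claim is dropped, while the retrieved summary and the supported LLM claim both survive into the final response.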
One experiment shows that Wikichat significantly outperforms the LLM it is based on in terms of factuality, by 24.4% on average, and by up to 32.7% on tail topics, while matching its conversationality. (paper)
We have created the WikiWebQuestions question-answering benchmark by adapting the popular WebQuestions benchmark from Freebase to Wikidata.
WikiSP achieves 69% and 59% answer accuracy on the dev and test sets, respectively. On the dev set, GPT-3 can answer only 66% of the questions completely. We recommend using WikiSP to provide verifiable answers to questions when possible, and GPT-3 to provide qualified answers otherwise. This combination provides useful answers to 97% of the questions. (paper)
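The recommended combination amounts to a simple fallback rule. The sketch below assumes a semantic parser that maps a question to a knowledge-base query (or fails), a function that executes that query, and an LLM. `answer`, `Answer`, and the "Unverified:" prefix are hypothetical names chosen for illustration, not the actual interfaces of WikiSP or GPT-3.

```python
# Hypothetical sketch of "verifiable answer if possible, qualified answer otherwise".
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Answer:
    text: str
    verifiable: bool  # True when the answer comes from a knowledge-base query


def answer(
    question: str,
    semantic_parse: Callable[[str], Optional[str]],
    run_query: Callable[[str], Optional[str]],
    llm: Callable[[str], str],
) -> Answer:
    # 1. Try the semantic parser: map the question to a query (e.g. SPARQL).
    query = semantic_parse(question)
    if query is not None:
        result = run_query(query)
        if result:  # a non-empty result can be checked against the knowledge base
            return Answer(result, verifiable=True)
    # 2. Otherwise fall back to the LLM and flag the answer as unverified.
    return Answer("Unverified: " + llm(question), verifiable=False)


if __name__ == "__main__":
    # Toy stand-ins for a parser, a knowledge base, and an LLM (all hypothetical).
    kb = {"capital of France": "Paris"}

    def parse(q: str) -> Optional[str]:
        return q if q in kb else None

    def llm(q: str) -> str:
        return "probably blue"

    print(answer("capital of France", parse, kb.get, llm))
    print(answer("favorite color of Napoleon", parse, kb.get, llm))
```

Keeping the `verifiable` flag separate from the answer text lets a caller decide how to present fallback answers, for example with an explicit qualifier.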
We created a new multilingual benchmark, X-RiSAWOZ, by translating the Chinese RiSAWOZ dataset into four languages (English, French, Hindi, and Korean) and into code-mixed English-Hindi. X-RiSAWOZ has more than 18,000 human-verified dialogue utterances per language and, unlike most prior multilingual work, is an end-to-end dataset for building fully functioning agents.
Our results suggest that our translation and post-editing methodology and toolset can be used to create new high-quality multilingual dialogue agents cost-effectively. Our dataset, code, and toolkit are released as open source. (paper)
Generative AI and Foundation Models Workshop, Stanford Computer Forum, April 12, 2023
We have created GenieScript, a system for building experimental prototypes. Our experience shows that GenieScript has made it possible for small teams to create useful prototypes quickly.
We have developed the Genie Assistant, which can perform the ten most popular skills: it can play songs, podcasts, radio, and news; help find restaurants; answer questions; forecast the weather; set timers and reminders; and control IoT devices. It can control 8 kinds of appliances (thermostats, switches, lights, fans, doors, locks, window covers, and vacuum cleaners) and read 7 kinds of sensors (temperature, motion, illuminance, humidity, flood, ultra-light, and battery).
The First Workshop on the World Wide Voice Web, Nov 20, 2021.
State of the Open Home Workshop, Home Assistant, Dec 11, 2021.
Genie is used to create a voice interface for the Autodesk Forge 3D Building Information Management software. This enables blue-collar workers to access digital information on the job easily, by voice.
AI Workshop, Stanford Computer Forum 2022, April 4-6, 2022
See here for the paper abstracts.
We thank them for their valuable contribution.
OVAL is supported by the National Science Foundation under Grant No. 1900638, the Alfred P. Sloan Foundation under Grant No. G-2020-13938, Microsoft, and the Verdant Foundation. We also want to thank our partners Alpha Vantage, Baidu, Picovoice, and Yelp for their support.