LUInet (Linguistic User Interface network) is our open-source neural model for understanding virtual assistant commands. It is a large neural network that translates natural language into executable code.
Anyone can use LUInet to build new linguistic user interfaces in their own domains. By making LUInet open, we enable companies to build their own linguistic interfaces at low cost, without in-house machine-learning expertise. Research has shown that training on multiple domains at once can improve the accuracy of individual domains. By accumulating contributions from experts in different domains, we can create an open LUInet that can be stronger than any proprietary model developed by a single company.
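Example command: When the Bitcoin price reaches $20,000, search for a “Bitcoin” picture, and tweet it with caption “I am rich!”
[Figure: the example command annotated with elements of its program, including the condition price ≥ $20000 and the caption "I am rich!"]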
Advanced virtual assistants rely on semantic parsing to translate user input into executable actions. Semantic parsing is a machine learning task in which a model takes natural language as input and produces a formal representation of it in a formal language such as SQL, Prolog, or Python.
Our LUInet model targets ThingTalk as the formal representation language. ThingTalk has well-defined executable semantics, which makes it easy to reason about what each command means, whether the assistant can carry it out, and what its representation should be.
ThingTalk has one construct:
when ⇒ get ⇒ do
The construct indicates when the command should be executed, what data it should get, and what it should do. Each clause invokes a function in Thingpedia and then optionally computes on the result by applying a predicate or an aggregation operator. The full documentation of ThingTalk is available on the Almond website.
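To make the three-clause shape concrete, here is a minimal Python sketch of a data structure that mirrors it. This is not ThingTalk syntax and not the Almond implementation: the class names, the function names (bitcoin.price, images.search, twitter.post_picture), and the rendered text format are all hypothetical, chosen only to illustrate the when, get, do structure with an optional predicate on a clause.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Clause:
    """One clause: a function invocation plus an optional predicate on its result."""
    function: str                          # hypothetical function name
    params: dict = field(default_factory=dict)
    predicate: Optional[str] = None        # optional filter applied to the result

    def render(self) -> str:
        args = ", ".join(f"{key}={value!r}" for key, value in self.params.items())
        text = f"{self.function}({args})"
        if self.predicate:
            text += f" filter {self.predicate}"
        return text


@dataclass
class Command:
    """A command with the three clauses of the construct: when, get, do."""
    when: Clause
    get: Clause
    do: Clause

    def render(self) -> str:
        # Illustrative text form only; real ThingTalk syntax differs.
        return " => ".join(c.render() for c in (self.when, self.get, self.do))


# The example command from the top of the page, in this toy representation.
command = Command(
    when=Clause("bitcoin.price", predicate="price >= 20000"),
    get=Clause("images.search", {"query": "Bitcoin"}),
    do=Clause("twitter.post_picture", {"caption": "I am rich!"}),
)
print(command.render())
```

Keeping each clause as a separate object makes it straightforward to check, clause by clause, whether the assistant has a function that can serve it.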
Every command that Almond receives is translated into ThingTalk, using what is essentially a translation neural network, trained on the Thingpedia services. The resulting ThingTalk code is interpreted to execute the command.
Semantic parsing models such as LUInet are powerful, but also very data hungry. Previous work has proposed using crowdsourcing to acquire data quickly and cheaply, in what was called the Overnight methodology. The core idea is that generating programs in a formal language is easy, because a formal grammar and type system specify exactly which programs are valid. Furthermore, both natural and formal languages are compositional: from a limited set of primitives, one can write exponentially many commands.
The Overnight methodology proposes that, given any program, one mechanically generates a unique canonical representation in pseudo-English by replacing the program constructs with their English descriptions and rearranging them to fit English grammar. This canonical representation is verbose and clunky, so it is not suitable as training data on its own. It is, however, good enough to be understood by someone who reads English but does not know programming. That person, hired on a crowdsourcing platform like Mechanical Turk, can paraphrase the canonical representation into a natural sentence, which can then be used for training and validation.
However, unique canonical forms are not sufficient to generate good training data, and accuracy measured on any sort of generated data overestimates real-world accuracy. To overcome these limitations, we have developed Genie, a tool that can generate high-quality semantic parsing training sets.
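The effect of compositionality on the number of commands can be illustrated with a small Python sketch. The primitive lists below are made up for the example and are not taken from Thingpedia; the point is only that the number of commands is the product of the choices available in each clause, so it grows quickly as primitives are added.

```python
from itertools import product

# Hypothetical primitives for each clause (not real Thingpedia functions).
WHEN = ["the bitcoin price changes", "a new email arrives", "a new tweet appears"]
GET = ["search for a picture", "get the weather", "get the latest news"]
DO = ["tweet it", "email it to me"]


def enumerate_commands(whens, gets, dos):
    """Return every (when, get, do) combination of the given primitives."""
    return list(product(whens, gets, dos))


commands = enumerate_commands(WHEN, GET, DO)
print(len(commands))  # 3 * 3 * 2 = 18
```

With only eight primitives there are already 18 distinct commands, and adding filters or parameters to each clause multiplies the count again.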
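As a rough illustration of the mechanical step, the Python sketch below maps each program construct to an English description and fills a fixed sentence frame. The function names, descriptions, and frame are assumptions made for this example; they are not the descriptions used by Overnight or Thingpedia.

```python
# Hypothetical English descriptions for program constructs.
WHEN_DESCRIPTIONS = {"bitcoin.price_changes": "the bitcoin price changes"}
GET_DESCRIPTIONS = {"images.search": "search for a picture"}
DO_DESCRIPTIONS = {"twitter.post_picture": "post the picture on twitter"}


def canonical_form(when: str, get: str, do: str) -> str:
    """Build one fixed pseudo-English sentence for a (when, get, do) program.

    The mapping is deterministic: the same program always yields the same
    sentence, which is what makes the representation canonical.
    """
    return (
        f"when {WHEN_DESCRIPTIONS[when]}, "
        f"{GET_DESCRIPTIONS[get]}, "
        f"then {DO_DESCRIPTIONS[do]}"
    )


sentence = canonical_form("bitcoin.price_changes", "images.search", "twitter.post_picture")
print(sentence)
```

A crowd worker would then be shown a sentence like this one and asked to rewrite it the way a person would naturally say it.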
Genie proposes that developers data-program their natural language support. Like data programming in other contexts, the methodology begins with acquiring a high-quality validation set that is representative of real-world data. This validation set must be obtained in some way that does not bias whoever writes it, and must be manually annotated. Even better, it could be an existing source of real, unbiased data, like IFTTT is for Almond. Manual annotation is expensive, but the validation set is small (around 1500 sentences for Almond), so this is still feasible.
Then, instead of unique canonical forms, developers represent the training set using templates. These templates are associated with arbitrary semantic functions, and can decouple the composition operators in program space from the composition of natural language primitives. This allows developers to succinctly represent more ways to express the same commands; Genie then combines this representation with existing sources of data and crowdsourced paraphrases to generate a large, high-quality training set. Genie trains a model on this training set and evaluates it on the validation set. The developers can then iterate, adding templates or crowdsourcing more paraphrases, until a good validation accuracy is achieved.
The LUInet model processes a command in two stages. First, the sentence is combined with the current context and encoded with bidirectional recurrent and self-attention layers. Then the result is decoded by an auto-regressive decoder that uses recurrent and self-attention layers. The decoder makes use of a language model layer, which is pre-trained without supervision on a large, automatically generated set of programs. This exposes the model to programs outside of the training set. For details, please refer to our paper.
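The idea of templates paired with semantic functions can be sketched in a few lines of Python. This is not Genie's template language or API: the phrase lists, the program strings, and the names compose and generate are hypothetical, and the sketch only shows how several natural language patterns can share one semantic function, so that one program is expressed in several ways.

```python
from itertools import product

# Hypothetical primitives: (natural language phrase, program fragment).
WHEN_PHRASES = [("the bitcoin price changes", "bitcoin.price_changes()")]
DO_PHRASES = [("tweet it", "twitter.post()"), ("email it to me", "email.send()")]


def compose(when_code: str, do_code: str) -> str:
    """Semantic function: how the fragments combine in program space."""
    return f"{when_code} => {do_code}"


# Each template pairs a natural language pattern with a semantic function.
# The word order varies across patterns, but the program composition does not.
TEMPLATES = [
    ("when {when}, {do}", compose),
    ("{do} when {when}", compose),
    ("{do} every time {when}", compose),
]


def generate():
    """Expand every template over every primitive into (sentence, program) pairs."""
    data = []
    for (pattern, semantic), (when_nl, when_code), (do_nl, do_code) in product(
        TEMPLATES, WHEN_PHRASES, DO_PHRASES
    ):
        sentence = pattern.format(when=when_nl, do=do_nl)
        data.append((sentence, semantic(when_code, do_code)))
    return data


training_pairs = generate()
for sentence, program in training_pairs:
    print(f"{sentence!r} -> {program}")
```

Here three templates and three primitives yield six sentences for two distinct programs; in a real pipeline these synthesized pairs would be further mixed with paraphrases and existing data before training.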
LUInet is open source, and developed as part of the Almond platform. Both LUInet and Genie are available from our Releases page. LUInet can be used standalone, or can be used together with Thingpedia, by opening a Thingpedia Developer Account. We also welcome contributions to LUInet, whether bug fixes, new construct templates, or support for languages other than English. If you're interested in using LUInet for your own product, please check out our documentation and reach out to us on our Community Forum.