This year at the IEEE Automatic Speech Recognition and Understanding (ASRU) Workshop, Alexa researchers have two papers about training machine learning systems with minimal hand-annotated data. Both papers describe automated methods for producing training data, and both describe additional algorithms for extracting just the high-value examples from that data. Each paper, however, gravitates to a different half of the workshop’s title: one is on speech recognition, or converting an acoustic speech signal to text, and the other is on natural-language understanding, or determining a text’s meaning.

The natural-language-understanding (NLU) paper is about adding new functions to a voice agent like Alexa when training data is scarce. It involves “self-training”, in which a machine learning model trained on sparse annotated data itself labels a large body of unannotated data, which in turn is used to re-train the model. The researchers investigate techniques for winnowing down the unannotated data, to extract examples pertinent to the new function, and then winnowing it down even further, to remove redundancies.

The automatic-speech-recognition (ASR) paper is about machine-translating annotated data from a language that Alexa already supports to produce training data for a new language. There, too, the researchers report algorithms for identifying data subsets - both before and after translation - that will yield a more-accurate model.

Three of the coauthors on the NLU paper - applied scientists Eunah Cho and Varun Kumar and applied-scientist manager Bill Campbell - are also among the five Amazon organizers of the Life-Long Learning for Spoken-Language Systems workshop, which will take place on the first day of ASRU. The workshop focuses on the problem of continuously improving deployed conversational-AI systems.

Cho and her colleagues’ main-conference paper, “Efficient Semi-Supervised Learning for Natural Language Understanding by Optimizing Diversity”, addresses an instance of that problem: teaching Alexa to recognize new “intents”.

Alexa’s NLU models classify customer requests according to domain, or the particular service that should handle a request, and intent, or the action that the customer wants executed. They also identify the slot types of the entities named in the requests, or the roles those entities play in fulfilling the request. In the request “Play ‘Undecided’ by Ella Fitzgerald”, for instance, the domain is Music and the intent PlayMusic, and the names “Undecided” and “Ella Fitzgerald” fill the slots SongName and ArtistName.

Most intents have highly specific vocabularies (even when they’re large, as in the case of the PlayMusic intent), and ideally, the training data for a new intent would be weighted toward in-vocabulary utterances. But when Alexa researchers are bootstrapping a new intent, intent-specific data is scarce. So they need to use training data extracted from more-general text corpora.

As a first pass at extracting intent-relevant data from a general corpus, Cho and her colleagues use a simple n-gram-based linear logistic regression classifier, trained on whatever annotated, intent-specific data is available. The classifier breaks every input utterance into overlapping one-word, two-word, and three-word chunks - n-grams - and assigns each chunk a score indicating its relevance to the new intent. The relevance score for an utterance is an aggregation of the chunks’ scores, and the researchers keep only the most relevant examples.

In an initial experiment, the researchers used sparse intent-specific data to train five different machine learning models to recognize five different intents. Then they fed the unlabeled examples extracted by the regression classifier to each intent recognizer. The recognizers labeled the examples, which were then used to re-train the recognizers. On average, this reduced the recognizers’ error rates by 15%.

To make this process more efficient, Cho and her colleagues trained a neural network to identify paraphrases, which are defined as pairs of utterances that have the same domain, intent, and slot labels. So “I want to listen to Adele” is a paraphrase of “Play Adele”, but “Play Seal” is not.

The figure above depicts embeddings of NLU training data, or geometrical representations of the data such that utterances with similar meanings are grouped together. The brown points represent annotated data specific to a new intent; the blue points represent intent-relevant data extracted from a more general data set.
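The chunk-and-score step of the relevance classifier can be sketched roughly as follows. This is a minimal illustration, not the authors’ implementation: the weight values are invented, the sigmoid per-chunk score mimics a logistic regression model’s output, and the mean aggregation is an assumption (the post says only that the utterance score is “an aggregation of the chunks’ scores”).

```python
import math


def ngrams(tokens, n_max=3):
    """Overlapping one-, two-, and three-word chunks of an utterance."""
    return [" ".join(tokens[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(tokens) - n + 1)]


def chunk_score(chunk, weights):
    """Logistic-regression-style relevance score for a single chunk."""
    return 1.0 / (1.0 + math.exp(-weights.get(chunk, 0.0)))


def utterance_relevance(utterance, weights):
    """Aggregate the chunk scores into one utterance-level score.

    Mean aggregation is an assumption; the post does not name the
    aggregation function.
    """
    chunks = ngrams(utterance.lower().split())
    return sum(chunk_score(c, weights) for c in chunks) / len(chunks)


# Invented weights for a hypothetical PlayMusic-like intent.
weights = {"play": 2.0, "listen to": 1.5, "want to listen": 1.0}

# Rank a tiny general corpus and keep the most relevant examples first.
corpus = ["play adele", "set a timer", "i want to listen to seal"]
ranked = sorted(corpus, key=lambda u: utterance_relevance(u, weights),
                reverse=True)
```

Here the off-topic utterance (“set a timer”) contains no weighted chunks, so every chunk scores 0.5 and it sinks to the bottom of the ranking; a real system would then keep only the top-ranked slice of the corpus.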
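The self-training round described for the NLU paper - a model trained on sparse labeled data labels extracted unlabeled examples, then is re-trained on both - can be sketched like this. The toy token-overlap “recognizer” and all of the example data are stand-ins; the post does not describe the recognizers’ architecture.

```python
from collections import defaultdict


def tokens(utterance):
    return set(utterance.lower().split())


def train(labeled):
    """Toy intent recognizer: remember the vocabulary seen per intent.

    Stand-in for the real recognizers, which the post doesn't specify.
    """
    model = defaultdict(set)
    for utt, intent in labeled:
        model[intent] |= tokens(utt)
    return model


def predict(model, utterance):
    """Predict the intent whose vocabulary overlaps the utterance most."""
    return max(model, key=lambda i: len(model[i] & tokens(utterance)))


# Sparse annotated seed data for two hypothetical intents.
seed = [("play adele", "PlayMusic"), ("set a timer", "SetTimer")]

# Unlabeled examples, e.g. pre-filtered by the relevance classifier.
unlabeled = ["play some seal", "set a ten minute timer"]

# Self-training: the seed model labels the unlabeled pool...
model = train(seed)
pseudo_labeled = [(u, predict(model, u)) for u in unlabeled]

# ...and the recognizer is re-trained on seed + pseudo-labeled data.
model = train(seed + pseudo_labeled)
```

After re-training, the model’s PlayMusic vocabulary has grown to cover words like “seal” that appeared only in the pseudo-labeled examples, which is the mechanism by which self-training can reduce error on a data-poor intent.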