Amazon wins best-paper award at computational-linguistics conference

Natural Language Processing
{"value":"Earlier this month at the International Conference on Computational Linguistics (++[Coling](https://www.amazon.science/conferences-and-events/coling-2020)++), we were honored to receive the best-paper award in the industry track for our ++[paper](https://www.amazon.science/publications/leveraging-user-paraphrasing-behavior-in-dialog-systems-to-automatically-collect-annotations-for-long-tail-utterances)++ “Leveraging user paraphrasing behavior in dialog systems to automatically collect annotations for long-tail utterances.”\n\nIn the paper, we investigate how to automatically create training data for Alexa’s natural-language-understanding systems by identifying cases in which customers issue a request, then rephrase it when the initial response is unsatisfactory.\n\n![image.png](https://dev-media.amazoncloud.cn/4800147d41694722bed4c1a5b20a39aa_image.png)\n\nThe researchers' algorithm finds the mapping from a successful request to an unsuccessful paraphrase that minimizes the differences between individual words, then transfers the labels from the successful request to the unsuccessful one.\n\nCREDIT: GLYNIS CONDON\n\nInitial experiments with our system suggest that it could be particularly useful in helping Alexa learn to handle the long tail of unusually phrased requests, which are collectively numerous but individually rare.\n\nFirst, some terminology. Alexa’s natural-language-understanding (NLU) systems characterize most customer requests according to intent, slot, and slot value. In the request “Play ‘Blinding Lights’ by the Weeknd”, for instance, the intent is PlayMusic, and the slot values “Blinding Lights” and “the Weeknd” fill the slots SongName and ArtistName.\n\nWe refer to different expressions of the same intent, using the same slots and slot values, as paraphrases: “I want to hear ‘Blinding Lights’ by the Weeknd” and “Play the Weeknd song ‘Blinding Lights’” are paraphrases. The general form of a request — “Play <*SongName*> by <*ArtistName*>” — is called a carrier phrase.\\n\\nThe situation we investigate is one in which a customer issues a request then rephrases it — perhaps interrupting the initial response — in a way that leads to successful completion of an interaction. Our goal is to automatically map the intent and slot labels from the successful request onto the unsuccessful one and to use the annotated example that results as additional training data for Alexa’s NLU system.\\n\\nThis approach is similar to the one used by Alexa’s ++[self-learning model](https://www.amazon.science/blog/how-we-taught-alexa-to-correct-her-own-defects)++, which already corrects millions of errors — in either Alexa’s NLU outputs or customers’ requests — each month. But self-learning works online and leaves the underlying NLU models unchanged; our approach works offline and modifies the models. As such, the two approaches are complementary.\\n\\nOur system has three different modules. The first two are machine learning models, the third a hand-crafted algorithm:\\n\\n1. the paraphrase detector determines whether an initial request and a second request that follows it within some short, fixed time span are in fact paraphrases of each other;\\n2. the friction detector determines whether the paraphrases resulted in successful or unsuccessful interactions with Alexa; and\\n3. 
We refer to different expressions of the same intent, using the same slots and slot values, as paraphrases: “I want to hear ‘Blinding Lights’ by the Weeknd” and “Play the Weeknd song ‘Blinding Lights’” are paraphrases. The general form of a request — “Play <*SongName*> by <*ArtistName*>” — is called a carrier phrase.

The situation we investigate is one in which a customer issues a request, then rephrases it — perhaps interrupting the initial response — in a way that leads to successful completion of an interaction. Our goal is to automatically map the intent and slot labels from the successful request onto the unsuccessful one and to use the resulting annotated example as additional training data for Alexa’s NLU system.

This approach is similar to the one used by Alexa’s ++[self-learning model](https://www.amazon.science/blog/how-we-taught-alexa-to-correct-her-own-defects)++, which already corrects millions of errors — in either Alexa’s NLU outputs or customers’ requests — each month. But self-learning works online and leaves the underlying NLU models unchanged; our approach works offline and modifies the models. As such, the two approaches are complementary.

Our system has three modules. The first two are machine learning models; the third is a hand-crafted algorithm:

1. the paraphrase detector determines whether an initial request and a second request that follows it within some short, fixed time span are in fact paraphrases of each other;
2. the friction detector determines whether the paraphrases resulted in successful or unsuccessful interactions with Alexa; and
3. the label projection algorithm maps the slot values from the successful request onto the words of the unsuccessful one.

#### **Paraphrase detector**

![image.png](https://dev-media.amazoncloud.cn/616911820e264fbabf818e2adabc98ce_image.png)

Machine learning models identify pairs of utterances that are paraphrases of each other and individual utterances that encounter friction when they're handled.

CREDIT: GLYNIS CONDON

Usually, paraphrase detection models are trained on existing data sets of labeled sentence pairs. But those data sets are not well suited to goal-directed interactions with voice agents. So to train our paraphrase detector, we synthesize a data set containing both positive and negative examples of paraphrased utterances.

To produce each positive example, we inject the same slots and slot values into two randomly selected carrier phrases that express the same intent.

To produce negative examples, we take the intent and slots associated with a given request and vary at least one of them. For instance, the request “play ‘Blinding Lights’ by the Weeknd” has the intent and slots {PlayMusic, SongName, ArtistName}. We might change the slot ArtistName to RoomName, producing the new request “play ‘Blinding Lights’ in the kitchen”. We then pair the altered sentence with the original.

To maximize the utility of the negative-example pairs, we want to make them as difficult to distinguish as possible, so the variations are slight. We also restrict ourselves to substitutions — such as RoomName for ArtistName — that show up in real data.
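As a rough Python sketch of the synthesis just described: the carrier-phrase inventory, the table of allowed slot substitutions, and the simplified choice of carrier phrase for the altered slot set are all assumptions for illustration, not the resources used in the paper.

```python
import random

# Hypothetical carrier phrases per intent; a real inventory would be far larger.
CARRIER_PHRASES = {
    "PlayMusic": [
        "play {SongName} by {ArtistName}",
        "i want to hear {SongName} by {ArtistName}",
        "play the {ArtistName} song {SongName}",
    ],
}

# Slot substitutions observed in real data (assumed here), with sample values.
ALLOWED_SWAPS = {"ArtistName": [("RoomName", "the kitchen")]}

def positive_pair(intent, slot_values):
    """Inject the same slots and values into two distinct carrier phrases."""
    a, b = random.sample(CARRIER_PHRASES[intent], 2)
    return a.format(**slot_values), b.format(**slot_values)

def negative_pair(intent, slot_values):
    """Swap one slot for a related one, yielding a hard-to-spot non-paraphrase."""
    slot = random.choice([s for s in slot_values if s in ALLOWED_SWAPS])
    new_slot, new_value = random.choice(ALLOWED_SWAPS[slot])
    altered = {k: v for k, v in slot_values.items() if k != slot}
    altered[new_slot] = new_value
    original = random.choice(CARRIER_PHRASES[intent]).format(**slot_values)
    # For the sketch, hard-code a carrier phrase matching the altered slot set.
    return original, "play {SongName} in {RoomName}".format(**altered)

values = {"SongName": "Blinding Lights", "ArtistName": "the Weeknd"}
print(positive_pair("PlayMusic", values))
print(negative_pair("PlayMusic", values))
```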
#### **Friction detector**

As inputs to the friction detection module, we use four types of features:

1. the words of the utterance;
2. the original intent and slot classifications from the NLU model;
3. the confidence scores from the NLU models and the automatic speech recognizer; and
4. the status code returned by the system that handles a given intent — for instance, a report from the music player on whether it found a song title that fit the predicted value for the slot SongName.

Initially, we also factored in information about the customer’s response to Alexa’s handling of a request — whether, for instance, the customer barged in to stop playback of a song. But we found that that information did not improve performance, and since it complicated the model, we dropped it.

The output of the friction model is a binary score indicating a successful or unsuccessful interaction. Only paraphrase pairs that occur within a short span of time, the first of which is unsuccessful and the second of which is successful, pass to our third module, the label projection algorithm.
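To make the hand-off between the modules explicit, here is a minimal sketch of that gating logic. The log record, the black-box detector callables, and the 30-second window are all assumptions; the paper's exact time span and interfaces are not specified here.

```python
from dataclasses import dataclass

MAX_GAP_SECONDS = 30.0  # placeholder for the "short, fixed time span"

@dataclass
class LoggedUtterance:
    text: str
    timestamp: float  # seconds since the start of the log

def candidate_pairs(log, is_paraphrase, has_friction):
    """Yield (unsuccessful, successful) rephrase pairs for label projection.

    `is_paraphrase(a, b)` and `has_friction(u)` stand in for the trained
    paraphrase and friction detectors described above.
    """
    for first, second in zip(log, log[1:]):
        if second.timestamp - first.timestamp > MAX_GAP_SECONDS:
            continue  # too far apart to count as a rephrase
        if not is_paraphrase(first.text, second.text):
            continue  # the second request is not a rewording of the first
        if has_friction(first) and not has_friction(second):
            yield first, second  # failed attempt followed by a success
```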
#### **Label projection**

The label projection algorithm takes the successful paraphrase and maps each of its words onto either the unsuccessful paraphrase or the null set.

The goal is to find a mapping that minimizes the distance between the paraphrases. To measure that distance, we use the Levenshtein edit distance, which counts the number of insertions, deletions, and substitutions required to turn one word into another.

The only way to guarantee a minimum distance would be to consider the result of every possible mapping. But experimentally, we found that a greedy algorithm, which selects the minimum-distance mapping for the first word, then for the second, and so on, results in very little performance drop-off while saving a good deal of processing time.
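Here is one possible Python sketch of such a greedy projection, using a standard character-level Levenshtein distance between words. The tokenization, tie-breaking, and the distance threshold for mapping a word to the null set are simplifications, not the paper's exact algorithm.

```python
def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def project_labels(successful, labels, unsuccessful, max_dist=2):
    """Greedily map each word of the successful utterance onto the closest
    unused word of the unsuccessful one; words with no close match map to
    the null set, and slot labels travel with the mapped words."""
    tgt = unsuccessful.split()
    used, projected = set(), []
    for word, label in zip(successful.split(), labels):
        candidates = [(levenshtein(word, t), k)
                      for k, t in enumerate(tgt) if k not in used]
        if not candidates:
            continue
        dist, k = min(candidates)
        if dist > max_dist:
            continue  # no close match: this word maps to the null set
        used.add(k)
        if label is not None:
            projected.append((tgt[k], label))
    return projected

# "the weeknd" fills ArtistName, so both of its words carry that label.
labels = [None, "SongName", "SongName", None, "ArtistName", "ArtistName"]
print(project_labels("play blinding lights by the weeknd", labels,
                     "play blinding lightz by weekend"))
# -> [('blinding', 'SongName'), ('lightz', 'SongName'), ('weekend', 'ArtistName')]
```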
After label projection, we have a new training example, which consists of the unsuccessful utterance annotated with the slots of the successful utterance. But we don’t add that example to our new set of training data unless it is corroborated some minimum number of times by other pairs of unsuccessful and successful utterances.

We tested our approach on three languages, German, Italian, and Hindi, with the best results in German. At the time of our experiments, the Hindi NLU models had been deployed for only six months, the Italian models for one year, and the German models for three years. We believe that, as the Hindi and Italian models mature, the data they generate will become less noisy, and they’ll profit more from our approach, too.

ABOUT THE AUTHOR

#### **[Tobias Falke](https://www.amazon.science/author/tobias-falke)**

Tobias Falke is an applied scientist in Alexa AI's Natural Understanding group.