Automatically assessing conversations with Alexa

More and more, interactions with Alexa involve multiturn [dialogues](https://www.amazon.science/tag/dialogue-management), which Alexa uses to fill out the details of a request or coordinate [multiple skills](https://www.amazon.science/blog/amazon-unveils-novel-alexa-dialog-modeling-for-natural-cross-skill-conversations).

Dialogue models, like all deployed AI models, require regular evaluation to ensure that they’re meeting customers’ needs. But evaluating a conversational interaction is a challenge; historically, it has required human judgment, which makes evaluation slow and costly.

Last week, at the Conference on Empirical Methods in Natural Language Processing ([EMNLP](https://www.amazon.science/conferences-and-events/emnlp-2020)), we [presented](https://www.amazon.science/publications/joint-turn-and-dialogue-level-user-satisfaction-estimation-on-multi-domain-conversations) a new neural-network-based model that attempts to estimate how customers would rate their satisfaction with dialogue interactions.

![image.png](https://dev-media.amazoncloud.cn/e87168063aef493d8d76f6a5a7dc758e_image.png)

A new model for estimating customer satisfaction with dialogue interactions uses a bi-LSTM, which analyzes sequences of interactions both forward and backward, and an attention layer, which determines which dialogue turns contribute most to overall satisfaction.

CREDIT: FROM "JOINT TURN AND DIALOGUE LEVEL USER SATISFACTION ESTIMATION ON MULTI-DOMAIN CONVERSATIONS"

In tests involving three different groups of users across 28 domains (such as music, weather, and movie and restaurant booking), our model estimated customer satisfaction 27% more accurately than a prior neural-network-based model.

The new model was also 7% more accurate than an [earlier model](https://www.amazon.science/publications/domain-independent-turn-level-dialogue-quality-evaluation-via-user-satisfaction-estimation) from our group. The earlier model took advantage of features specific to Alexa’s previous dialogue manager. The new model does not, which means that it should generalize to new dialogue managers (such as [Alexa Conversations](https://www.amazon.science/blog/science-innovations-power-alexa-conversations-dialogue-management)) or alternative approaches to dialogue management.

The intuitive way to train a dialogue assessment model is with sample dialogues labeled according to how satisfying they are. This has proved challenging, however: people frequently disagree in their overall assessments of the same interaction, and customer evaluations are noisy.

Instead, researchers typically use training data in which each dialogue turn is rated individually; there tends to be more agreement on turn-by-turn assessments. This is the approach we took in our previous work.

In our new work, however, we train a model jointly on turn-by-turn data and overall user assessments. We use an attention mechanism to weight the contributions of the turn-by-turn scores to the final score. Those weights are learned from the data and can generalize across multiple skills and tasks.


#### **A more-general model**


In our previous work, which we presented last year in two papers ([paper 1](https://www.amazon.science/publications/domain-independent-turn-level-dialogue-quality-evaluation-via-user-satisfaction-estimation) | [paper 2](https://www.amazon.science/publications/multi-domain-conversation-quality-evaluation-via-user-satisfaction-estimation)), we identified 48 distinct features of the input data that a dialogue model should use to predict customer satisfaction. Some of those features were general, such as the speech recognizer’s confidence in its transcription of the input utterance. Other features, however, referred to specific dialogue acts (such as affirmation, negation, interrogation, or termination) tracked by an earlier version of Alexa’s dialogue manager.

In the new work, we keep only 12 of the most general features from the original set of 48, and we add five new ones, based on the Universal Sentence Encoder (USE). USE is a model for embedding input texts, that is, representing them as points in a multidimensional space such that points representing related texts cluster together. Our new input features include the USE embeddings of customer and system utterances and measures of the similarities between them.

This feature set is much more general than the one we used in our earlier work, so it applies to a range of dialogue managers and domains. Yet a model trained using that feature set outperformed our earlier model, even when the test data included the specific dialogue acts on which the earlier model was trained.

In our paper, we first consider a model that predicts turn-by-turn ratings using a long short-term memory (LSTM) network. LSTMs process sequential inputs in order, so that the output corresponding to each input factors in both the inputs and outputs that preceded it.

Then we present an iteration of the model that replaces the LSTM with a bidirectional LSTM (bi-LSTM), an LSTM that processes the same data both forward and backward. The bi-LSTM jointly predicts the turn-by-turn ratings and the overall dialogue rating.

The outputs of the bi-LSTM pass through the attention layer, which accords some dialogue turns greater weight than others, before passing to the final layers of the network, which perform the classification. The loss function minimized during training is a weighted combination of the losses on the turn-level ratings and on the overall dialogue rating.

In ongoing work, we plan to expand the model to factor in the preferences of individual users.

ABOUT THE AUTHORS


#### **[Aditya Tiwari](https://www.amazon.science/author/aditya-tiwari)**


Aditya Tiwari is a software development engineer in the Alexa AI organization.


#### **[Josep Valls-Vargas](https://www.amazon.science/author/josep-valls-vargas)**


Josep Valls-Vargas is an applied scientist in the Alexa AI organization.
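
As a rough illustration of the attention layer and the weighted loss described in the article, the sketch below pools per-turn vectors with softmax attention and combines a turn-level and a dialogue-level loss. It is not the authors' implementation. The dot-product scoring against a `query` vector, the use of cross-entropy, the mixing weight `alpha`, and all function names are assumptions made for illustration; the article says only that turns are weighted by attention and that the loss is a weighted combination.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(turn_states, query):
    """Weight each turn's vector by softmax(dot(state, query)) and sum them.

    turn_states: one vector per dialogue turn (stand-ins for bi-LSTM outputs).
    query: a vector of the same size (assumed here; learned in a real model).
    """
    scores = [sum(h * q for h, q in zip(state, query)) for state in turn_states]
    weights = softmax(scores)
    dim = len(turn_states[0])
    pooled = [
        sum(w * state[i] for w, state in zip(weights, turn_states))
        for i in range(dim)
    ]
    return weights, pooled


def cross_entropy(probs, label):
    """Negative log-probability assigned to the correct class."""
    return -math.log(probs[label])


def joint_loss(turn_probs, turn_labels, dialogue_probs, dialogue_label, alpha=0.5):
    """Weighted combination of the mean turn-level loss and the dialogue-level loss.

    alpha is a hypothetical mixing weight: 1.0 uses only turn-level labels,
    0.0 uses only the overall dialogue label.
    """
    turn_loss = sum(
        cross_entropy(p, y) for p, y in zip(turn_probs, turn_labels)
    ) / len(turn_labels)
    dialogue_loss = cross_entropy(dialogue_probs, dialogue_label)
    return alpha * turn_loss + (1.0 - alpha) * dialogue_loss


if __name__ == "__main__":
    # Three turns with two-dimensional states; the third turn scores highest.
    states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    weights, pooled = attention_pool(states, query=[1.0, 1.0])
    print("attention weights:", weights)
    print("pooled dialogue vector:", pooled)

    # Predicted class probabilities per turn and for the whole dialogue.
    turn_probs = [[0.5, 0.5], [0.25, 0.75]]
    print("joint loss:", joint_loss(turn_probs, [0, 1], [0.5, 0.5], 1))
```

In a trained model the turn vectors would come from the bi-LSTM and the attention parameters would be learned along with the rest of the network; here they are fixed by hand so that the weighting is easy to inspect.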