Amazon launches online challenge to spur research on generalizing dialogue agents

{"value":"To help promote research on generalizing task-oriented dialogue agents to new contexts, Amazon Alexa has launched a dialogue-based AI challenge on the site [EvalAI](https://evalai.cloudcv.org/web/challenges/challenge-page/708/overview). As benchmarks for the challenge, we have also [released](https://github.com/alexa/dialoglue) a set of models that achieve state-of-the-art performance on five of the seven challenge tasks.\n\nThe challenge — which we call DialoGLUE, for Dialogue Language Understanding Evaluation — is intended to encourage research on representation-based transfer, domain adaptation, and sample-efficient task learning. Advances in these techniques should enable conversational generalizability, or the ability of dialogue agents trained on one task to adapt easily to new tasks.\n\nFor instance, if we had an agent that was trained to handle restaurant bookings, we would like to be able adapt it to handling hotel reservations with minimal retraining. Today, however, the work required to extend the functionality of a dialogue agent often scales linearly with the number of added domains.\n\nWe believe that, at least in part, this is because of a lack of standardization in the datasets and evaluation methodologies used by the dialogue research community. To support DialoGLUE, we have released a data set that aggregates seven publicly accessible dialogue data sets but standardizes their data representations, so they can be used in combination to train and evaluate a single dialogue model.\n\n![image.png](https://dev-media.amazoncloud.cn/9106a929f3c44fd18c304e57e8677a39_image.png)\n\nTwo of the responsibilities of a dialogue agent are tracking slots (such as area in this example, which takes the slot-value \"east\") and state tracking, or determining how the user’s intents change over the course of a conversation.\n\nThe annotations in the data set span four natural-language-understanding tasks. The first is intent prediction, or determining what service the user wants a voice agent to provide. The second is slot filling, or determining what entities the user has mentioned and their types. For instance, when given the instruction, “Play ‘Popstar’ by DJ Khaled”, a voice agent should recognize “Popstar” as the value of the slot Song_Name and “DJ Khaled” as the value of the slot Artist_Name.\n\nThe third task is semantic parsing, or determining the hierarchy of intents and slot values encoded in a single utterance. For instance, in the instruction “Get me tickets to Hamilton”, the intent to find the theater at which Hamilton is playing would be hierarchically nested inside the intent to buy tickets.\n\nFinally, the fourth task is dialogue state tracking, or determining how the user’s intents, and the slots and slot-values required to fulfill those intents, change over the course of a conversation.\n\nThe DialoGLUE challenge on EvalAI proposes two evaluation settings, and participants may submit entries for either or both. In the first setting — the full-data setting — the challenge is to use the complete data set to train a dialogue model that can complete the seven tasks associated with the seven source data sets. 
The third task is semantic parsing, or determining the hierarchy of intents and slot values encoded in a single utterance. For instance, in the instruction "Get me tickets to Hamilton", the intent to find the theater at which Hamilton is playing would be hierarchically nested inside the intent to buy tickets.

Finally, the fourth task is dialogue state tracking, or determining how the user's intents, and the slots and slot-values required to fulfill those intents, change over the course of a conversation.
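As a rough illustration of this last task, the sketch below shows a belief state being updated over two turns of a restaurant-booking exchange; the domain and slot names are illustrative, loosely in the style of the MultiWOZ corpus rather than an exact DialoGLUE representation.

```python
# A hypothetical two-turn exchange with belief-state annotations.
# Domain and slot names are illustrative (loosely MultiWOZ-style).
dialogue = [
    {
        "user": "I'm looking for a cheap restaurant in the east of town.",
        "belief_state": {
            "restaurant-area": "east",
            "restaurant-pricerange": "cheap",
        },
    },
    {
        "user": "Actually, make it the city centre, and book a table for two.",
        # The tracker must revise slots whose values change ("area")
        # and add newly mentioned ones ("book people").
        "belief_state": {
            "restaurant-area": "centre",
            "restaurant-pricerange": "cheap",
            "restaurant-book-people": "2",
        },
    },
]
```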
The DialoGLUE challenge on EvalAI proposes two evaluation settings, and participants may submit entries for either or both. In the first setting — the full-data setting — the challenge is to use the complete data set to train a dialogue model that can complete the seven tasks associated with the seven source data sets. In the second setting — the few-shot setting — the challenge is to train the dialogue model on only a fraction of the available data, approximately 10%.

DialoGLUE is a rolling challenge, so participants may submit their models at any time, and the leaderboard will be continuously updated.

On five of the seven tasks, our baseline models deliver state-of-the-art results, which both demonstrates the value of our aggregate data set and provides a bar for participants in the challenge to clear. Like the models, the baseline system is [publicly available](https://github.com/alexa/dialoglue).

ABOUT THE AUTHORS

#### **[Shikib Mehri](https://www.amazon.science/author/shikib-mehri)**

Shikib Mehri is a PhD student at Carnegie Mellon University's Language Technologies Institute. He was an intern at Amazon when he worked on this project.

#### **[Mihail Eric](https://www.amazon.science/author/mihail-eric)**

Mihail Eric is an applied scientist with Amazon Alexa AI.

#### **[Dilek Hakkani-Tür](https://www.amazon.science/author/dilek-hakkani-tur)**

Dilek Hakkani-Tür is a senior principal scientist in the Amazon Alexa AI group.