Establishing a new standard in answer selection precision

Practical question-answering systems often use a technique called answer selection. Given a question (say, “When was Serena Williams born?”), they perform an ordinary, keyword-based document search, then select one sentence from the retrieved documents to serve as an answer.

Today, most answer selection systems are neural networks trained on questions and sets of candidate answers: given a question, they must learn to choose the right answer from among the candidates. During operation, they consider each candidate sentence independently and estimate its chance of being the correct answer.

But this approach has limitations. Imagine an article that begins “Serena Williams is an American tennis player. She was born on September 26, 1981.” If the system has learned to consider candidate answers independently, it will have no choice but to assign “September 26, 1981” a low probability, as it has no way of knowing who “she” is. Similarly, a document might mention Serena Williams by name only in its title. In this case, accurate answer selection requires a more global sense of context.

![4.gif](https://dev-media.amazoncloud.cn/3c348d4fb9ec4b8ba4b594b32805dc56_4.gif)

To determine whether a given sentence from a retrieved document provides a good answer to a question, a new Amazon system looks at the sentence's context, including the sentences before and after it.

CREDIT: GLYNIS CONDON

In a pair of papers we are presenting this spring, my colleagues and I investigate how to add context to answer selection systems without incurring overwhelming computational costs.

We are presenting the first paper at the European Conference on Information Retrieval ([ECIR](https://www.amazon.science/conferences-and-events/ecir-2021)) at the end of this month. Ivano Lauriola, an applied scientist in the Alexa AI organization, and I will [describe a technique](https://www.amazon.science/publications/answer-sentence-selection-using-local-and-global-context-in-transformer-models) for using both local and global context to significantly improve the precision of answer selection.

Three weeks later, at the Conference of the European Chapter of the Association for Computational Linguistics (EACL), Rujun Han, a graduate student at the University of Southern California who joined our team as an intern in summer 2020; Luca Soldaini, an applied scientist in the Alexa AI organization; and I will present [a more effective technique](https://www.amazon.science/publications/modeling-context-in-answer-sentence-selection-systems-on-a-latency-budget) for adding global context, which involves vector representations of a few selected sentences.

By combining this global-context approach with the local-context approach of the earlier paper, we demonstrate precision improvements of 6% and 11% over the state-of-the-art answer selection system on two benchmark datasets.


#### **Local context**


In both papers, all of our models build upon a model [we presented](https://www.amazon.science/blog/on-benchmark-data-set-question-answering-system-halves-error-rate) at AAAI 2020, which remains the state of the art for answer selection. That model adapts a pretrained, Transformer-based language model, such as BERT, to the task of answer selection. Its inputs are concatenated question-answer pairs.

In our ECIR paper, to add local context to the basic model, we expand the input to include the sentences in the source text that precede and follow the answer candidate. Each word of the input undergoes three embeddings, or encodings as fixed-length vectors.

The first is a standard word embedding, which encodes semantic content as location in the embedding space. The second is a positional embedding, which encodes the word’s location in its source sentence.

The third is a sentence embedding, which indicates which of the input sentences the word comes from. This enables the model to learn relationships between the words of the candidate answer and those of the sentences before and after it.

We also investigated a technique for capturing global context, which used a 50,000-dimension vector to record counts for each word of a 50,000-word lexicon that occurred in the source text. We then used a technique called random projection to reduce that vector to 768 dimensions, the same size as the local-context vector.

In tests, we compared our system to two baselines: the state-of-the-art Transformer-based system, which doesn’t factor in context, and an ensemble system that uses a separate encoder for each answer candidate and each of the sentences flanking it. The ensemble baseline allowed us to measure how much of our model’s success depended on the inference of relationships between adjacent sentences, as opposed to simple exploitation of the additional information they contain.

On three different datasets and two different measures of precision, our model outperformed the baselines across the board. Indeed, the ensemble system fared much worse than the other two, probably because it was confused by the additional information in the contextual sentences.


#### **Global context**


In our EACL paper, we consider two other methods for adding global context to our model. Both methods search through the source text for a handful of sentences (two to five worked best) that are strongly related to both the question and the candidate answer. These are then added as inputs to the model.

The two methods measure relationships between sentences in different ways. One uses n-gram overlap. That is, it breaks each sentence up into one-word, two-word, and three-word sequences and measures the overlaps between those sequences across sentences.

The other method uses contextual word embeddings to determine semantic relationships between sentences, based on their proximity in the embedding space. In experiments, this is the approach that worked best.

In our experiments, we used three different architectures to explore our approach to context-aware answer selection. In all three, the inputs included both local-context information, as in our ECIR paper, and global-context information.

In the first architecture, we just concatenated the global-context sentences with the question, candidate answer, and local-context sentences.

The second architecture uses an ensemble approach. It takes two input vectors: one concatenates the question and candidate answer with the local-context sentences, and the other concatenates them with global-context sentences. The two input vectors pass to separate encoders, which produce separate vector representations for further processing. We suspected that this would increase precision, but at higher computational cost.

![image.png](https://dev-media.amazoncloud.cn/171591594dc34875b39461d1c6d24800_image.png)

A comparison of the ensemble method, with separate encoders for local and global context, and the multiway-attention method.

The third architecture uses multiway attention to try to capture some of the gains of the ensemble architecture, but at lower cost. The multiway-attention model uses a single encoder to produce representations of all the inputs. Those representations are then fed into three separate attention blocks.

The first block forces the model to jointly examine question, answer, and local context; the second focuses on the relationship between local and global context; and the last attention block captures relations in the entire sequence. The architecture thus preserves some of the information segregation of the ensemble method.

And indeed, in our tests, the ensemble method fared best, but the multiway-attention model was close behind, suffering drop-offs of between 0.1% and 1% on the three metrics we used for evaluation.

All three of our context-aware models, however, outperformed the state-of-the-art baseline, establishing a new standard in answer selection precision.

ABOUT THE AUTHOR

#### **[Alessandro Moschitti](https://www.amazon.science/author/alessandro-moschitti)**

Alessandro Moschitti is a principal scientist in the Alexa AI organization.
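As an illustration of the n-gram overlap measure described under "Global context", the following is a minimal sketch of how a few global-context sentences might be picked from a source document. It is not the code from the EACL paper: the function names (`ngrams`, `overlap`, `select_global_context`), the regex tokenizer, and the choice to sum the overlap with the question and the overlap with the candidate answer are all assumptions made for this example.

```python
import re
from collections import Counter


def ngrams(text, max_n=3):
    """Count every 1- to max_n-word sequence in text (lowercased, punctuation dropped)."""
    tokens = re.findall(r"\w+", text.lower())
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def overlap(a, b, max_n=3):
    """Fraction of a's n-grams that also occur in b; 0.0 if a has no n-grams."""
    grams_a, grams_b = ngrams(a, max_n), ngrams(b, max_n)
    total = sum(grams_a.values())
    shared = sum((grams_a & grams_b).values())  # multiset intersection
    return shared / total if total else 0.0


def select_global_context(question, candidate, sentences, k=2):
    """Return the k sentences most related to both question and candidate.

    Assumption: relatedness is the sum of the two overlap scores.
    Ties are broken in favor of sentences that appear earlier in the document.
    """
    scored = []
    for idx, sentence in enumerate(sentences):
        if sentence == candidate:
            continue  # the candidate itself is already part of the model input
        score = overlap(question, sentence) + overlap(candidate, sentence)
        scored.append((score, -idx, sentence))
    scored.sort(reverse=True)
    return [sentence for _, _, sentence in scored[:k]]


if __name__ == "__main__":
    document = [
        "Serena Williams is an American tennis player.",
        "She was born on September 26, 1981.",
        "Tennis is played on grass, clay, and hard courts.",
    ]
    print(select_global_context(
        "When was Serena Williams born?", document[1], document, k=1))
```

With the example document, the sentence that names Serena Williams is selected, which is the kind of context the candidate answer needs in order to resolve "she". The embedding-based method that the article reports as working best would replace `overlap` with a similarity between sentence representations.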