# Amazon releases dataset to help detect counterfactual phrases

{"value":"Product retrieval systems, like the one in the Amazon Store, often use the text of product reviews to improve the results of queries. But such systems can be misled by counterfactual statements, which describe events that did not or cannot take place.\n\nFor example, consider the counterfactual statement “I would have bought this shirt if it were available in red”. That sentence contains the phrase “available in red”, which a naïve product retrieval system might take as evidence that, indeed, the shirt is available in red.\n\n![image.png](https://dev-media.amazoncloud.cn/9a60dab2dad5493cb92d3abc0a1bf34f_image.png)\n\nCounterfactual expressions (\"if it were available in green\") can mislead information retrieval systems that key in on phrases in natural-language reviews (\"available in green\"). A new dataset can help solve that problem.\n\nCounterfactual statements in reviews are rare, but they can lead to frustrating experiences for customers — as when, for instance, a search for “red shirt” pulls up a product whose reviews make clear that it is not available in red. To help ease that frustration, we have publicly released ++[a new dataset](https://github.com/amazon-research/amazon-multilingual-counterfactual-dataset)++ to help train machine learning models to recognize counterfactual statements.\n\nIn a ++[paper](https://www.amazon.science/publications/i-wish-i-would-have-loved-this-one-but-i-didnt-a-multilingual-dataset-for-counterfactual-detection-in-product-reviews)++ we presented at the Conference on Empirical Methods in Natural Language Processing (++[EMNLP](https://www.amazon.science/conferences-and-events/emnlp-2021)++), we explain how we assembled the dataset. We also describe the results of experiments to determine what types of machine learning models yield the best results when trained on our dataset.\n\n#### **Dataset construction**\n\nAt the time we started this project, there were no large-scale datasets that covered counterfactual statements in product reviews in multiple languages. We decided to annotate sentences selected from product reviews for three languages: English, German, and Japanese.\n\nSentences that express counterfactuals are rare in natural-language texts — only 1-2% of sentences, according to one study. Therefore, simply annotating a randomly selected set of sentences would yield a highly imbalanced dataset with a sparse training signal.\n\nCounterfactual statements can be broken into two parts: a statement about the event (if it were available in red), also referred to as the antecedent, and the consequence of the event (I would have bought this shirt), referred to as the consequent.\n\nTo identify counterfactual statements, we specified certain relationships between antecedent and consequent, in the presence of certain clue words. For instance, in the sentence “If everyone got along, it would be more enjoyable,” the consequent follows the antecedent and contains a modal verb, while the antecedent consists of a conditional conjunction followed by a past modal verb.\n\nWith the help of professional linguists for all the languages under consideration, we compiled a set of such specifications, for conjunctive normal sentences, conjunctive converse sentences, modal propositional sentences, sentences with clue words like “wished”, “hoped”, and the like.\n\nHowever, not all sentences that contain counterfactual clues express counterfactuals. 
However, not all sentences that contain counterfactual clues express counterfactuals. For example, in the sentence “My wish came true when I got the iPhone for my birthday”, the counterfactual clue “wish” does not indicate a counterfactual condition, because the speaker truly received the iPhone. So professional linguists also reviewed the selected sentences to determine whether they truly expressed counterfactuals.

Selecting sentences based on precompiled clue word lists could, however, bias the data. So we also selected sentences that do not contain clue words but are highly similar to sentences that do. As a measure of similarity, we used proximity of sentence embeddings — vector representations of the sentences — computed by a pretrained BERT model.
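As a rough sketch of that second selection step, the snippet below mean-pools token embeddings from a pretrained BERT checkpoint into sentence vectors and ranks clue-free sentences by cosine similarity to clue-word hits. The checkpoint name and pooling choice are assumptions for illustration; the paper specifies only that a pretrained BERT model was used.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Any pretrained BERT checkpoint serves for illustration.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def embed(sentences: list[str]) -> torch.Tensor:
    """Mean-pooled token embeddings as crude sentence vectors."""
    batch = tokenizer(sentences, padding=True, truncation=True,
                      return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state      # (batch, tokens, 768)
    mask = batch["attention_mask"].unsqueeze(-1)       # zero out padding
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)

clue_hit = ["I would have bought this shirt if it were available in red."]
clue_free = ["Sadly it only comes in blue, otherwise it was an instant buy.",
             "The fabric is soft and the shirt fits well."]
similarity = torch.nn.functional.cosine_similarity(embed(clue_hit),
                                                    embed(clue_free))
# The most similar clue-free sentences join the annotation pool.
print(similarity)
```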
#### **Baseline models**

Counterfactual detection can be modeled as a binary classification task: given a sentence, classify it as positive if it expresses a counterfactual statement and negative otherwise.

We experimented with different methods for representing sentences, such as bag-of-words representations, static word-embedding-based representations, and contextualized word-embedding-based representations.

We also evaluated different classification algorithms, ranging from logistic regression and support vector machines to multilayer perceptrons. We found that a cross-lingual language model (XLM) based on the RoBERTa model and fine-tuned on the counterfactually annotated sentences performed best overall.
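The best-performing baseline can be reproduced in outline with the Hugging Face Trainer, as below. The checkpoint, file names, column names, and hyperparameters are placeholders of ours, not the paper's settings; the sketch assumes the annotated sentences and binary labels sit in CSV files.

```python
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# XLM-RoBERTa fine-tuned as a binary counterfactual detector.
tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "xlm-roberta-base", num_labels=2)  # 0 = factual, 1 = counterfactual

# Assumed layout: CSVs with "sentence" and "label" columns.
data = load_dataset("csv", data_files={"train": "train.csv", "test": "test.csv"})
data = data.map(lambda batch: tokenizer(batch["sentence"], truncation=True),
                batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="cf-detector", num_train_epochs=3,
                           per_device_train_batch_size=16),
    train_dataset=data["train"],
    eval_dataset=data["test"],
    tokenizer=tokenizer,  # enables dynamic padding via the default collator
)
trainer.train()
trainer.save_model("cf-detector")  # reused in the translate-test sketch below
```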
To study the relationship between our dataset and existing datasets, we trained a counterfactual detection model on our dataset and evaluated it on the public dataset for a counterfactual-detection competition, which contains counterfactual statements from news articles. Models trained on our dataset performed poorly on the competition dataset, indicating that the counterfactual statements in product reviews — the focus of our dataset — are significantly different from those in news articles.

Given that our dataset covers counterfactual statements not only in English but also in Japanese and German, we were also interested in how well a counterfactual detection model trained on one language transfers to another. As a simple baseline, we first trained a model on English training data and then applied it to German and Japanese test data, translated into English via a machine translation system. However, this simple baseline resulted in poor performance, indicating that counterfactuals are highly language-specific, so more-principled approaches will be needed for their cross-lingual transfer.
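That translate-test baseline amounts to two pipeline calls, sketched below with a public MT checkpoint chosen by us for illustration (the paper does not name its translation system); `cf-detector` is the classifier saved in the fine-tuning sketch above.

```python
from transformers import pipeline

# Translate-test baseline: machine-translate German test sentences into
# English, then score them with the English-trained classifier.
translate = pipeline("translation", model="Helsinki-NLP/opus-mt-de-en")
classify = pipeline("text-classification", model="cf-detector")

german = "Ich hätte dieses Hemd gekauft, wenn es in Rot erhältlich gewesen wäre."
english = translate(german)[0]["translation_text"]
print(classify(english))
```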
In ongoing work, we are investigating filtering for other types of linguistic constructions besides counterfactuals and expanding our detection models to other languages.

ABOUT THE AUTHOR

#### **[Danushka Bollegala](https://www.amazon.science/author/danushka-bollegala)**

Danushka Bollegala is an Amazon Scholar and a professor of natural-language processing at the University of Liverpool.