Two Amazon papers were runners-up for best-paper awards at AAAI



At the meeting of the Association for the Advancement of Artificial Intelligence ([AAAI](https://www.amazon.science/conferences-and-events/aaai-2021)) earlier this year, papers whose coauthors included Amazon researchers were runners-up for two best-paper awards.

One of the papers was a submission to the main conference: “[Learning from eXtreme bandit feedback](https://www.amazon.science/publications/learning-from-extreme-bandit-feedback)”, by Romain Lopez, a PhD student at the University of California, Berkeley, who was an Amazon intern when the work was done; Inderjit Dhillon, an Amazon vice president and distinguished scientist; and Michael I. Jordan, a Distinguished Amazon Scholar and professor at Berkeley, where he’s one of Lopez’s thesis advisors.

In their paper, Lopez, Dhillon, and Jordan examine the problem of how to train a machine learning system to select some action, such as ranking the results of a product query, when the space of possible actions is massive and the training data reflects the biases of a prior action selection strategy.

![image.png](https://dev-media.amazoncloud.cn/6fc385c2ad544ffdb4e8967ea4687c2b_image.png)

A visualization of the types of entities included in the Amazon Web Services CORD-19 Search (ACS) knowledge graph and the relationships between them.

The other paper was a submission to the AAAI Workshop on Health Intelligence: “[AWS CORD-19 Search: A neural search engine for COVID-19 literature](https://www.amazon.science/publications/aws-cord-19-search-a-neural-search-engine-for-covid-19-literature)”, which has 15 coauthors, all at Amazon, led by senior applied scientist Parminder Bhatia and research scientist Lan Liu, with Taha Kass-Hout as senior author.

That paper examines the array of machine learning tools that enabled [Amazon Web Services CORD-19 Search](https://cord19.aws/) (ACS), a search interface that provides natural-language search of the [CORD-19 database](https://allenai.org/data/cord-19) of COVID-related research papers assembled by the Allen Institute.

#### **Extreme bandit feedback**

In their paper on bandit feedback, Lopez, Dhillon, and Jordan consider the problem of batch learning from bandit feedback in the context of extreme multilabel classification.

[Bandit problems](https://www.amazon.science/tag/bandit-problems) commonly arise in [reinforcement learning](https://www.amazon.science/tag/reinforcement-learning), in which a machine learning system attempts, through trial and error, to learn a policy that maximizes some reward. In the case of a recommendation system, for instance, the policy is how to select links to serve to particular customers; the reward is clicks on those links.

The classic bandit setting is online, meaning that the system can continually revise its policy in light of real-time feedback. In the offline setting, by contrast, the system’s training data all comes from transaction logs: which links did which customers see, and did they click on those links?

The problem is that the links the customers saw were selected by an earlier policy, typically called the logging policy. The goal of batch learning from bandit feedback is to discover a new policy that outperforms the logging policy. But how is that possible, given that we have feedback results only for the old policy?

![image.png](https://dev-media.amazoncloud.cn/2c54c3d8c7ef4e3689e31bcabf85ef47_image.png)

Inderjit Dhillon, a vice president and Distinguished Scientist at Amazon and the Gottesman Family Centennial Professor at the University of Texas at Austin, and Michael I. Jordan, a Distinguished Amazon Scholar and the Pehong Chen Distinguished Professor at the University of California, Berkeley.

CREDIT: UNIVERSITY OF TEXAS AT AUSTIN AND FLAVIA LORETO

This problem is exacerbated when there is a huge number of possible actions that the system can take. In that case, not only did customers see links selected by a suboptimal policy, but they saw only a tiny fraction of the links they might have seen.

In their paper, the researchers tackle the challenge of learning an optimal policy in this context. First, they present a theoretical analysis, describing a general approach to policy selection that converges to an optimal solution. Then they present a specific algorithm for implementing that approach. Finally, they compare the algorithm’s performance to that of four leading predecessors, using six different metrics, and find that their approach delivers the best results across the board.

The theoretical proof depends on what’s known as Rao-Blackwellization. Given any type of estimator (a procedure for estimating a quantity based on observed data), the Rao-Blackwell theorem provides a statistical method for updating the estimator that may improve its accuracy but will not diminish it. The researchers’ proof provides a way to compute the accuracy gains offered by Rao-Blackwellization in the context of extreme bandit feedback, depending on statistical properties of the transaction log data.

In practice, the researchers simply use the logging policy as the initial estimator and update it according to the Rao-Blackwell method. This yields significant increases in accuracy versus even the best-performing previous approaches: between 31% and 37% on the six metrics.

#### **CORD-19 search**

With Amazon Web Services CORD-19 Search (ACS), customers can query the CORD-19 database using natural language, with questions such as “Is remdesivir an effective treatment for COVID-19?” or “What is the average hospitalization time for patients?”

Amazon Science has discussed some of the elements described in the paper in greater detail elsewhere: Miguel Romero Calvo explained the structure of the CORD-19 [knowledge graph](https://www.amazon.science/tag/knowledge-graphs) and the method for assembling it, and Amazon Science contributor Doug Gantenbein described the ways in which ACS leverages machine learning tools from Amazon Web Services such as [Amazon Kendra](https://aws.amazon.com/kendra/), a semantic-search and question-answering service, and [Comprehend Medical](https://aws.amazon.com/comprehend/medical/), a tool for extracting information from unstructured text that is specialized for medical texts.

In addition to addressing these topics, the researchers’ paper covers the ACS approach to topic modeling, or automatically grouping documents according to topic descriptors extracted from their texts, and multi-label classification, or training a machine learning model to assign new topic labels to documents on the basis of the descriptors extracted by the topic-modeling system.

Finally, the researchers compare ACS to two other CORD-19 search interfaces, showing that for natural-language queries, it delivers the best results by a significant margin, while remaining competitive on more traditional keyword search.

Editor’s note: After publishing this post, we learned that a third Amazon paper, “[Targeted feedback generation for constructed-response questions](https://www.amazon.science/publications/targeted-feedback-generation-for-constructed-response-questions)”, won the best-paper award at another AAAI 2021 workshop, the Workshop on AI Education.

ABOUT THE AUTHOR

#### **[Larry Hardesty](https://www.amazon.science/author/larry-hardesty)**

Larry Hardesty is the editor of the Amazon Science blog. Previously, he was a senior editor at MIT Technology Review and the computer science writer at the MIT News Office.
