Amazon's annual machine learning conference featured presentations from thought leaders in academia.

{"value":"![image.png](https://dev-media.amazoncloud.cn/3faf6b530ead4243a6899f8930fa0293_image.png)\n\nThe recent Amazon Machine Learning Conference featured keynote presentations from leading academics within the field (left to right) Yoshua Bengio, Rama Chellappa, Thomas Dietterich, Mirella Lapata, and Christopher Manning. The event for Amazon scientists also featured oral and poster presentations, tutorials, and workshops.\n\nAmazon’s annual internal science conference, designed to showcase advancements in the application of machine learning across the breadth of the company’s businesses, and to foster greater collaboration within the company’s science community, occurred virtually earlier this month.\n\nThe 9th annual event featured five keynote presentations from leading academics (see below), oral and poster presentations, tutorials, and workshops.\n\nMuthu Muthukrishnan, the event’s executive sponsor, and vice president of sponsored products, Performance Advertising Technology, kicked off the event, followed by an opening keynote from Prem Natarajan, Alexa AI vice president of Natural Understanding. \n\n“This conference plays a crucial role in expanding the future of machine learning at Amazon,” Muthukrishnan said in his opening remarks, while Natarajan added that the growth of Amazon’s science community is testimony to “the use of machine learning across Amazon to deliver increasing value to our customers.”\n\nPresentations from distinguished members of the academic science community were provided by:\n\n- ++[Yoshua Bengio](https://en.wikipedia.org/wiki/Yoshua_Bengio)++, who is a Turing Award winner, and recognized as one of the world’s leading experts in artificial intelligence. He is a professor within the Department of Computer Science and Operations Research at the ++[Université de Montréal](https://en.wikipedia.org/wiki/Universit%C3%A9_de_Montr%C3%A9al)++ and the founder and scientific director of the ++[Montreal Institute for Learning Algorithms](https://en.wikipedia.org/wiki/Mila_(research_institute))++ (MILA);\n- ++[Rama Chellappa](https://engineering.jhu.edu/ece/faculty/rama-chellappa/)++, a Bloomberg Distinguished Professor in the Departments of Electrical and Computer Engineering and Biomedical Engineering and chief scientist at the Johns Hopkins Institute for Assured Autonomy;\n- ++[Thomas Dietterich](http://web.engr.oregonstate.edu/~tgd/)++, Emeritus Professor at the school of Electrical Engineering and Computer Science at Oregon State University and associate director of Policy for ++[Collaborative Robotics and Intelligent Systems](https://robotics.oregonstate.edu/)++(CoRIS), who is considered one of the pioneers in the machine learning field;\n- ++[Mirella Lapata](https://www.research.ed.ac.uk/en/persons/mirella-lapata)++, a professor within the School of Informatics at the University of Edinburgh and elected Fellow of the Royal Society of Edinburgh, whose research focuses on probabilistic learning techniques for natural language understanding and generation; and\n- ++[Christopher Manning](https://nlp.stanford.edu/manning/)++, the inaugural Thomas M. Siebel Professor in Machine Learning in the Departments of Linguistics and Computer Science at Stanford University, director of ++[Stanford’s Artificial Intelligence Laboratory](https://ai.stanford.edu/)++ (SAIL) and associate director of the ++[Stanford Human-centered Artificial Intelligence Institute]()++ (HAI). 
Each of the presenters graciously agreed to share their presentation publicly; all five are provided in their entirety below.

#### **1. Yoshua Bengio: GFlowNets for Generative Active Learning**

**Abstract**: We consider the following setup: an ML system can interact with an expensive oracle (the “real world”) by iteratively proposing batches of candidate experiments and then obtaining a score for each experiment (“how well did it work?”). The data from all the rounds of queries and results can be used to train a proxy for the oracle, a form of world model. The world model can then be queried (much more cheaply than the oracle) in order to train (in silico) a generative model which proposes experiments, to form the next round of queries. Systems which can do that well can be applied in interactive recommendations, to discover new drugs or new materials, to control plants, or to learn how to reason and build a causal model. They involve many interesting ML research threads, including active learning, reinforcement learning, representation learning, exploration, meta-learning, Bayesian optimization, and black-box optimization. What should be the training criterion for this generative model? Why not simply use Markov chain Monte Carlo (MCMC) methods to generate these samples? Is it possible to bypass the mode-mixing limitation of MCMC? How can the generative model guess where good experiments might be before having tried them? How should the world model construct a representation of its epistemic uncertainty, i.e., where it expects to predict well or not? On the path to answering these questions, we will introduce a new and exciting deep learning framework called GFlowNets, which can amortize the very expensive work normally done by MCMC to convert an energy function into samples, and which opens the door to fascinating possibilities for probabilistic modeling, including the ability to quickly estimate marginalized probabilities and efficiently represent distributions over sets and graphs.

<video src="https://dev-media.amazoncloud.cn/dc6e5d73cda5430c8067a0c48aaf3600_c6500b6bbbe2a8a20a164999c08b1fb7.mp4" controls></video>

**Yoshua Bengio AMLC presentation**
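Bengio’s abstract asks why one should not simply sample candidate experiments with MCMC. For reference, here is a minimal sketch (ours, not from the talk) of that baseline: random-walk Metropolis-Hastings drawing samples from p(x) ∝ exp(−E(x)) for a toy two-well energy. The energy function, step size, and all names are illustrative assumptions; the point is to make visible the mode-mixing limitation the abstract mentions, which GFlowNets aim to amortize away.

```python
# Minimal random-walk Metropolis-Hastings on a toy 1-D energy (illustrative only).
import numpy as np

def energy(x: float) -> float:
    """Toy two-well energy with modes near x = -3 and x = +3."""
    return 0.05 * (x**2 - 9.0) ** 2

def metropolis_hastings(n_steps: int, step_size: float = 0.5, seed: int = 0):
    rng = np.random.default_rng(seed)
    x = rng.normal()  # arbitrary starting point
    samples = []
    for _ in range(n_steps):
        proposal = x + step_size * rng.normal()
        # Accept with probability min(1, exp(E(x) - E(proposal)));
        # the proposal is symmetric, so no correction term is needed.
        if rng.random() < np.exp(energy(x) - energy(proposal)):
            x = proposal
        samples.append(x)
    return np.array(samples)

samples = metropolis_hastings(50_000)
# The target distribution is symmetric, so each mode should hold ~50% of the
# samples; a chain that rarely crosses the barrier at x = 0 will not show that.
print("fraction left/right of barrier:", np.mean(samples < 0), np.mean(samples > 0))
```

With a small step size the chain crosses the energy barrier only rarely, so one mode stays over-represented for a long time; that is the mixing failure a trained generative sampler is meant to avoid.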
#### **2. Rama Chellappa: Open Problems in Machine Learning**

**Abstract**: In this talk, I will briefly survey my group’s recent works on building operational systems for face recognition and action recognition using deep learning. While reasonable success can be claimed, many open problems remain to be addressed. These include bias detection and mitigation, domain adaptation and generalization, learning from unlabeled data, handling adversarial attacks, and selecting the best subsets of training data in mini-batch learning. Some of our recent works addressing these challenges will be summarized.

<video src="https://dev-media.amazoncloud.cn/415510ceca654475873c86f3c2da2d3a_7ce9717719c1c8af4cf12a657841bc98.mp4" controls></video>

**Rama Chellappa AMLC presentation**

#### **3. Thomas Dietterich: Anomaly Detection for OOD and Novel Category Detection**

**Abstract**: Every deployed learning system should be accompanied by a competence model that can detect when new queries fall outside its region of competence. This presentation will discuss the application of anomaly detection to provide a competence model for object classification in deep learning. We consider two threats to competence: queries that are out-of-distribution and queries that correspond to novel classes. The talk will review the four main strategies for anomaly detection and then survey some of the many recently published methods for anomaly detection in deep learning. The central challenge is to learn a representation that assigns distinctive representations to anomalies. The talk will conclude with a discussion of how to set the anomaly detection threshold to achieve a desired missed-alarm rate without relying on labeled anomaly data.

<video src="https://dev-media.amazoncloud.cn/a9125197f19b400680470a308c419c28_58cc8f7d8dd9f7c2f3f0c91d204eb415.mp4" controls></video>

**Thomas Dietterich AMLC presentation**
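On Dietterich’s closing point about threshold setting: the talk targets a desired missed-alarm rate, while the simpler, widely used recipe sketched below (our illustration, not the talk’s method) likewise needs no labeled anomalies but controls the complementary quantity, calibrating the threshold as a high quantile of the detector’s scores on held-out nominal data so that only a target fraction of normal queries raises an alarm. The score distribution and the 1% target are illustrative assumptions.

```python
# Quantile calibration of an anomaly threshold on nominal (normal-only) data.
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for a detector's anomaly scores on clean held-out validation data.
nominal_scores = rng.gamma(shape=2.0, scale=1.0, size=10_000)

target_false_alarm_rate = 0.01
# (1 - alpha)-quantile of nominal scores: roughly 1% of normal queries exceed it.
threshold = np.quantile(nominal_scores, 1.0 - target_false_alarm_rate)

def is_anomalous(score: float) -> bool:
    """Flag a query whose anomaly score exceeds the calibrated threshold."""
    return score > threshold

print(f"threshold = {threshold:.3f}")
print("empirical alarm rate on nominal data:", np.mean(nominal_scores > threshold))
```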
#### **4. Mirella Lapata: Automatic Movie Analysis and Summarization via Turning Points**

**Abstract**: Movie analysis is an umbrella term for many tasks aiming to automatically interpret, extract, and summarize the content of a movie. Potential applications include generating shorter versions of scripts to help with the decision-making process in a production company, enhancing movie recommendation engines, and notably generating movie previews.

In this talk I will introduce the task of turning point identification as a means of analyzing movie content. According to screenwriting theory, turning points (e.g., change of plans, major setback, climax) are crucial narrative moments within a movie: they define its plot structure, determine its progression, and segment it into thematic units. I will argue that turning points and the segmentation they provide can facilitate the analysis of long, complex narratives such as screenplays. I will further formalize the generation of a shorter version of a movie as the problem of identifying scenes with turning points, and present a graph neural network model for this task based on linguistic and audiovisual information. Finally, I will discuss why the representation of screenplays as (sparse) graphs offers interpretability and exposes the morphology of different movie genres.

<video src="https://dev-media.amazoncloud.cn/f43336683d6640cab393c87410062cb3_cdc4b7b71f89d600082bfd72a139757e.mp4" controls></video>

**Mirella Lapata AMLC presentation**

#### **5. Christopher Manning: From Large Pre-Trained Language Models Discovering Linguistic Structure towards Foundation Models**

**Abstract**: I will first briefly outline the recent sea change in NLP with the rise of large pre-trained transformer language models, such as BERT, and the effectiveness of these models on NLP tasks. I will then focus on two particular aspects on which I have worked. First, I will show how, despite using only a simple self-supervision task, BERT-like models not only learn word associations but act as linguistic structure discovery devices, capturing such things as human language syntax and pronominal coreference. Second, I will emphasize how recent progress has been bought at enormous computational cost, and explore the ELECTRA model, in which an alternative discriminative learning method allows building highly effective neural word representations with considerably less computation. Finally, I will introduce how large pre-trained models are being extended into a larger class of foundation models, a direction with much promise but also concomitant risks, and how we are hoping to contribute to their exploration at Stanford.

<video src="https://dev-media.amazoncloud.cn/09f6328ab90b4f039448bcd3cc9db6d7_5054da12591d5b2a1f7696136ae9408c.mp4" controls></video>

**Christopher Manning AMLC presentation**
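Manning’s second thread, ELECTRA, swaps masked-token prediction for a discriminative task: corrupt a fraction of the input tokens, then train the network to label every position as original or replaced, so every token (not only the masked minority) contributes a learning signal. Below is a minimal sketch of that objective, assuming toy sizes and a uniform-random “generator” in place of ELECTRA’s small masked language model; it is an illustration, not Manning’s implementation.

```python
# Replaced-token-detection objective in the style of ELECTRA (toy scale).
import torch
import torch.nn as nn

vocab_size, dim, seq_len, batch = 1000, 64, 32, 8

embed = nn.Embedding(vocab_size, dim)
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True),
    num_layers=2,
)
head = nn.Linear(dim, 1)  # per-token logit: was this token replaced?

tokens = torch.randint(0, vocab_size, (batch, seq_len))

# "Generator": uniform-random replacements at ~15% of positions. Real ELECTRA
# uses a small masked LM to propose harder, more plausible replacements, and
# relabels a replacement that happens to equal the original token as
# "original" (ignored here for brevity).
replace_mask = torch.rand(batch, seq_len) < 0.15
random_tokens = torch.randint(0, vocab_size, (batch, seq_len))
corrupted = torch.where(replace_mask, random_tokens, tokens)

# Binary cross-entropy at every position: dense signal from the whole sequence.
logits = head(encoder(embed(corrupted))).squeeze(-1)  # (batch, seq_len)
loss = nn.functional.binary_cross_entropy_with_logits(logits, replace_mask.float())
loss.backward()
print(f"replaced-token-detection loss: {loss.item():.3f}")
```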
ABOUT THE AUTHOR

#### **Staff writer**