NeurIPS reinforcement-learning-challenge winners announced

{"value":"Competitions are a key part of the annual ++[NeurIPS conference program](https://nips.cc/Conferences/2020/Sponsors)++. This year, ++[16 competitions](https://neurips.cc/Conferences/2020/CompetitionTrack)++ were accepted, and a quarter of them focused on facilitating scientific progress in deep ++[reinforcement learning](https://www.amazon.science/blog/neurips-shipra-agrawal-on-the-appeal-of-reinforcement-learning)++ (RL), in which agents learn to maximize some reward through trial-and-error exploration of their environments. \n\nIn recent years, RL has led to breakthroughs in gaming, autonomous driving, electric-grid management, and other areas. The [Amazon SageMaker](https://aws.amazon.com/cn/sagemaker/?trk=cndc-detail) RL team was excited to collaborate with ++[AIcrowd](https://www.aicrowd.com/)++ in supporting training and evaluations for the ++[Procgen Challenge](https://slideslive.com/38942723/procgen-benchmark)++, which was sponsored by Amazon Web Services. \n\n![下载.gif](https://dev-media.amazoncloud.cn/3941af36d1384a5e9a5252d74333cf6b_%E4%B8%8B%E8%BD%BD.gif)\n\nSamples of the 16 types of procedurally generated gym environments available with the Procgen Benchmark.\n\nTo win this challenge, competitors had to develop new RL models that maximized sample efficiency and generalization. The [Amazon SageMaker](https://aws.amazon.com/cn/sagemaker/?trk=cndc-detail) RL team open-sourced a starter notebook using ++[AnyScale’s](https://www.anyscale.com/events/2020/06/24/rllib-deep-dive)++ Ray RLlib, a library for implementing RL applications with the Ray distributed-learning framework. This helped participants iterate faster; in fact, with [Amazon SageMaker](https://aws.amazon.com/cn/sagemaker/?trk=cndc-detail) ++[notebook instances](https://docs.aws.amazon.com/sagemaker/latest/dg/nbi.html)++, the competitors got results in less than an hour for a few US dollars. \n\nThe challenge featured two tracks — generalization and sample efficiency —and comprised three rounds of competition that attracted more than 500 participants on 82 teams. Participants could compete in one or both of the tracks.\n\nRound one thinned the field to 50 teams, and round two identified 10 finalists. In the final two rounds, AIcrowd ran 33,000 models that generated more than 230,000 virtual-CPU and 28.5k GPU hours. During the entire competition, 172,000 models were evaluated using [Amazon SageMaker](https://aws.amazon.com/cn/sagemaker/?trk=cndc-detail). \n\n\n#### **The winning teams**\n\n\nLast Friday, at the virtual ++[NeurIPS Deep Reinforcement Learning Workshop](https://sites.google.com/view/deep-rl-workshop-neurips2020/home)++, the winners ++[were announced](https://slideslive.com/38942749/competition-overview)++. \n\nCongratulations to the winner of the generalization track, the two-person team of ++[Dipam Chakraborty](https://www.linkedin.com/in/dipamchakraborty-dipamc77/)++ and ++[Nhat Quang Tran](https://www.linkedin.com/in/nhat-quang-tran/)++, and the winner of the sample-efficiency track, the two-person team of ++[Adrien Gaidon](https://www.linkedin.com/in/adrien-gaidon-63ab2358/)++ and ++[Blake Wulfe](https://www.linkedin.com/in/blake-wulfe-78b266179/)++. Both teams’ solutions were based on modifications to the ++[phasic policy gradient (PPG)](https://arxiv.org/abs/2009.04416)++ algorithm, a new reinforcement learning algorithm that preserves feature sharing between the policy and value function, while otherwise decoupling their training. 
The challenge featured two tracks — generalization and sample efficiency — and comprised three rounds of competition that attracted more than 500 participants on 82 teams. Participants could compete in one or both of the tracks.

Round one thinned the field to 50 teams, and round two identified 10 finalists. In the final two rounds, AIcrowd ran 33,000 models that generated more than 230,000 virtual-CPU hours and 28,500 GPU hours. During the entire competition, 172,000 models were evaluated using [Amazon SageMaker](https://aws.amazon.com/cn/sagemaker/?trk=cndc-detail).

#### **The winning teams**

Last Friday, at the virtual [NeurIPS Deep Reinforcement Learning Workshop](https://sites.google.com/view/deep-rl-workshop-neurips2020/home), the winners [were announced](https://slideslive.com/38942749/competition-overview).

Congratulations to the winner of the generalization track, the two-person team of [Dipam Chakraborty](https://www.linkedin.com/in/dipamchakraborty-dipamc77/) and [Nhat Quang Tran](https://www.linkedin.com/in/nhat-quang-tran/), and the winner of the sample-efficiency track, the two-person team of [Adrien Gaidon](https://www.linkedin.com/in/adrien-gaidon-63ab2358/) and [Blake Wulfe](https://www.linkedin.com/in/blake-wulfe-78b266179/). Both teams’ solutions were based on modifications to the [phasic policy gradient (PPG)](https://arxiv.org/abs/2009.04416) algorithm, a new reinforcement learning algorithm that preserves feature sharing between the policy and value function while otherwise decoupling their training. Both teams used hyperparameter tuning to optimize their approaches.
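Neither team’s code appears here, but the overall shape of PPG, as described in the paper linked above, can be sketched as follows: an outer loop alternates a PPO-style policy phase with an auxiliary phase that distills value targets into the shared features while a KL penalty keeps the policy itself in place. The network sizes, coefficients, and random stand-in rollout data below are illustrative assumptions only.

```python
# Schematic sketch of the PPG training schedule (after Cobbe et al., 2020),
# with random tensors standing in for rollout data so the loop runs end to end.
# This is not either winning team's code; sizes and coefficients are made up.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.distributions import Categorical


class PPGNet(nn.Module):
    def __init__(self, obs_dim=8, n_actions=4, hidden=64):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.policy_head = nn.Linear(hidden, n_actions)
        self.aux_value_head = nn.Linear(hidden, 1)  # value head on shared policy features
        self.value_net = nn.Sequential(             # separate value network
            nn.Linear(obs_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def forward(self, obs):
        h = self.trunk(obs)
        return Categorical(logits=self.policy_head(h)), self.aux_value_head(h).squeeze(-1)

    def value(self, obs):
        return self.value_net(obs).squeeze(-1)


net = PPGNet()
opt = torch.optim.Adam(net.parameters(), lr=5e-4)
N_PI, E_AUX, BETA_CLONE = 8, 6, 1.0  # policy iterations per phase, aux epochs, clone weight

for _ in range(3):  # outer PPG loop
    # Policy phase: PPO-style clipped updates; the value network is trained separately.
    for _ in range(N_PI):
        obs, actions = torch.randn(256, 8), torch.randint(0, 4, (256,))
        advantages, returns = torch.randn(256), torch.randn(256)
        old_logp = torch.randn(256)  # stands in for log-probs recorded at collection time

        dist, _ = net(obs)
        ratio = torch.exp(dist.log_prob(actions) - old_logp)
        policy_loss = -torch.min(ratio * advantages,
                                 ratio.clamp(0.8, 1.2) * advantages).mean()
        value_loss = F.mse_loss(net.value(obs), returns)
        opt.zero_grad()
        (policy_loss + 0.5 * value_loss).backward()
        opt.step()

    # Auxiliary phase: distill value targets into the shared trunk while a KL
    # term (cloning the frozen policy) keeps the policy from drifting.
    obs, returns = torch.randn(2048, 8), torch.randn(2048)
    with torch.no_grad():
        old_probs = net(obs)[0].probs
    for _ in range(E_AUX):
        dist, aux_v = net(obs)
        aux_loss = 0.5 * F.mse_loss(aux_v, returns)
        kl = (old_probs * (old_probs.log() - dist.probs.log())).sum(-1).mean()
        opt.zero_grad()
        (aux_loss + BETA_CLONE * kl + F.mse_loss(net.value(obs), returns)).backward()
        opt.step()
```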
Dipam and Quang applied several modifications to the original PPG algorithm, which gave them the best generalization performance, with RL agents learning in previously unseen environments. More details about their approach can be found in their [presentation video](https://www.youtube.com/watch?v=BYF6e7t5p9E&feature=youtu.be) from the competition, while AIcrowd hosts their [evaluation videos and code](https://www.aicrowd.com/challenges/neurips-2020-procgen-competition/submissions/93732).

Adrien and Blake’s modifications of PPG included data augmentation during the auxiliary phase but not during the policy phase. They also experimented with reward normalization and reward shaping. Their approach achieved the best performance on sample efficiency, or using the smallest number of samples to reach a specified reward value. This made their model the fastest to train. Their [presentation video](https://drive.google.com/file/d/1lLlq54YK0MXOK8MPjL_CfHnZC0IPyc60/view) is also online, as are their [evaluation videos and code](https://www.aicrowd.com/challenges/neurips-2020-procgen-competition/submissions/94610).

As sponsor, Amazon Web Services awarded the top teams $9,000 in cash and $9,000 in Amazon Web Services credits.

#### **Background on the challenge**

The challenge, designed by AIcrowd in collaboration with [OpenAI](https://openai.com/), was based on the [OpenAI Procgen Benchmark](https://openai.com/blog/procgen-benchmark/). One of the designers’ goals was a centralized and accessible leaderboard to measure sample efficiency and generalization in RL. More information about the [design of the challenge](https://slideslive.com/38942724/procgen-benchmark-measure-sample-efficiency-and-generalization-in-rl) is available online.

The [Procgen Benchmark](https://github.com/openai/procgen) is a suite of 16 procedurally generated [gym](https://github.com/openai/gym) environments that provide direct measures of how quickly an RL agent learns generalizable skills. Agents were evaluated in procedurally generated instances of each of these environments, which were publicly accessible, and in four secret test environments created for the competition. By aggregating performance across so many diverse environments, we obtained high-quality metrics with which to judge the underlying algorithms.

Since each Procgen environment is generated procedurally, it requires agents to generalize to never-before-seen situations. As a result, these environments provide a robust test of an agent’s ability to learn in many diverse settings. Moreover, Procgen environments are designed to be lightweight and simple to use, so participants with limited computational resources could easily reproduce baseline results and run new experiments. More information about the design principles and the individual environments can be found in the paper “[Leveraging procedural generation to benchmark reinforcement learning](https://arxiv.org/pdf/1912.01588.pdf)”.
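As a concrete illustration of that procedural generation, the snippet below shows one way to create a Procgen environment through the standard gym interface and hold out levels for a generalization test; the specific game and option values are examples, not the competition’s evaluation setup.

```python
# Minimal example of the Procgen gym interface. The game and option values
# are illustrative; they are not the competition's evaluation settings.
import gym

# Training environment restricted to a fixed set of procedurally generated levels.
train_env = gym.make(
    "procgen:procgen-coinrun-v0",
    num_levels=200,             # number of distinct levels available for training
    start_level=0,
    distribution_mode="easy",
)

# Test environment drawing from the unrestricted level distribution,
# so the agent keeps encountering levels it has never seen before.
test_env = gym.make(
    "procgen:procgen-coinrun-v0",
    num_levels=0,               # 0 = unlimited procedurally generated levels
    start_level=0,
    distribution_mode="easy",
)

obs = train_env.reset()
for _ in range(100):
    obs, reward, done, info = train_env.step(train_env.action_space.sample())
    if done:
        obs = train_env.reset()
```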
The [Amazon SageMaker](https://aws.amazon.com/cn/sagemaker/?trk=cndc-detail) RL team is grateful for the opportunity to sponsor this challenge. We want to once again congratulate all participants, particularly our winners, and would like to especially thank AIcrowd for its role in supporting the competition.

ABOUT THE AUTHOR

#### **[Sahika Genc](https://www.amazon.science/author/sahika-genc)**

Sahika Genc is a principal applied scientist within Amazon AI. Her team works on reinforcement learning algorithms for [Amazon SageMaker](https://aws.amazon.com/cn/sagemaker/?trk=cndc-detail).