Announcing NVIDIA GPU support for Bottlerocket on Amazon ECS

Last year, we announced the general availability of the [Amazon Elastic Container Service (Amazon ECS)-optimized Bottlerocket AMI](https://aws.amazon.com/about-aws/whats-new/2021/06/the-bottlerocket-ami-for-amazon-ecs-is-now-generally-available/). Bottlerocket is an open source project that focuses on security and maintainability, providing a reliable and consistent Linux distribution for hosting container-based workloads. Today, we are happy to announce that you can run NVIDIA GPU-accelerated workloads on Amazon ECS using Bottlerocket.

In this post, we will walk through how to create an Amazon ECS task to run an NVIDIA GPU workload on Bottlerocket.

### **Why Bottlerocket?**
Customers have continued to adopt containers to run their workloads, and AWS saw a need for a Linux distribution designed and optimized to run these containerized applications. Bottlerocket OS was built to provide a secure foundation for hosts running containers and to minimize the operational overhead of managing them at scale. Bottlerocket is designed for reliable updates that can be applied through automation.

You can learn more about getting started with Bottlerocket and Amazon ECS in the [Getting started with Bottlerocket and Amazon ECS](https://aws.amazon.com/blogs/containers/getting-started-with-bottlerocket-and-amazon-ecs/) blog post.

### **Setting up an ECS cluster with Bottlerocket and NVIDIA GPUs**
Let’s have a look at how this is done in practice. We will be working in the us-west-2 (Oregon) Region.

#### **Prerequisites**
- The AWS CLI with appropriate credentials
- A default VPC in the Region you are working in (you can also use another existing VPC in your account)


First, let’s create the ECS cluster named ```ecs-bottlerocket```.

```
aws ecs --region us-west-2 create-cluster --cluster-name ecs-bottlerocket
```

The instance we’re launching will need an [AWS Identity and Access Management (IAM)](https://aws.amazon.com/iam/) role to communicate with both the ECS APIs and the Systems Manager Session Manager APIs. I have created an IAM role named ```ecsInstanceRole``` that has both the [AmazonSSMManagedInstanceCore](https://console.aws.amazon.com/iam/home?region=us-west-2#/policies/arn%3Aaws%3Aiam%3A%3Aaws%3Apolicy%2FAmazonSSMManagedInstanceCore) and the [AmazonEC2ContainerServiceforEC2Role](https://console.aws.amazon.com/iam/home?region=us-west-2#/policies/arn%3Aaws%3Aiam%3A%3Aaws%3Apolicy%2Fservice-role%2FAmazonEC2ContainerServiceforEC2Role) managed policies attached.

The list of Bottlerocket Amazon Machine Images (AMIs) supported for use with NVIDIA GPUs is publicly available from [AWS Systems Manager Parameter Store](https://aws.amazon.com/systems-manager), so let’s get the AMI ID for the latest Bottlerocket release. AMIs are available for both the ```x86_64``` and ```aarch64``` architectures. In this blog post we are going to be using the ```x86_64``` AMI.

```
latest_bottlerocket_ami=$(aws ssm get-parameter --region us-west-2 \
    --name "/aws/service/bottlerocket/aws-ecs-1-nvidia/x86_64/latest/image_id" \
    --query Parameter.Value --output text)
```

Next, we get the list of subnets in our VPC that are configured to allocate a public IP address. The command expects the VPC ID in the ```vpc_id``` shell variable; here we look up the default VPC.

```
# Look up the default VPC (set vpc_id by hand to use a different VPC)
vpc_id=$(aws ec2 describe-vpcs --region us-west-2 \
    --filters "Name=is-default,Values=true" \
    --query 'Vpcs[0].VpcId' --output text)

aws ec2 describe-subnets \
    --region us-west-2 \
    --filters "Name=vpc-id,Values=$vpc_id" \
    --query 'Subnets[?MapPublicIpOnLaunch == `true`].SubnetId'

# Example output:
[
    "subnet-bc8993e6",
    "subnet-b55f6bfe",
    "subnet-e1e27fca",
    "subnet-21cbc058"
]
```

To associate our EC2 instance with the ECS cluster, we need to pass it some configuration at launch. We do this with a small user data file (```userdata.toml```), saved in the current directory, that names the ECS cluster the instance should join.

The full set of supported settings is listed [here](https://github.com/bottlerocket-os/bottlerocket/blob/develop/README.md#settings).

```
cat > ./userdata.toml << 'EOF'
[settings.ecs]
cluster = "ecs-bottlerocket"
EOF
```

Let’s deploy one Bottlerocket instance in one of the subnets above. We are choosing a public subnet for this blog post because it makes it easier to debug and connect to the instance if needed. You can choose private or public subnets based on your use case.

We are using the [p3.2xlarge](https://aws.amazon.com/ec2/instance-types/p3) instance type, which has one NVIDIA Tesla V100 Tensor Core GPU.

```
aws ec2 run-instances \
    --subnet-id subnet-bc8993e6 \
    --image-id $latest_bottlerocket_ami \
    --instance-type p3.2xlarge \
    --region us-west-2 \
    --tag-specifications 'ResourceType=instance,Tags=[{Key=bottlerocket,Value=quickstart}]' \
    --user-data file://userdata.toml \
    --iam-instance-profile Name=ecsInstanceRole
```

Next, let’s create the task definition for the sample application.

```
cat > ./sample-gpu.json << 'EOF'
{
  "containerDefinitions": [
    {
      "memory": 80,
      "essential": true,
      "name": "gpu",
      "image": "nvidia/cuda:11.0-base",
      "resourceRequirements": [
        {
          "type": "GPU",
          "value": "1"
        }
      ],
      "command": [
        "sh",
        "-c",
        "nvidia-smi"
      ],
      "cpu": 100,
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/bottlerocket",
          "awslogs-region": "us-west-2",
          "awslogs-stream-prefix": "demo-gpu"
        }
      }
    }
  ],
  "family": "bottlerocket-gpu"
}
EOF
```

In the task definition, we assign one NVIDIA GPU to our task through the [resourceRequirements](https://docs.aws.amazon.com/AmazonECS/latest/APIReference/API_ResourceRequirement.html) parameter. We also define the ```awslogs``` log configuration so that the log output from our container is sent to [Amazon CloudWatch](https://aws.amazon.com/cloudwatch/).

The log configuration is as follows:

- region: us-west-2
- log group name: /ecs/bottlerocket
- log stream prefix: demo-gpu


Create the CloudWatch log group specified above in the task definition.

```
aws logs create-log-group --log-group-name '/ecs/bottlerocket' --region us-west-2
```

Register the task in ECS.

```
aws ecs register-task-definition \
    --region us-west-2 \
    --cli-input-json file://sample-gpu.json
```

Run the task.

```
aws ecs run-task --region us-west-2 \
    --cluster ecs-bottlerocket \
    --task-definition bottlerocket-gpu:1
```

The task will run and execute a command (```nvidia-smi```) inside the container to provide information on the GPU configuration available, and then it will exit.

When you go into the [ECS console in your account](https://us-west-2.console.aws.amazon.com/ecs/v2/clusters/ecs-bottlerocket/tasks?region=us-west-2), you will see a stopped task. Select **Clusters** on the left menu, select the ```ecs-bottlerocket``` cluster, and then select the **Tasks** tab.

![image.png](https://dev-media.amazoncloud.cn/f9ca2a29dee447cf8cff62faddf2762f_image.png)

Click on the task ID and then select the **Logs** tab, which will show you the log output from the task that just ran:

![image.png](https://dev-media.amazoncloud.cn/efaf922f64bf4438a33ffff69978344a_image.png)

You can also view the log output from the container from the command line by passing in the log group name, the log stream name, and a timeframe. The log stream name has the form ```<stream prefix>/<container name>/<task ID>```. In my case this would be:

```
aws logs tail '/ecs/bottlerocket' \
    --region us-west-2 \
    --log-stream-names 'demo-gpu/gpu/7af782059c644872977da89a06023483' \
    --since 1h --format short
```

![image.png](https://dev-media.amazoncloud.cn/017c2b0cff7144a9a805ee118db3cbd7_image.png)

### **Cleanup**
To remove the resources that you created during this post, run the following commands.

```
aws ecs deregister-task-definition \
    --region us-west-2 \
    --task-definition bottlerocket-gpu:1

# --output text yields plain instance IDs that the loop below can iterate over
delete_instances=$(aws ec2 describe-instances --region us-west-2 \
    --filters "Name=tag:bottlerocket,Values=quickstart" \
    --query 'Reservations[].Instances[].InstanceId' --output text)

for instance in $delete_instances
    do aws ec2 terminate-instances --instance-ids $instance --region us-west-2
    # Wait for termination so the cluster has no registered instances left
    aws ec2 wait instance-terminated --instance-ids $instance --region us-west-2
done

aws ecs delete-cluster \
    --region us-west-2 \
    --cluster ecs-bottlerocket

aws logs delete-log-group --region us-west-2 --log-group-name '/ecs/bottlerocket'
```

### **Conclusion**
In this post, we walked through how to create an ECS task definition with the appropriate configuration to run a GPU-enabled workload inside a container on Bottlerocket, quickly and securely. We also saw how the container logs are available in CloudWatch and how to access them from the command line. If you are looking for additional examples of GPU-accelerated workloads to run with Bottlerocket on ECS, you can check out the NVIDIA GPU-optimized containers from the [NVIDIA NGC catalog on AWS Marketplace](https://aws.amazon.com/marketplace/featured-seller/nvidia-ngc).

Bottlerocket is open source (MIT or Apache 2.0 licensed), meaning you have a number of well-documented freedoms to use, modify, and extend it. Bottlerocket is also developed in the open on GitHub ([https://github.com/bottlerocket-os/](https://github.com/bottlerocket-os/)) and welcomes contributions, issues, and feedback on our discussion forum ([https://github.com/bottlerocket-os/bottlerocket/discussions](https://github.com/bottlerocket-os/bottlerocket/discussions)).

![image.png](https://dev-media.amazoncloud.cn/ae9d48173d9f4cff9a16f53db8998635_image.png)

**Maish Saidel-Keesing**

Maish Saidel-Keesing is a Senior Enterprise Developer Advocate in the Container Services team. Maish lives in Israel with his wife and 3 daughters and focuses on improving customer experience with everything related to containers in the cloud. You can always reach out to him on Twitter (@maishsk).
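### **Appendix: creating the ecsInstanceRole instance profile**
The walkthrough assumes that an IAM role and instance profile named ```ecsInstanceRole``` already exist, but it does not show how they were created. The following is a minimal sketch of one way to set them up with the AWS CLI, not necessarily how the author did it. The two policy ARNs are the managed policies linked in the post. The trust policy file name (```ec2-trust-policy.json```), the ```run``` helper, and the ```DRY_RUN``` switch are assumptions introduced here for illustration. By default the script only writes the trust policy and prints the commands it would run.

```shell
#!/usr/bin/env bash
# Sketch: create the IAM role and instance profile used by the walkthrough.
# DRY_RUN=1 (the default) only prints the AWS CLI calls; DRY_RUN=0 executes them.
set -e

ROLE_NAME="ecsInstanceRole"                    # role name used in the post
TRUST_POLICY_FILE="./ec2-trust-policy.json"    # hypothetical file name

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Trust policy that lets EC2 instances assume the role.
cat > "$TRUST_POLICY_FILE" << 'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "ec2.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF

run aws iam create-role --role-name "$ROLE_NAME" \
  --assume-role-policy-document "file://$TRUST_POLICY_FILE"

# Attach the two managed policies named in the post.
for policy_arn in \
  "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore" \
  "arn:aws:iam::aws:policy/service-role/AmazonEC2ContainerServiceforEC2Role"
do
  run aws iam attach-role-policy --role-name "$ROLE_NAME" --policy-arn "$policy_arn"
done

# run-instances refers to an instance profile, so create one with the same
# name as the role and add the role to it.
run aws iam create-instance-profile --instance-profile-name "$ROLE_NAME"
run aws iam add-role-to-instance-profile \
  --instance-profile-name "$ROLE_NAME" --role-name "$ROLE_NAME"
```

Run the script once with the default settings to review the printed commands, then rerun it with ```DRY_RUN=0``` to create the resources. Remember to remove the role and instance profile during cleanup if you created them only for this walkthrough.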