Generating DevOps Guru Proactive Insights for Amazon ECS

Monitoring is fundamental to operating an application in production, since we can only operate what we can measure and alert on. As an application evolves, or the environment grows more complex, it becomes increasingly challenging to maintain monitoring thresholds for each component and to validate that they are still set to effective values. We want monitoring alarms to trigger when needed, but we also want to minimize false positives.

Amazon DevOps Guru is an AWS service that helps you effectively monitor your application by ingesting vended metrics from [Amazon CloudWatch](https://aws.amazon.com/cloudwatch/). It learns your application’s behavior over time and then detects anomalies. It generates insights by combining the detected anomalies with suspected related events from [AWS CloudTrail](https://aws.amazon.com/cloudtrail/), and presents that information in a simple, ready-to-use dashboard when you start investigating potential issues. Amazon DevOps Guru uses CloudWatch [Container Insights](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/ContainerInsights.html) to detect issues around resource exhaustion for [Amazon ECS](https://aws.amazon.com/ecs/) or [Amazon EKS](https://aws.amazon.com/eks/) applications. This helps you proactively detect issues such as memory leaks before they impact your users, and it provides guidance on probable root causes and resolutions.

This post demonstrates how to simulate a memory leak in a container running in Amazon ECS and have it generate a proactive insight in Amazon DevOps Guru.

### **Solution Overview**
The following diagram shows the environment we’ll use for our scenario. The container “brickwall-maker” is preconfigured with the rate at which it allocates memory, and we have built this container image and published it to our public Amazon ECR repository. Optionally, you can build and host the Docker image in your own private repository as described in steps 2 and 3.

After creating the container image, we’ll use an AWS CloudFormation template to create an ECS Cluster and an ECS Service called “Test” with a desired count of two. This will create two tasks using our “brickwall-maker” container image. The stack will also enable Container Insights for the ECS Cluster. Then, we will enable resource coverage for this CloudFormation stack in Amazon DevOps Guru in order to start our resource analysis.

![image.png](https://dev-media.amazoncloud.cn/c8ab95ae862a44898e9aa71803d5c073_image.png)

#### **Source provided on [GitHub](https://github.com/aws-samples/amazon-devopsguru-brickwall-maker):**
- DevOpsGuru.yaml
- EnableDevOpsGuruForCfnStack.yaml
- Docker container source

### **Steps:**
**1. Create your IDE environment**

In the [AWS Cloud9 console](https://console.aws.amazon.com/cloud9/home), click **Create environment**, give your environment a Name, and click **Next step**. On the **Environment settings** page, change the instance type to **t3.small**, and click **Next step**. On the **Review** page, make sure that the Name and Instance type are set as intended, and click **Create environment**. The environment creation will take a few minutes. After that, the AWS Cloud9 IDE will open, and you can continue working in the terminal tab displayed in the bottom pane of the IDE.

Install the following prerequisite packages, and ensure that you have Docker installed:

```
sudo yum install -y docker
sudo service docker start
docker --version
```
Clone the git repository in order to download the required CloudFormation templates and code:
```
git clone https://github.com/aws-samples/amazon-devopsguru-brickwall-maker
```
Change to the directory that contains the cloned repository:
```
cd amazon-devopsguru-brickwall-maker
```
**2. Optional: Create ECR private repository**

If you want to build your own container image and host it in your own private ECR repository, create a new repository with the following command and then follow the steps to prepare your own image:
```
aws ecr create-repository --repository-name brickwall-maker
```
**3. Optional: Prepare Docker image**

Authenticate to Amazon Elastic Container Registry (ECR) in the target region:
```
aws ecr get-login-password --region ap-northeast-1 | \
 docker login --username AWS --password-stdin \
 123456789012.dkr.ecr.ap-northeast-1.amazonaws.com
```
In the command above and in those that follow, make sure that you replace `123456789012` with your own account ID.

Build the brickwall-maker Docker container:
```
docker build -t brickwall-maker .
```
Tag the Docker container to prepare it to be pushed to ECR:
```
docker tag brickwall-maker:latest 123456789012.dkr.ecr.ap-northeast-1.amazonaws.com/brickwall-maker:latest
```
Push the built Docker container to ECR:
```
docker push 123456789012.dkr.ecr.ap-northeast-1.amazonaws.c
```
**4. Launch the CloudFormation template to deploy your ECS infrastructure**

To deploy your ECS infrastructure, run the following command to launch the CloudFormation template. In the `ParameterValue`, either substitute your own private ECR image URL or keep our public URL:
```
aws cloudformation create-stack --stack-name myECS-Stack \
--template-body file://DevOpsGuru.yaml \
--capabilities CAPABILITY_IAM CAPABILITY_NAMED_IAM \
--parameters ParameterKey=ImageUrl,ParameterValue=public.ecr.aws/p8v8e7e5/myartifacts:brickwallv1
```
**5. Enable DevOps Guru to monitor the ECS Application**

Run the following command to enable DevOps Guru for monitoring your ECS application:
```
aws cloudformation create-stack \
--stack-name EnableDevOpsGuruForCfnStack \
--template-body file://EnableDevOpsGuruForCfnStack.yaml \
--parameters ParameterKey=CfnStackNames,ParameterValue=myECS-Stack
```
**6. Wait for baselining of resources**

This step lets DevOps Guru complete the baselining of the resources and benchmark the normal behavior. For this particular scenario, we recommend waiting two days for insights to be triggered.

Unlike other monitoring tools, the DevOps Guru dashboard does not present any counters or graphs. In the meantime, you can use CloudWatch Container Insights to monitor the cluster-level, task-level, and service-level metrics in ECS.

**7. View Container Insights metrics**

- Open the [CloudWatch console](https://console.aws.amazon.com/cloudwatch/).
- In the navigation pane, choose **Container Insights**.
- Use the drop-down boxes near the top to select **ECS Services** as the resource type to view, then select **DevOps Guru** as the resource to monitor.
- The performance monitoring view shows graphs for several metrics, including “Memory Utilization”, which you can watch increasing from here. In addition, the lower “Task performance” pane lists the tasks with the “Avg CPU” and “Avg memory” metrics for each individual task.

**8. Review DevOps Guru insights**

When DevOps Guru detects an anomaly, it generates a proactive insight with the relevant information needed to investigate the anomaly, and lists it in the [DevOps Guru Dashboard](https://console.aws.amazon.com/devops-guru/dashboard).

You can view the insights by clicking on the number of insights displayed in the dashboard. In our case, we expect insights to be shown in the “proactive insights” category on the dashboard.

Once you have opened the insight, you will see that the insight view is divided into the following sections:
- Insight Overview with a basic description of the anomaly. In this case, it states that Memory Utilization is approaching its limit, with details of the stack that is affected by the anomaly.
- Anomalous metrics consisting of related graphs and a timeline of the predicted impact time in the future.
- Relevant events with contextual information, such as changes or updates made to the CloudFormation stack’s resources in the region.
- Recommendations to mitigate the issue. As seen in the following screenshot, it recommends troubleshooting High CPU or Memory Utilization in ECS, along with a link to the necessary documentation.

The following screenshots illustrate an example insight detail page from DevOps Guru:

![image.png](https://dev-media.amazoncloud.cn/7d85a9778fa449d0ad811c06daec976b_image.png)

![image.png](https://dev-media.amazoncloud.cn/e888b751d51c41b2ac62138fd7eb5979_image.png)

### **Conclusion**
This post describes how DevOps Guru continuously monitors resources in a particular region in your AWS account and proactively helps identify resource exhaustion problems, such as running out of memory, in advance. This helps IT operators take preventative actions before a problem presents itself, thereby preventing downtime.

#### **Cleaning up**
After walking through this post, you should clean up and un-provision the resources in order to avoid incurring any further charges.

1. To un-provision the CloudFormation stacks, on the AWS CloudFormation console, choose Stacks. Select the **stack** name, and choose **Delete**.
2. [Delete](https://docs.aws.amazon.com/cloud9/latest/user-guide/delete-environment.html) the AWS Cloud9 environment.
3. [Delete](https://docs.aws.amazon.com/AmazonECR/latest/userguide/repository-delete.html) the ECR repository.

#### **About the authors**

![image.png](https://dev-media.amazoncloud.cn/b3d7ee70cd134512b86c32daad2528bf_image.png)

**Trishanka Saikia**
Trishanka Saikia is a Technical Account Manager for AWS. She is also a DevOps enthusiast and works with AWS customers to design, deploy, and manage their AWS workloads/architectures.

![image.png](https://dev-media.amazoncloud.cn/970452f3ed364284a8ea7b3038b15f88_image.png)

**Gerhard Poul**
Gerhard Poul is a Senior Solutions Architect at Amazon Web Services based in Vienna, Austria. Gerhard works with customers in Austria to enable them with best practices in their cloud journey. He is passionate about infrastructure as code and how cloud technologies can improve IT operations.
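Step 3 asks you to replace `123456789012` by hand in each command. As a minimal sketch, the image URI could instead be composed once and reused. The helper name `ecr_image_uri` is hypothetical and not part of the sample repository, and the commented usage assumes the AWS CLI is already configured in the Cloud9 terminal:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the sample repository): compose a private ECR
# image URI from account ID, region, repository name, and an optional tag.
ecr_image_uri() {
  local account_id="$1" region="$2" repo="$3" tag="${4:-latest}"
  printf '%s.dkr.ecr.%s.amazonaws.com/%s:%s\n' "$account_id" "$region" "$repo" "$tag"
}

# Example usage (commented out because it calls AWS and Docker):
# ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
# IMAGE_URI=$(ecr_image_uri "$ACCOUNT_ID" ap-northeast-1 brickwall-maker)
# docker tag brickwall-maker:latest "$IMAGE_URI"
# docker push "$IMAGE_URI"
```

The same `IMAGE_URI` value could then be passed as the `ImageUrl` parameter in step 4 if you host your own image.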
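Step 8 reviews insights in the console. During the baselining wait, it may be convenient to check from the terminal as well. The following is a sketch only: the function name `list_proactive_insights` is made up for illustration, and it assumes the AWS CLI is configured for the region where DevOps Guru was enabled:

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: print Id, Severity, and Name of ongoing proactive
# insights, one insight per line. Prints nothing if no insight exists yet.
list_proactive_insights() {
  aws devops-guru list-insights \
    --status-filter '{"Ongoing":{"Type":"PROACTIVE"}}' \
    --query 'ProactiveInsights[].[Id,Severity,Name]' \
    --output text
}

# Example usage (commented out because it calls AWS):
# list_proactive_insights
```

An empty result during the first two days is expected in this scenario, since DevOps Guru is still learning the baseline.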