A new Spark plugin for CPU and memory profiling


#### **Introduction**

Have you ever wondered whether there are low-hanging optimization opportunities to improve the performance of a Spark app? Profiling gives you visibility into the runtime characteristics of the Spark app so that you can identify its bottlenecks and inefficiencies. We’re excited to announce the release of a new Spark plugin that enables profiling for JVM-based Spark apps via [Amazon CodeGuru](https://aws.amazon.com/codeguru/). The plugin is open sourced on [GitHub](https://github.com/amzn/amazon-codeguru-profiler-for-spark) and published to [Maven](https://search.maven.org/artifact/software.amazon.profiler/codeguru-profiler-for-spark/1.0/bundle).

#### **Walkthrough**

This post shows how you can onboard this plugin in two steps in under 10 minutes.

- Step 1: Create a profiling group in [Amazon CodeGuru Profiler](https://docs.aws.amazon.com/codeguru/latest/profiler-ug/what-is-codeguru-profiler.html) and grant permission to your Amazon EMR on EC2 role, so that profiler agents can emit metrics to CodeGuru. Detailed instructions can be found [here](https://docs.aws.amazon.com/codeguru/latest/profiler-ug/setting-up-long.html).
- Step 2: Reference codeguru-profiler-for-spark when submitting your Spark job, along with PROFILING_CONTEXT and ENABLE_AMAZON_PROFILER defined.

#### **Prerequisites**

Your app is built against Spark 3 and runs on [Amazon EMR](https://aws.amazon.com/emr/) release 6.x or newer. It doesn’t matter whether you’re using Amazon EMR on [Amazon Elastic Compute Cloud (Amazon EC2)](https://aws.amazon.com/ec2/) or on [Amazon Elastic Kubernetes Service (Amazon EKS)](https://aws.amazon.com/eks/).

#### **Illustrative Example**

For the purposes of illustration, consider the following example, where profiling results are collected by the plugin and emitted to the “CodeGuru-Spark-Demo” profiling group.

```bash
spark-submit \
--master yarn \
--deploy-mode cluster \
--class <main-class-in-your-spark-app> \
--packages software.amazon.profiler:codeguru-profiler-for-spark:1.0 \
--conf spark.plugins=software.amazon.profiler.AmazonProfilerPlugin \
--conf spark.executorEnv.PROFILING_CONTEXT="{\\\"profilingGroupName\\\":\\\"CodeGuru-Spark-Demo\\\"}" \
--conf spark.executorEnv.ENABLE_AMAZON_PROFILER=true \
--conf spark.dynamicAllocation.enabled=false \
<your-spark-app-jar>
```

An alternative way to specify PROFILING_CONTEXT and ENABLE_AMAZON_PROFILER is under the `yarn-env.export` classification for instance groups in the Amazon EMR web console. Note that when PROFILING_CONTEXT is configured in the web console, all of its commas must be escaped, in addition to the escaping already used in the spark-submit command above.

```json
[
  {
    "classification": "yarn-env",
    "properties": {},
    "configurations": [
      {
        "classification": "export",
        "properties": {
          "ENABLE_AMAZON_PROFILER": "true",
          "PROFILING_CONTEXT": "{\\\"profilingGroupName\\\":\\\"CodeGuru-Spark-Demo\\\"\\,\\\"driverEnabled\\\":\\\"true\\\"}"
        },
        "configurations": []
      }
    ]
  }
]
```

Once the job above is launched on Amazon EMR, profiling results should show up in your CodeGuru web console in about 10 minutes, similar to the following screenshot. Internally, the plugin has helped us identify issues such as thread contention (revealed by the BLOCKED state in the latency flame graph) and unnecessarily created AWS Java clients (revealed by the CPU Hotspots view).

![DEVOPS_2010_1_B.gif](https://dev-media.amazoncloud.cn/8b31f784232d41f48cb970f2b5cbd065_DEVOPS_2010_1_B.gif)

##### **Troubleshooting**

To help with troubleshooting, use the sample Spark app provided in the plugin to check whether everything is set up correctly. Note that the profilingGroupName value specified in PROFILING_CONTEXT should match the profiling group created in CodeGuru.

```bash
spark-submit \
--master yarn \
--deploy-mode cluster \
--class software.amazon.profiler.SampleSparkApp \
--packages software.amazon.profiler:codeguru-profiler-for-spark:1.0 \
--conf spark.plugins=software.amazon.profiler.AmazonProfilerPlugin \
--conf spark.executorEnv.PROFILING_CONTEXT="{\\\"profilingGroupName\\\":\\\"CodeGuru-Spark-Demo\\\"}" \
--conf spark.executorEnv.ENABLE_AMAZON_PROFILER=true \
--conf spark.yarn.appMasterEnv.PROFILING_CONTEXT="{\\\"profilingGroupName\\\":\\\"CodeGuru-Spark-Demo\\\",\\\"driverEnabled\\\":\\\"true\\\"}" \
--conf spark.yarn.appMasterEnv.ENABLE_AMAZON_PROFILER=true \
--conf spark.dynamicAllocation.enabled=false \
/usr/lib/hadoop-yarn/hadoop-yarn-server-tests.jar
```

Running the command above from the master node of your EMR cluster should produce logs similar to the following:

```
21/11/21 21:27:21 INFO Profiler: Starting the profiler : ProfilerParameters{profilingGroupName='CodeGuru-Spark-Demo', threadSupport=BasicThreadSupport (default), excludedThreads=[Signal Dispatcher, Attach Listener], shouldProfile=true, integrationMode='', memoryUsageLimit=104857600, heapSummaryEnabled=true, stackDepthLimit=1000, samplingInterval=PT1S, reportingInterval=PT5M, addProfilerOverheadAsSamples=true, minimumTimeForReporting=PT1M, dontReportIfSampledLessThanTimes=1}
21/11/21 21:27:21 INFO ProfilingCommandExecutor: Profiling scheduled, sampling rate is PT1S
...
21/11/21 21:27:23 INFO ProfilingCommand: New agent configuration received : AgentConfiguration(AgentParameters={MaxStackDepth=1000, MinimumTimeForReportingInMilliseconds=60000, SamplingIntervalInMilliseconds=1000, MemoryUsageLimitPercent=10, ReportingIntervalInMilliseconds=300000}, PeriodInSeconds=300, ShouldProfile=true)
21/11/21 21:32:23 INFO ProfilingCommand: Attempting to report profile data: start=2021-11-21T21:27:23.227Z end=2021-11-21T21:32:22.765Z force=false memoryRefresh=false numberOfTimesSampled=300
21/11/21 21:32:23 INFO javaClass: [HeapSummary] Processed 20 events.
21/11/21 21:32:24 INFO ProfilingCommand: Successfully reported profile
```

Note that the CodeGuru Profiler agent uses a reporting interval of five minutes. Therefore, any executor process that runs for less than five minutes won’t be reflected in the profiling results. If the right profiling group is not specified, or it’s associated with the wrong EC2 role in CodeGuru, then the log will show a message similar to “CodeGuruProfilerSDKClient: Exception while calling agent orchestration” along with a stack trace that includes a 403 status code. To rule out any network issues (e.g., your EMR job running in a VPC without an outbound gateway, or a misconfigured outbound security group), you can remote into an EMR host and ping the [CodeGuru endpoint](https://docs.aws.amazon.com/general/latest/gr/codeguru-profiler.html) in your Region (e.g., `ping codeguru-profiler.us-east-1.amazonaws.com`).

#### **Cleaning up**

To avoid incurring future charges, you can delete the profiling group configured in CodeGuru and/or set the ENABLE_AMAZON_PROFILER environment variable to false.

#### **Conclusion**

In this post, we described how to onboard this plugin in two steps. Consider giving it a try for your Spark app. You can find the Maven artifacts [here](https://search.maven.org/artifact/software.amazon.profiler/codeguru-profiler-for-spark/1.0/bundle). If you have feature requests, bug reports, feedback of any kind, or would like to contribute, please head over to the [GitHub repository](https://github.com/amzn/amazon-codeguru-profiler-for-spark).

##### **Author:**

![image.png](https://dev-media.amazoncloud.cn/aaeea162cf394f6eb4428a2a23a82ae1_image.png)

##### **Bo Xiong**

Bo Xiong is a software engineer with Amazon Ads, leveraging big data technologies to process petabytes of data for billing and reporting. His main interests include performance tuning and optimization for Spark on Amazon EMR, and data mining for actionable business insights.

TAGS: [CodeGuru](https://aws.amazon.com/blogs/devops/tag/codeguru/), [EMR](https://aws.amazon.com/blogs/devops/tag/emr/), [profiling](https://aws.amazon.com/blogs/devops/tag/profiling/), [Spark](https://aws.amazon.com/blogs/devops/tag/spark/)
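
The escaping of PROFILING_CONTEXT in the spark-submit command and in the `yarn-env.export` classification is easy to get wrong by hand. The following Python sketch is one way to generate both forms from a plain dictionary. It is not part of the plugin: the function names `spark_conf_value`, `bash_double_quoted`, and `emr_console_value` are hypothetical helpers introduced here for illustration, and the rules they encode are inferred only from the two examples in this post (each double quote is backslash-escaped, and the web console form additionally escapes commas).

```python
import json


def spark_conf_value(context: dict) -> str:
    """Compact JSON with every double quote backslash-escaped.

    Assumption: this is the value the spark-submit example passes to Spark
    once bash has processed the double-quoted argument.
    """
    compact = json.dumps(context, separators=(",", ":"))
    return compact.replace('"', '\\"')


def bash_double_quoted(value: str) -> str:
    """Escape a value so it can be pasted between double quotes in bash."""
    # Inside bash double quotes, \\ collapses to \ and \" collapses to ",
    # so both characters need one more level of escaping.
    return value.replace("\\", "\\\\").replace('"', '\\"')


def emr_console_value(context: dict) -> str:
    """JSON string literal for the yarn-env export classification."""
    # Same value as for spark-submit, with every comma escaped as well.
    # Naive: a comma inside a value would also be escaped.
    with_commas = spark_conf_value(context).replace(",", "\\,")
    return json.dumps(with_commas)


if __name__ == "__main__":
    executor = {"profilingGroupName": "CodeGuru-Spark-Demo"}
    driver = {"profilingGroupName": "CodeGuru-Spark-Demo", "driverEnabled": "true"}
    print('--conf spark.executorEnv.PROFILING_CONTEXT="%s"'
          % bash_double_quoted(spark_conf_value(executor)))
    print('"PROFILING_CONTEXT": %s' % emr_console_value(driver))
```

For the “CodeGuru-Spark-Demo” profiling group, the two printed lines should match the PROFILING_CONTEXT strings shown in the Illustrative Example section, which makes the script a quick check before you paste a different group name into either place.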
