Identify mangrove forests using satellite image features using Amazon SageMaker Studio and Amazon SageMaker Autopilot – Part 2

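Overseas Selection
The Overseas Selection section brings together high-quality technical content related to Amazon Web Services from around the world. Note that “AWS” as used in the content is an abbreviation of “Amazon Web Services” and is not displayed as a trademark on this site.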

Mangrove forests are an important part of a healthy ecosystem, and human activities are one of the major reasons for their gradual disappearance from coastlines around the world. Using a machine learning (ML) model to identify mangrove regions from a satellite image gives researchers an effective way to monitor the size of the forests over time. In [Part 1](https://aws.amazon.com/blogs/machine-learning/part-1-identify-mangrove-forests-using-satellite-image-features-using-amazon-sagemaker-studio-and-amazon-sagemaker-autopilot/) of this series, we showed how to gather satellite data in an automated fashion and analyze it in [Amazon SageMaker Studio](https://docs.aws.amazon.com/sagemaker/latest/dg/studio.html) with interactive visualization. In this post, we show how to use [Amazon SageMaker Autopilot](https://aws.amazon.com/sagemaker/autopilot/) to automate the process of building a custom mangrove classifier.

### **Train a model with Autopilot**

Autopilot provides a balanced way of building several models and selecting the best one. It creates multiple combinations of data preprocessing techniques and ML models with minimal effort. It also gives the data scientist complete control over each of these component steps, if desired.

You can run Autopilot through one of the AWS SDKs (details are in the [API reference guide for Autopilot](https://docs.aws.amazon.com/sagemaker/latest/dg/autopilot-reference.html)) or through Studio. In our solution, we use Autopilot in Studio with the following steps:

1. On the Studio Launcher page, choose the plus sign for **New Autopilot experiment**.

![image.png](https://dev-media.amazoncloud.cn/90e3c8361e72463f974d6ca2771e1c7b_image.png)

2. For **Connect your data**, select **Find S3 bucket**, and enter the name of the bucket where you kept the training and test datasets.
3. For **Dataset file name**, enter the name of the training data file you created in the **Prepare the training data** section in [Part 1](https://aws.amazon.com/blogs/machine-learning/part-1-identify-mangrove-forests-using-satellite-image-features-using-amazon-sagemaker-studio-and-amazon-sagemaker-autopilot/).
4. For **Output data location (S3 bucket)**, enter the same bucket name you used in step 2.
5. For **Dataset directory name**, enter a folder name under the bucket where you want Autopilot to store artifacts.
6. For **Is your S3 input a manifest file?**, choose **Off**.
7. For **Target**, choose **label**.
8. For **Auto deploy**, choose **Off**.

![image.png](https://dev-media.amazoncloud.cn/30f77f9804ba40f083b126ccdb44d2e3_image.png)

9. Under **Advanced settings**, for **Machine learning problem type**, choose **Binary Classification**.
10. For **Objective metric**, choose **AUC**.
11. For **Choose how to run your experiment**, choose **No, run a pilot to create a notebook with candidate definitions**.
12. Choose **Create Experiment**.

![image.png](https://dev-media.amazoncloud.cn/000324fe10d34a2f91711cb45d27b5c7_image.png)

For more information about creating an experiment, refer to [Create an Amazon SageMaker Autopilot experiment](https://docs.aws.amazon.com/sagemaker/latest/dg/autopilot-automate-model-development-create-experiment.html). This step may take about 15 minutes to run.

13. When it’s complete, choose **Open candidate generation notebook**, which opens a new notebook in read-only mode.

![image.png](https://dev-media.amazoncloud.cn/d086a6a5d0bc4840807f228200743f9b_image.png)

14. Choose **Import notebook** to make the notebook editable.

![image.png](https://dev-media.amazoncloud.cn/b6d0ea01dc234a899c8d4592dde1fa9a_image.png)

15. For **Image**, choose **Data Science**.
16. For **Kernel**, choose **Python 3**.
17. Choose **Select**.

![image.png](https://dev-media.amazoncloud.cn/db1bf016ae2e4a0e8ce28cba38bfe2de_image.png)

This auto-generated notebook has detailed explanations. It gives you complete control over the model building task that follows. A customized version of the [notebook](https://github.com/aws-samples/mangrove-landcover-classification/blob/e3f501d99f735ae815552dd168d80ab592d86979/notebooks/mangrove-2013.ipynb), in which a classifier is trained using Landsat satellite bands from 2013, is available in the code repository under `notebooks/mangrove-2013.ipynb`.

The model building framework consists of two parts:

- feature transformation, as part of the data processing step
- hyperparameter optimization (HPO), as part of the model selection step

All the necessary artifacts for these tasks were created during the Autopilot experiment and saved in [Amazon Simple Storage Service](http://aws.amazon.com/s3) (Amazon S3). The first notebook cell downloads those artifacts from Amazon S3 to the local [Amazon SageMaker](https://aws.amazon.com/sagemaker/) file system for inspection and any necessary modification. All the Python modules and scripts needed to run the notebook are stored in two folders: `generated_module` and `sagemaker_automl`. The feature transformation steps, such as imputation, scaling, and PCA, are saved as `generated_module/candidate_data_processors/dpp*.py`.

Autopilot generates candidates based on three algorithms: XGBoost, linear learner, and multi-layer perceptron (MLP). A candidate pipeline consists of one of the feature transformation options, known as a `data_transformer`, and an algorithm.
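The Studio steps earlier in this section can also be expressed through the SDK route mentioned at the start of the section. Below is a minimal sketch, not the authors' code, that builds a Boto3 `create_auto_ml_job` request mirroring those settings. It sets `GenerateCandidateDefinitionsOnly=True`, which corresponds to running a pilot that only produces candidate definitions like the ones discussed here. The helper names, job name, bucket, object keys, and role ARN are hypothetical placeholders.

```python
"""Sketch: a CreateAutoMLJob request that mirrors the Studio settings above.

All names passed to build_autopilot_request are hypothetical placeholders.
"""


def build_autopilot_request(job_name, bucket, train_key, output_prefix, role_arn):
    """Return keyword arguments for the SageMaker CreateAutoMLJob API."""
    return {
        "AutoMLJobName": job_name,
        "InputDataConfig": [
            {
                "DataSource": {
                    "S3DataSource": {
                        # Steps 2-3 and 6: point at the training CSV, not a manifest file
                        "S3DataType": "S3Prefix",
                        "S3Uri": f"s3://{bucket}/{train_key}",
                    }
                },
                # Step 7: the target column
                "TargetAttributeName": "label",
            }
        ],
        # Steps 4-5: where Autopilot writes its artifacts
        "OutputDataConfig": {"S3OutputPath": f"s3://{bucket}/{output_prefix}"},
        # Steps 9-10: problem type and objective metric
        "ProblemType": "BinaryClassification",
        "AutoMLJobObjective": {"MetricName": "AUC"},
        # Step 11: only generate the candidate definitions notebook
        "GenerateCandidateDefinitionsOnly": True,
        "RoleArn": role_arn,
    }


def submit_autopilot_job(request):
    """Submit the request; needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the request builder has no AWS dependency

    return boto3.client("sagemaker").create_auto_ml_job(**request)
```

With your own bucket, key, and IAM role, you would call `submit_autopilot_job(build_autopilot_request(...))`. Keeping the request builder separate from the submission lets you inspect the request before anything is created in your account.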
Each candidate pipeline is defined as a Python dictionary, for example:

```python
candidate1 = {
    "data_transformer": {
        "name": "dpp5",
        "training_resource_config": {
            "instance_type": "ml.m5.4xlarge",
            "instance_count": 1,
            "volume_size_in_gb": 50
        },
        "transform_resource_config": {
            "instance_type": "ml.m5.4xlarge",
            "instance_count": 1,
        },
        "transforms_label": True,
        "transformed_data_format": "application/x-recordio-protobuf",
        "sparse_encoding": True
    },
    "algorithm": {
        "name": "xgboost",
        "training_resource_config": {
            "instance_type": "ml.m5.4xlarge",
            "instance_count": 1,
        },
    }
}
```

In this example, the pipeline transforms the training data according to the script in `generated_module/candidate_data_processors/dpp5.py` and builds an XGBoost model. This is where Autopilot gives the data scientist complete control. You can pick the automatically generated feature transformation and model selection steps, or build your own combination.

You can now add the pipeline to the pool of candidates that Autopilot runs in the experiment:

```python
from sagemaker_automl import AutoMLInteractiveRunner, AutoMLLocalCandidate

automl_interactive_runner = AutoMLInteractiveRunner(AUTOML_LOCAL_RUN_CONFIG)
automl_interactive_runner.select_candidate(candidate1)
```

This is an important step. Based on subject matter expertise, you can decide to keep only a subset of the candidates suggested by Autopilot to reduce the total runtime. For now, keep all Autopilot suggestions, which you can list as follows:

```python
automl_interactive_runner.display_candidates()
```

![image.png](https://dev-media.amazoncloud.cn/1c703dadb84941d08800455bf49716dd_image.png)

The full Autopilot experiment is done in two parts.
First, run the data transformation jobs:

```python
automl_interactive_runner.fit_data_transformers(parallel_jobs=7)
```

This step should complete in about 30 minutes for all the candidates, if you make no further modifications to the `dpp*.py` files.

The next step is to build the best set of models by tuning the hyperparameters of the respective algorithms. The hyperparameters are usually divided into two groups: static and tunable.

The static hyperparameters remain unchanged throughout the experiment for all candidates that share the same algorithm. They are passed to the experiment as a dictionary. For example, to pick the best XGBoost model by maximizing AUC over three rounds of five-fold cross-validation, the dictionary looks like the following code:

```python
{
    'objective': 'binary:logistic',
    'eval_metric': 'auc',
    '_kfold': 5,
    '_num_cv_round': 3,
}
```

For the tunable hyperparameters, you pass another dictionary with the search ranges and scaling types:

```python
from sagemaker.tuner import ContinuousParameter, IntegerParameter

{
    'num_round': IntegerParameter(64, 1024, scaling_type='Logarithmic'),
    'max_depth': IntegerParameter(2, 8, scaling_type='Logarithmic'),
    'eta': ContinuousParameter(1e-3, 1.0, scaling_type='Logarithmic'),
    # ... (remaining ranges omitted; see the notebook)
}
```

The complete set of hyperparameters is available in the `mangrove-2013.ipynb` [notebook](https://github.com/aws-samples/mangrove-landcover-classification/blob/e3f501d99f735ae815552dd168d80ab592d86979/notebooks/mangrove-2013.ipynb).

To create an experiment in which all seven candidates can be tested in parallel, create a multi-algorithm HPO tuner:

```python
multi_algo_tuning_parameters = automl_interactive_runner.prepare_multi_algo_parameters(
    objective_metrics=ALGORITHM_OBJECTIVE_METRICS,
    static_hyperparameters=STATIC_HYPERPARAMETERS,
    hyperparameters_search_ranges=ALGORITHM_TUNABLE_HYPERPARAMETER_RANGES)
```

The objective metrics are defined independently for each algorithm:

```python
ALGORITHM_OBJECTIVE_METRICS = {
    'xgboost': 'validation:auc',
    'linear-learner': 'validation:roc_auc_score',
    'mlp': 'validation:roc_auc',
}
```

Trying every possible combination of hyperparameter values for all candidates is wasteful. Instead, you can use a Bayesian search strategy for the HPO tuner:

```python
from sagemaker.tuner import HyperparameterTuner

multi_algo_tuning_inputs = automl_interactive_runner.prepare_multi_algo_inputs()
base_tuning_job_name = "{}-tuning".format(AUTOML_LOCAL_RUN_CONFIG.local_automl_job_name)

tuner = HyperparameterTuner.create(
    base_tuning_job_name=base_tuning_job_name,
    strategy='Bayesian',
    objective_type='Maximize',
    max_parallel_jobs=10,
    max_jobs=50,
    **multi_algo_tuning_parameters,
)
```

By default, Autopilot runs 250 training jobs in the tuner to select the best model. For this use case, setting `max_jobs=50` is sufficient. It saves time and resources without a significant penalty in the quality of the selected hyperparameters. Finally, submit the HPO job:

```python
tuner.fit(inputs=multi_algo_tuning_inputs, include_cls_metadata=None)
```

The process takes about 80 minutes on ml.m5.4xlarge instances.
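Besides watching the SageMaker console, you can poll the tuning job from the notebook. The following is a minimal sketch that is not part of the original notebook. It uses Boto3's `describe_hyper_parameter_tuning_job` and turns the response into a one-line progress summary. The helper names and the 5-minute polling interval are illustrative assumptions.

```python
import time


def summarize_tuning_status(response):
    """Return a one-line progress summary from a DescribeHyperParameterTuningJob response."""
    counters = response.get("TrainingJobStatusCounters", {})
    done = counters.get("Completed", 0)
    running = counters.get("InProgress", 0)
    failed = counters.get("NonRetryableError", 0) + counters.get("RetryableError", 0)
    line = (f"{response['HyperParameterTuningJobStatus']}: "
            f"{done} completed, {running} in progress, {failed} failed")
    best = response.get("BestTrainingJob")
    if best:
        # Objective value of the best training job found so far
        metric = best.get("FinalHyperParameterTuningJobObjectiveMetric", {})
        line += (f"; best so far {best['TrainingJobName']} "
                 f"({metric.get('MetricName')}={metric.get('Value')})")
    return line


def poll_tuning_job(job_name, interval_seconds=300):
    """Print progress until the tuning job is no longer InProgress or Stopping."""
    import boto3  # imported lazily; needs AWS credentials

    client = boto3.client("sagemaker")
    while True:
        response = client.describe_hyper_parameter_tuning_job(
            HyperParameterTuningJobName=job_name)
        print(summarize_tuning_status(response))
        if response["HyperParameterTuningJobStatus"] not in ("InProgress", "Stopping"):
            return response
        time.sleep(interval_seconds)
```

In the notebook, `poll_tuning_job(tuner.latest_tuning_job.name)` would print one summary line per polling interval until the job finishes.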
You can monitor progress on the SageMaker console by choosing **Hyperparameter tuning jobs** under **Training** in the navigation pane.

![image.png](https://dev-media.amazoncloud.cn/0f89d790a8024ee3be53f1cbb355ee3c_image.png)

Choose the name of the job in progress to view useful information, including the performance of each candidate.

![image.png](https://dev-media.amazoncloud.cn/b7d1f64c1f8e467682880f0b583007c0_image.png)

Finally, compare the performance of the best candidates:

```python
from sagemaker.analytics import HyperparameterTuningJobAnalytics

SAGEMAKER_SESSION = AUTOML_LOCAL_RUN_CONFIG.sagemaker_session
SAGEMAKER_ROLE = AUTOML_LOCAL_RUN_CONFIG.role

tuner_analytics = HyperparameterTuningJobAnalytics(
    tuner.latest_tuning_job.name, sagemaker_session=SAGEMAKER_SESSION)

df_tuning_job_analytics = tuner_analytics.dataframe()

# Sort so that the best objective value comes first
df_tuning_job_analytics.sort_values(
    by=['FinalObjectiveValue'],
    inplace=True,
    ascending=False if tuner.objective_type == "Maximize" else True)

# Select the columns to display and rename them
select_columns = ["TrainingJobDefinitionName", "FinalObjectiveValue", "TrainingElapsedTimeSeconds"]
rename_columns = {
    "TrainingJobDefinitionName": "candidate",
    "FinalObjectiveValue": "AUC",
    "TrainingElapsedTimeSeconds": "run_time",
}

# Show the top 5 model performances
df_tuning_job_analytics[select_columns].rename(columns=rename_columns).set_index("candidate").head(5)
```

![image.png](https://dev-media.amazoncloud.cn/0143c7b2e31d462fa3e42610f7922ec6_image.png)

The top-performing model, based on MLP, is marginally better than the XGBoost models with various choices of data processing steps, but it also takes much longer to train.
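One way to make the accuracy-versus-training-time trade-off explicit is to pick the fastest candidate whose AUC is within a small margin of the best AUC. The following is a minimal sketch of that idea, not a step from the original notebook. The function name and the default `tolerance` of 0.005 are hypothetical choices. The rows are assumed to use the renamed `candidate`, `AUC`, and `run_time` columns from the comparison table.

```python
import math


def _has_auc(row):
    """True if the row carries a usable AUC (failed jobs can report None or NaN)."""
    value = row.get("AUC")
    return value is not None and not math.isnan(value)


def fastest_within_tolerance(rows, tolerance=0.005):
    """Return the row with the smallest run_time among rows whose AUC is at
    least (best AUC - tolerance).

    rows: dicts with the keys "candidate", "AUC", and "run_time".
    tolerance: hypothetical AUC margin you are willing to give up for speed.
    """
    scored = [row for row in rows if _has_auc(row)]
    if not scored:
        raise ValueError("no rows with a usable AUC value")
    best_auc = max(row["AUC"] for row in scored)
    eligible = [row for row in scored if row["AUC"] >= best_auc - tolerance]
    return min(eligible, key=lambda row: row["run_time"])
```

To apply it to the tuning results, you could pass `df_tuning_job_analytics[select_columns].rename(columns=rename_columns).to_dict("records")` as `rows`. Whether a slightly lower AUC is acceptable in exchange for a shorter training time depends on your use case.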
You can find important details about the MLP model training, including the combination of hyperparameters used, as follows:

```python
# Name of the best training job found by the tuner
best_training_job = tuner.best_training_job()

df_tuning_job_analytics.loc[df_tuning_job_analytics.TrainingJobName == best_training_job].T.dropna()
```

![image.png](https://dev-media.amazoncloud.cn/56bc98105b22436793b6c03752571fe8_image.png)

### **Create an inference pipeline**

To generate predictions on new data, you construct an inference pipeline on SageMaker that hosts the best model and can be called later. The SageMaker pipeline model chains up to three containers:

- data transformation
- algorithm
- inverse label transformation, needed only if numerical predictions must be mapped onto non-numerical labels

For brevity, the following snippet shows only part of the required code. The complete code is available in the `mangrove-2013.ipynb` [notebook](https://github.com/aws-samples/mangrove-landcover-classification/blob/e3f501d99f735ae815552dd168d80ab592d86979/notebooks/mangrove-2013.ipynb):

```python
from sagemaker.estimator import Estimator
from sagemaker import PipelineModel
from sagemaker_automl import select_inference_output

# … (omitted for brevity; see the notebook)

# Final pipeline model: data transformer followed by the best algorithm
model_containers = [best_data_transformer_model, best_algo_model]
if best_candidate.transforms_label:
    # Map numerical predictions back to the original labels
    model_containers.append(best_candidate.get_data_transformer_model(
        transform_mode="inverse-label-transform",
        role=SAGEMAKER_ROLE,
        sagemaker_session=SAGEMAKER_SESSION))

# Select the output type
model_containers = select_inference_output("BinaryClassification", model_containers, output_keys=['predicted_label'])
```

After the model containers are built, you can construct and deploy the pipeline as follows:

```python
from sagemaker import PipelineModel

pipeline_model = PipelineModel(
    name="mangrove-automl-2013",
    role=SAGEMAKER_ROLE,
    models=model_containers,
    vpc_config=AUTOML_LOCAL_RUN_CONFIG.vpc_config)

pipeline_model.deploy(initial_instance_count=1,
                      instance_type='ml.m5.2xlarge',
                      endpoint_name=pipeline_model.name,
                      wait=True)
```

The endpoint deployment takes about 10 minutes to complete.

### **Get inference on the test dataset using an endpoint**

After the endpoint is deployed, you can invoke it with a payload of features B1–B7 to classify each pixel in an image as either mangrove (1) or other (0):

```python
import boto3

sm_runtime = boto3.client('sagemaker-runtime')
inf_endpt = pipeline_model.name  # the endpoint deployed in the previous step

# local_download: path to a local CSV file in which each row holds the B1-B7 values of one pixel
pred_labels = []
with open(local_download, 'r') as f:
    for row in f:
        payload = row.rstrip('\n')
        response = sm_runtime.invoke_endpoint(EndpointName=inf_endpt,
                                              ContentType="text/csv",
                                              Body=payload)
        pred_labels.append(int(response['Body'].read().decode().strip()))
```

Complete details on postprocessing the model predictions for evaluation and plotting are available in `notebooks/model_performance.ipynb`.

### **Get inference on the test dataset using a batch transform**

Now that you have created the best-performing model with Autopilot, you can use it for inference. For large datasets, a batch transform is more efficient. Let’s generate predictions on the entire dataset (training and test) and append the results to the features. This lets us perform further analysis, for instance, to check the predicted vs.
actual labels and the distribution of features among the predicted classes.

First, we create a manifest file in Amazon S3 that points to the locations of the training and test data from the previous data processing steps:

```python
import json

import boto3

data_bucket = "<name-of-the-S3-bucket-that-has-the-training-data>"  # replace with your bucket name
prefix = "LANDSAT_LC08_C01_T1_SR/Year2013"

# A manifest is a JSON list: a {"prefix": ...} object followed by object keys relative to that prefix
manifest = json.dumps([{"prefix": f"s3://{data_bucket}/{prefix}/"}, "train.csv", "test.csv"])
s3_client = boto3.client('s3')
s3_client.put_object(Body=manifest, Bucket=data_bucket, Key=f"{prefix}/data.manifest")
```

Now we can create a batch transform job. Because our input training and test datasets have `label` as the last column, we need to drop it during inference. To do that, we pass `InputFilter` in the `DataProcessing` argument. The filter expression `"$[:-2]"` indicates that the last column is dropped. The predicted output is then joined with the source data for further analysis.

In the following code, we construct the arguments for the batch transform job and then pass them to the `create_transform_job` function:

```python
from time import gmtime, strftime

batch_job_name = "Batch-Transform-" + strftime("%Y-%m-%d-%H-%M-%S", gmtime())
output_location = "s3://{}/{}/batch_output/{}".format(data_bucket, prefix, batch_job_name)
input_location = "s3://{}/{}/data.manifest".format(data_bucket, prefix)

request = {
    "TransformJobName": batch_job_name,
    "ModelName": pipeline_model.name,
    "TransformOutput": {
        "S3OutputPath": output_location,
        "Accept": "text/csv",
        "AssembleWith": "Line",
    },
    "TransformInput": {
        "DataSource": {"S3DataSource": {"S3DataType": "ManifestFile", "S3Uri": input_location}},
        "ContentType": "text/csv",
        "SplitType": "Line",
        "CompressionType": "None",
    },
    "TransformResources": {"InstanceType": "ml.m4.xlarge", "InstanceCount": 1},
    # Drop the label column from each input record, then join the predictions with the input
    "DataProcessing": {"InputFilter": "$[:-2]", "JoinSource": "Input"},
}

sm_client = boto3.client("sagemaker")
sm_client.create_transform_job(**request)
print("Created Transform job with name: ", batch_job_name)
```

You can monitor the status of the job on the SageMaker console.

![image.png](https://dev-media.amazoncloud.cn/3efb5132327a45ffb711a55c02e4f300_image.png)

### **Visualize model performance**

You can now visualize the performance of the best model on the test dataset as a confusion matrix. The test dataset consists of regions from India, Myanmar, Cuba, and Vietnam.

- Mangrove pixels: the model has high recall, but only about 75% precision.
- Non-mangrove (other) pixels: precision stands at 99%, with 85% recall.

You can tune the probability cutoff of the model predictions to adjust these values depending on the particular use case.

![image.png](https://dev-media.amazoncloud.cn/3db091bc54cd4c5d80274f0f3be4fec6_image.png)

![image.png](https://dev-media.amazoncloud.cn/ef2fe54f1372414ab0540cc2a0949484_image.png)

It’s worth noting that these results are a significant improvement over the built-in `smileCart` model.

### **Visualize model predictions**

Finally, it’s useful to observe the model performance on specific regions of the map. In the following image, the mangrove area on the India-Bangladesh border is depicted in red. Points sampled from the Landsat image patch belonging to the test dataset are superimposed on the region, where each point is a pixel that the model determines to represent mangroves.
The blue points are classified correctly by the model, whereas the black points represent mistakes by the model.

![image.png](https://dev-media.amazoncloud.cn/febbdda2fb254c75893cbb18b3ff9c01_image.png)

The following image shows only the points that the model predicted not to represent mangroves, with the same color scheme as the preceding example. The gray outline is the part of the Landsat patch that doesn’t include any mangroves. As the image shows, the model makes no mistakes classifying points on water. However, it has difficulty distinguishing pixels representing mangroves from those representing regular foliage.

![image.png](https://dev-media.amazoncloud.cn/eadd27e99492461a841c6f8883252e39_image.png)

The following image shows model performance on the Myanmar mangrove region.

![image.png](https://dev-media.amazoncloud.cn/e873879ab90646758e620ec76753221c_image.png)

In the following image, the model does a better job of identifying mangrove pixels.

![image.png](https://dev-media.amazoncloud.cn/4b571a7b558c49f98c59ebbd7f0aa5e1_image.png)

### **Clean up**

The SageMaker inference endpoint continues to incur cost if left running. When you’re done, delete the endpoint as follows:

```python
import boto3

sm_client = boto3.client("sagemaker")
sm_client.delete_endpoint(EndpointName=pipeline_model.name)
```

### **Conclusion**

This series of posts provided an end-to-end framework that data scientists can use to solve GIS problems. [Part 1](https://aws.amazon.com/blogs/machine-learning/part-1-identify-mangrove-forests-using-satellite-image-features-using-amazon-sagemaker-studio-and-amazon-sagemaker-autopilot/) showed the ETL process and a convenient way to visually interact with the data.
Part 2 showed how to use Autopilot to automate building a custom mangrove classifier.\n\nYou can use this framework to explore new satellite datasets containing a richer set of bands useful for mangrove classification and explore feature engineering by incorporating domain knowledge.\n\n### **About the Authors**\n\n![image.png](https://dev-media.amazoncloud.cn/fc334ff00f9f490dadcb7e0d1e151882_image.png)\n\n**Andrei Ivanovic** is an incoming Master’s of Computer Science student at the University of Toronto and a recent graduate of the Engineering Science program at the University of Toronto, majoring in Machine Intelligence with a Robotics/Mechatronics minor. He is interested in computer vision, deep learning, and robotics. He did the work presented in this post during his summer internship at Amazon.\n\n![image.png](https://dev-media.amazoncloud.cn/417517f990584edc9aafbb44c06f3e75_image.png)\n\n**David Dong** is a Data Scientist at Amazon Web Services.\n\n![image.png](https://dev-media.amazoncloud.cn/b1f1f318b2364153baaa488156fe4a32_image.png)\n\n**Arkajyoti Misra** is a Data Scientist at Amazon LastMile Transportation. He is passionate about applying Computer Vision techniques to solve problems that helps the earth. He loves to work with non-profit organizations and is a founding member of [ekipi.org](http://ekipi.org/).","render":"<p>Mangrove forests are an important part of a healthy ecosystem, and human activities are one of the major reasons for their gradual disappearance from coastlines around the world. Using a machine learning (ML) model to identify mangrove regions from a satellite image gives researchers an effective way to monitor the size of the forests over time. 
In <a href=\\"https://aws.amazon.com/blogs/machine-learning/part-1-identify-mangrove-forests-using-satellite-image-features-using-amazon-sagemaker-studio-and-amazon-sagemaker-autopilot/\\" target=\\"_blank\\">Part 1</a> of this series, we showed how to gather satellite data in an automated fashion and analyze it in <a href=\\"https://docs.aws.amazon.com/sagemaker/latest/dg/studio.html\\" target=\\"_blank\\">Amazon SageMaker Studio</a> with interactive visualization. In this post, we show how to use <a href=\\"https://aws.amazon.com/sagemaker/autopilot/\\" target=\\"_blank\\">Amazon SageMaker Autopilot</a> to automate the process of building a custom mangrove classifier.</p>\\n<h3><a id=\\"Train_a_model_with_Autopilot_2\\"></a><strong>Train a model with Autopilot</strong></h3>\\n<p>Autopilot provides a balanced way of building several models and selecting the best one. While creating multiple combinations of different data preprocessing techniques and ML models with minimal effort, Autopilot provides complete control over these component steps to the data scientist, if desired.</p>\n<p>You can use Autopilot using one of the AWS SDKs (details available in the <a href=\\"https://docs.aws.amazon.com/sagemaker/latest/dg/autopilot-reference.html\\" target=\\"_blank\\">API reference guide for Autopilot</a>) or through Studio. 
We use Autopilot in our Studio solution following the steps outlined in this section:</p>\\n<ol>\\n<li>On the Studio Launcher page, choose the plus sign for <strong>New Autopilot experiment</strong>.</li>\\n</ol>\n<p><img src=\\"https://dev-media.amazoncloud.cn/90e3c8361e72463f974d6ca2771e1c7b_image.png\\" alt=\\"image.png\\" /></p>\n<ol start=\\"2\\">\\n<li><strong>For Connect</strong> your data, select <strong>Find S3 bucket</strong>, and enter the bucket name where you kept the training and test datasets.</li>\\n<li>For <strong>Dataset file name</strong>, enter the name of the training data file you created in the <strong>Prepare the training data</strong> section in <a href=\\"https://aws.amazon.com/blogs/machine-learning/part-1-identify-mangrove-forests-using-satellite-image-features-using-amazon-sagemaker-studio-and-amazon-sagemaker-autopilot/\\" target=\\"_blank\\">Part 1</a>.</li>\\n<li>For <strong>Output data location (S3 bucket)</strong>, enter the same bucket name you used in step 2.</li>\\n<li>For <strong>Dataset directory name</strong>, enter a folder name under the bucket where you want Autopilot to store artifacts.</li>\\n<li>For <strong>Is your S3 input a manifest file?</strong>, choose <strong>Off</strong>.</li>\\n<li>For <strong>Target</strong>, choose <strong>label</strong>.</li>\\n<li>For <strong>Auto deploy</strong>, choose <strong>Off</strong>.</li>\\n</ol>\n<p><img src=\\"https://dev-media.amazoncloud.cn/30f77f9804ba40f083b126ccdb44d2e3_image.png\\" alt=\\"image.png\\" /></p>\n<ol start=\\"9\\">\\n<li>Under the <strong>Advanced settings</strong>, for <strong>Machine learning problem type</strong>, choose** Binary Classification**.</li>\\n<li>For <strong>Objective metric</strong>, choose <strong>AUC</strong>.</li>\\n<li>For <strong>Choose how to run your experiment</strong>, choose <strong>No</strong>, <strong>run a pilot to create a notebook with candidate definitions</strong>.</li>\\n<li>Choose <strong>Create 
Experiment</strong>.</li>\\n</ol>\n<p><img src=\\"https://dev-media.amazoncloud.cn/000324fe10d34a2f91711cb45d27b5c7_image.png\\" alt=\\"image.png\\" /></p>\n<p>For more information about creating an experiment, refer to <a href=\\"https://docs.aws.amazon.com/sagemaker/latest/dg/autopilot-automate-model-development-create-experiment.html\\" target=\\"_blank\\">Create an Amazon SageMaker Autopilot experiment</a>.It may take about 15 minutes to run this step.</p>\\n<ol start=\\"13\\">\\n<li>When complete, choose <strong>Open candidate generation notebook</strong>, which opens a new notebook in read-only mode.</li>\\n</ol>\n<p><img src=\\"https://dev-media.amazoncloud.cn/d086a6a5d0bc4840807f228200743f9b_image.png\\" alt=\\"image.png\\" /></p>\n<ol start=\\"14\\">\\n<li>Choose I<strong>mport notebook</strong> to make the notebook editable.</li>\\n</ol>\n<p><img src=\\"https://dev-media.amazoncloud.cn/b6d0ea01dc234a899c8d4592dde1fa9a_image.png\\" alt=\\"image.png\\" /></p>\n<ol start=\\"15\\">\\n<li>For Image, choose <strong>Data Science</strong>.</li>\\n<li>For <strong>Kernel</strong>, choose <strong>Python 3</strong>.</li>\\n<li>Choose <strong>Select</strong>.</li>\\n</ol>\n<p><img src=\\"https://dev-media.amazoncloud.cn/db1bf016ae2e4a0e8ce28cba38bfe2de_image.png\\" alt=\\"image.png\\" /></p>\n<p>This auto-generated notebook has detailed explanations and provides complete control over the actual model building task to follow. 
A customized version of the <a href=\\"https://github.com/aws-samples/mangrove-landcover-classification/blob/e3f501d99f735ae815552dd168d80ab592d86979/notebooks/mangrove-2013.ipynb\\" target=\\"_blank\\">notebook</a>, where a classifier is trained using Landsat satellite bands from 2013, is available in the code repository under <code>notebooks/mangrove-2013.ipynb</code>.</p>\\n<p>The model building framework consists of two parts: feature transformation as part of the data processing step, and hyperparameter optimization (HPO) as part of the model selection step. All the necessary artifacts for these tasks were created during the Autopilot experiment and saved in <a href=\\"http://aws.amazon.com/s3\\" target=\\"_blank\\">Amazon Simple Storage Service</a> ([Amazon S3](https://aws.amazon.com/cn/s3/?trk=cndc-detail)). The first notebook cell downloads those artifacts from [Amazon S3](https://aws.amazon.com/cn/s3/?trk=cndc-detail) to the local <a href=\\"https://aws.amazon.com/sagemaker/\\" target=\\"_blank\\">Amazon SageMaker</a> file system for inspection and any necessary modification. There are two folders: <code>generated_module</code> and <code>sagemaker_automl</code>, where all the Python modules and scripts necessary to run the notebook are stored. The various feature transformation steps like imputation, scaling, and PCA are saved as <code>generated_modules/candidate_data_processors/dpp*.py</code>.</p>\\n<p>Autopilot creates three different models based on the XGBoost, linear learner, and multi-layer perceptron (MLP) algorithms. A candidate pipeline consists of one of the feature transformations options, known as <code>data_transformer</code>, and an algorithm. 
A pipeline is a Python dictionary and can be defined as follows:</p>\\n<pre><code class=\\"lang-\\">candidate1 = {\\n &quot;data_transformer&quot;: {\\n &quot;name&quot;: &quot;dpp5&quot;,\\n &quot;training_resource_config&quot;: {\\n &quot;instance_type&quot;: &quot;ml.m5.4xlarge&quot;,\\n &quot;instance_count&quot;: 1,\\n &quot;volume_size_in_gb&quot;: 50\\n },\\n &quot;transform_resource_config&quot;: {\\n &quot;instance_type&quot;: &quot;ml.m5.4xlarge&quot;,\\n &quot;instance_count&quot;: 1,\\n },\\n &quot;transforms_label&quot;: True,\\n &quot;transformed_data_format&quot;: &quot;application/x-recordio-protobuf&quot;,\\n &quot;sparse_encoding&quot;: True\\n },\\n &quot;algorithm&quot;: {\\n &quot;name&quot;: &quot;xgboost&quot;,\\n &quot;training_resource_config&quot;: {\\n &quot;instance_type&quot;: &quot;ml.m5.4xlarge&quot;,\\n &quot;instance_count&quot;: 1,\\n },\\n }\\n}\\n</code></pre>\\n<p>In this example, the pipeline transforms the training data according to the script in <code>generated_modules/candidate_data_processors/dpp5.py</code> and builds an XGBoost model. This is where Autopilot provides complete control to the data scientist, who can pick the automatically generated feature transformation and model selection steps or build their own combination.</p>\\n<p>You can now add the pipeline to a pool for Autopilot to run the experiment as follows:</p>\n<pre><code class=\\"lang-\\">from sagemaker_automl import AutoMLInteractiveRunner, AutoMLLocalCandidate\\n\\nautoml_interactive_runner = AutoMLInteractiveRunner(AUTOML_LOCAL_RUN_CONFIG)\\nautoml_interactive_runner.select_candidate(candidate1)\\n</code></pre>\\n<p>This is an important step where you can decide to keep only a subset of candidates suggested by Autopilot, based on subject matter expertise, to reduce the total runtime. 
For now, keep all Autopilot suggestions, which you can list as follows:</p>\n<pre><code class=\\"lang-\\">automl_interactive_runner.display_candidates()\\n</code></pre>\\n<p><img src=\\"https://dev-media.amazoncloud.cn/1c703dadb84941d08800455bf49716dd_image.png\\" alt=\\"image.png\\" /></p>\n<p>The full Autopilot experiment is done in two parts. First, you need to run the data transformation jobs:</p>\n<pre><code class=\\"lang-\\">automl_interactive_runner.fit_data_transformers(parallel_jobs=7)\\n</code></pre>\\n<p>This step should complete in about 30 minutes for all the candidates, if you make no further modifications to the <code>dpp*.py</code> files.</p>\\n<p>The next step is to build the best set of models by tuning the hyperparameters for the respective algorithms. The hyperparameters are usually divided into two parts: static and tunable. The static hyperparameters remain unchanged throughout the experiment for all candidates that share the same algorithm. These hyperparameters are passed to the experiment as a dictionary. If you choose to pick the best XGBoost model by maximizing AUC from three rounds of a five-fold cross-validation scheme, the dictionary looks like the following code:</p>\n<pre><code class=\\"lang-\\">{\\n 'objective': 'binary:logistic',\\n 'eval_metric': 'auc',\\n '_kfold': 5,\\n '_num_cv_round': 3,\\n} \\n</code></pre>\\n<p>For the tunable hyperparameters, you need to pass another dictionary with ranges and scaling type:</p>\n<pre><code class=\\"lang-\\">{\\n 'num_round': IntegerParameter(64, 1024, scaling_type='Logarithmic'),\\n 'max_depth': IntegerParameter(2, 8, scaling_type='Logarithmic'),\\n 'eta': ContinuousParameter(1e-3, 1.0, scaling_type='Logarithmic'),\\n... 
\\n}\\n</code></pre>\\n<p>The complete set of hyperparameters is available in the <code>mangrove-2013.ipynb</code> <a href=\\"https://github.com/aws-samples/mangrove-landcover-classification/blob/e3f501d99f735ae815552dd168d80ab592d86979/notebooks/mangrove-2013.ipynb\\" target=\\"_blank\\">notebook</a>.</p>\\n<p>To create an experiment where all seven candidates can be tested in parallel, first prepare the parameters for a multi-algorithm HPO tuner:</p>\n<pre><code class=\\"lang-\\">multi_algo_tuning_parameters = automl_interactive_runner.prepare_multi_algo_parameters(\\n objective_metrics=ALGORITHM_OBJECTIVE_METRICS,\\n static_hyperparameters=STATIC_HYPERPARAMETERS,\\n hyperparameters_search_ranges=ALGORITHM_TUNABLE_HYPERPARAMETER_RANGES)\\n</code></pre>\\n<p>The objective metrics are defined independently for each algorithm:</p>\n<pre><code class=\\"lang-\\">ALGORITHM_OBJECTIVE_METRICS = {\\n 'xgboost': 'validation:auc',\\n 'linear-learner': 'validation:roc_auc_score',\\n 'mlp': 'validation:roc_auc',\\n}\\n</code></pre>\\n<p>Trying every combination of hyperparameter values for every candidate would be wasteful, so use a Bayesian strategy when you create the HPO tuner:</p>\n<pre><code class=\\"lang-\\">from sagemaker.tuner import HyperparameterTuner\\n\\nmulti_algo_tuning_inputs = automl_interactive_runner.prepare_multi_algo_inputs()\\nbase_tuning_job_name = &quot;{}-tuning&quot;.format(AUTOML_LOCAL_RUN_CONFIG.local_automl_job_name)\\n\\ntuner = HyperparameterTuner.create(\\n base_tuning_job_name=base_tuning_job_name,\\n strategy='Bayesian',\\n objective_type='Maximize',\\n max_parallel_jobs=10,\\n max_jobs=50,\\n **multi_algo_tuning_parameters,\\n)\\n</code></pre>\\n<p>By default, Autopilot runs 250 training jobs in the tuner to select the best model. For this use case, setting <code>max_jobs=50</code> is sufficient and saves time and resources, without any significant penalty in finding the best set of hyperparameters. 
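</p>
<p>The tunable ranges above use <code>scaling_type='Logarithmic'</code>. To illustrate why that matters for a range such as <code>eta</code> between 1e-3 and 1.0, this minimal sketch draws 50 values (the same budget as <code>max_jobs=50</code>) uniformly on a log scale and on a linear scale, and reports how many fall below 0.01. It uses plain random sampling purely as an assumption for illustration; SageMaker's Bayesian strategy decides for itself where to sample within the scaled range.</p>

```python
import math
import random


def sample_log_uniform(low, high, rng):
    """Draw a value whose logarithm is uniform between log(low) and log(high)."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


def share_below(draws, threshold):
    """Fraction of draws strictly below `threshold`."""
    return sum(d < threshold for d in draws) / len(draws)


rng = random.Random(42)  # fixed seed so the sketch is repeatable
budget = 50  # mirrors max_jobs=50 in the tuner above

# Illustrative range for eta: 1e-3 to 1.0
log_draws = [sample_log_uniform(1e-3, 1.0, rng) for _ in range(budget)]
lin_draws = [rng.uniform(1e-3, 1.0) for _ in range(budget)]

print(f"log scale:    {share_below(log_draws, 0.01):.0%} of eta draws below 0.01")
print(f"linear scale: {share_below(lin_draws, 0.01):.0%} of eta draws below 0.01")
```

<p>On a log scale, each decade of the range (1e-3 to 1e-2, 1e-2 to 1e-1, 1e-1 to 1.0) receives roughly a third of the draws, whereas a linear scale places about 90% of them above 0.1, so small learning rates are barely explored.</p>
<p>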
Now submit the HPO job as follows:</p>\\n<pre><code class=\\"lang-\\">tuner.fit(inputs=multi_algo_tuning_inputs, include_cls_metadata=None)\\n</code></pre>\\n<p>The process takes about 80 minutes on ml.m5.4xlarge instances. You can monitor progress on the SageMaker console by choosing <strong>Hyperparameter tuning jobs</strong> under <strong>Training</strong> in the navigation pane.</p>\\n<p><img src=\\"https://dev-media.amazoncloud.cn/0f89d790a8024ee3be53f1cbb355ee3c_image.png\\" alt=\\"image.png\\" /></p>\n<p>You can view a host of useful information, including the performance of each candidate, by choosing the name of the job in progress.</p>\n<p><img src=\\"https://dev-media.amazoncloud.cn/b7d1f64c1f8e467682880f0b583007c0_image.png\\" alt=\\"image.png\\" /></p>\n<p>Finally, compare the model performance of the best candidates as follows:</p>\n<pre><code class=\\"lang-\\">from sagemaker.analytics import HyperparameterTuningJobAnalytics\\n\\nSAGEMAKER_SESSION = AUTOML_LOCAL_RUN_CONFIG.sagemaker_session\\nSAGEMAKER_ROLE = AUTOML_LOCAL_RUN_CONFIG.role\\n\\ntuner_analytics = HyperparameterTuningJobAnalytics(\\n tuner.latest_tuning_job.name, sagemaker_session=SAGEMAKER_SESSION)\\n\\ndf_tuning_job_analytics = tuner_analytics.dataframe()\\n\\n# sort so that the best objective value comes first\\ndf_tuning_job_analytics.sort_values(\\n by=['FinalObjectiveValue'],\\n inplace=True,\\n ascending=False if tuner.objective_type == &quot;Maximize&quot; else True)\\n\\n# select the columns to display and rename them\\nselect_columns = [&quot;TrainingJobDefinitionName&quot;, &quot;FinalObjectiveValue&quot;, &quot;TrainingElapsedTimeSeconds&quot;]\\nrename_columns = {\\n\\t&quot;TrainingJobDefinitionName&quot;: &quot;candidate&quot;,\\n\\t&quot;FinalObjectiveValue&quot;: &quot;AUC&quot;,\\n\\t&quot;TrainingElapsedTimeSeconds&quot;: &quot;run_time&quot;\\n}\\n\\n# Show top 5 model 
performances\\ndf_tuning_job_analytics[select_columns].rename(columns=rename_columns).set_index(&quot;candidate&quot;).head(5)\\n</code></pre>\\n<p><img 
src=\\"https://dev-media.amazoncloud.cn/0143c7b2e31d462fa3e42610f7922ec6_image.png\\" alt=\\"image.png\\" /></p>\n<p>The top-performing model, which is based on MLP, is marginally better than the XGBoost models with various choices of data processing steps, but it takes much longer to train. You can view important details about the MLP model training, including the combination of hyperparameters used, as follows:</p>\n<pre><code class=\\"lang-\\">best_training_job = tuner.best_training_job()  # name of the top-ranked training job\\ndf_tuning_job_analytics.loc[df_tuning_job_analytics.TrainingJobName == best_training_job].T.dropna()\\n</code></pre>\\n<p><img src=\\"https://dev-media.amazoncloud.cn/56bc98105b22436793b6c03752571fe8_image.png\\" alt=\\"image.png\\" /></p>\n<h3><a id=\\"Create_an_inference_pipeline_218\\"></a><strong>Create an inference pipeline</strong></h3>\\n<p>To generate predictions on new data, you construct an inference pipeline on SageMaker that hosts the best model so that it can be called later. The SageMaker pipeline model requires three containers as its components: data transformation, algorithm, and inverse label transformation (if numerical predictions need to be mapped onto non-numerical labels). 
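</p>
<p>Conceptually, the three containers run in series, with each container's output becoming the next container's input. The following minimal sketch mimics that chaining with plain Python functions. <code>data_transform</code>, <code>algorithm</code>, <code>inverse_label_transform</code>, the toy scoring rule, and the string labels are all hypothetical stand-ins, not SageMaker APIs and not the trained model.</p>

```python
# Hypothetical stand-ins for the three containers; none of these functions
# belong to the SageMaker SDK, and the scoring rule is a toy, not the model.
LABELS = ["other", "mangrove"]  # hypothetical non-numerical labels


def data_transform(row):
    """Parse one CSV line of band values into floats."""
    return [float(v) for v in row.split(",")]


def algorithm(features):
    """Toy rule standing in for the trained model: returns a class index."""
    return 1 if sum(features) / len(features) > 0.5 else 0


def inverse_label_transform(index):
    """Map the numeric prediction back to a non-numerical label."""
    return LABELS[index]


def run_pipeline(row, stages):
    """Feed each stage's output into the next stage, like serial containers."""
    value = row
    for stage in stages:
        value = stage(value)
    return value


stages = [data_transform, algorithm, inverse_label_transform]
print(run_pipeline("0.6,0.7,0.8", stages))  # mangrove
print(run_pipeline("0.1,0.2,0.3", stages))  # other
```

<p>The caller sends raw CSV features and receives a label; the intermediate numeric representations stay inside the pipeline, which is why a single endpoint can accept the same input format as the training data.</p>
<p>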
For brevity, only part of the required code is shown in the following snippet; the complete code is available in the <code>mangrove-2013.ipynb</code> <a href=\\"https://github.com/aws-samples/mangrove-landcover-classification/blob/e3f501d99f735ae815552dd168d80ab592d86979/notebooks/mangrove-2013.ipynb\\" target=\\"_blank\\">notebook</a>:</p>\\n<pre><code class=\\"lang-\\">from sagemaker.estimator import Estimator\\nfrom sagemaker import PipelineModel\\nfrom sagemaker_automl import select_inference_output\\n\\n…\\n# Final pipeline model \\nmodel_containers = [best_data_transformer_model, best_algo_model]\\nif best_candidate.transforms_label:\\n\\tmodel_containers.append(best_candidate.get_data_transformer_model(\\n \\ttransform_mode=&quot;inverse-label-transform&quot;,\\n \\trole=SAGEMAKER_ROLE,\\n \\tsagemaker_session=SAGEMAKER_SESSION))\\n\\n# select the output type\\nmodel_containers = select_inference_output(&quot;BinaryClassification&quot;, model_containers, output_keys=['predicted_label'])\\n</code></pre>\\n<p>After the model containers are built, you can construct and deploy the pipeline as follows:</p>\n<pre><code class=\\"lang-\\">from sagemaker import PipelineModel\\n\\npipeline_model = PipelineModel(\\n\\tname=f&quot;mangrove-automl-2013&quot;,\\n\\trole=SAGEMAKER_ROLE,\\n\\tmodels=model_containers,\\n\\tvpc_config=AUTOML_LOCAL_RUN_CONFIG.vpc_config)\\n\\npipeline_model.deploy(initial_instance_count=1,\\n \\tinstance_type='ml.m5.2xlarge',\\n \\tendpoint_name=pipeline_model.name,\\n \\twait=True)\\n</code></pre>\\n<p>The endpoint deployment takes about 10 minutes to complete.</p>\n<h3><a id=\\"Get_inference_on_the_test_dataset_using_an_endpoint_259\\"></a><strong>Get inference on the test dataset using an endpoint</strong></h3>\\n<p>After the endpoint is deployed, you can invoke it with a payload of features B1–B7 to classify each pixel in an image as either mangrove (1) or other (0):</p>\n<pre><code class=\\"lang-\\">import boto3\\nsm_runtime = 
boto3.client('runtime.sagemaker')\\ninf_endpt = pipeline_model.name  # endpoint deployed in the previous step\\n\\npred_labels = []\\nwith open(local_download, 'r') as f:\\n for row in f:\\n payload = row.rstrip('\\\\n')\\n x = sm_runtime.invoke_endpoint(EndpointName=inf_endpt,\\n \\tContentType=&quot;text/csv&quot;,\\n \\tBody=payload)\\n pred_labels.append(int(x['Body'].read().decode().strip()))\\n</code></pre>\\n<p>Complete details on postprocessing the model predictions for evaluation and plotting are available in <code>notebooks/model_performance.ipynb</code>.</p>\\n<h3><a id=\\"Get_inference_on_the_test_dataset_using_a_batch_transform_279\\"></a><strong>Get inference on the test dataset using a batch transform</strong></h3>\\n<p>Now that we have built the best-performing model with Autopilot, we can use it for inference. For large datasets, a batch transform is a more efficient way to get inferences. Let’s generate predictions on the entire dataset (training and test) and append the results to the features, so that we can perform further analysis, for instance, comparing predicted and actual labels or examining the distribution of features among predicted classes.</p>\n<p>First, we create a manifest file in Amazon S3 that points to the locations of the training and test data from the previous data processing steps:</p>\n<pre><code class=\\"lang-\\">import boto3\\ndata_bucket = &quot;&lt;Name of the S3 bucket that has the training data&gt;&quot;  # replace with your bucket name\\nprefix = &quot;LANDSAT_LC08_C01_T1_SR/Year2013&quot;\\nmanifest = &quot;[{{\\\\&quot;prefix\\\\&quot;: \\\\&quot;s3://{}/{}/\\\\&quot;}},\\\\n\\\\&quot;train.csv\\\\&quot;,\\\\n\\\\&quot;test.csv\\\\&quot;\\\\n]&quot;.format(data_bucket, prefix)\\ns3_client = boto3.client('s3')\\ns3_client.put_object(Body=manifest, Bucket=data_bucket, Key=f&quot;{prefix}/data.manifest&quot;)\\n</code></pre>\\n<p>Now we can create a batch transform job. 
Because our training and test datasets have <code>label</code> as the last column, we need to drop that column during inference. 
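</p>
<p>Conceptually, the batch transform should remove the label before a row reaches the model and then append the prediction to the original row. The following minimal sketch emulates that behavior locally in plain Python. It does not evaluate SageMaker's JSONPath filter (the column drop is hard-coded), and <code>predict</code> is a hypothetical stand-in for the deployed pipeline, not the trained model.</p>

```python
import csv
import io


def drop_last_column(fields):
    """Emulate an input filter that removes the trailing label column."""
    return fields[:-1]


def predict(features):
    """Hypothetical stand-in for the model: 1 (mangrove) if B1 > 0.5, else 0."""
    return 1 if float(features[0]) > 0.5 else 0


def filter_predict_join(csv_text):
    """Drop the label before prediction, then append the prediction to the full input row."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for fields in csv.reader(io.StringIO(csv_text)):
        prediction = predict(drop_last_column(fields))
        writer.writerow(fields + [str(prediction)])
    return out.getvalue()


# Two made-up rows: seven band values (B1-B7) followed by the label.
sample = "0.6,0.1,0.2,0.3,0.4,0.5,0.6,1\n0.2,0.1,0.2,0.3,0.4,0.5,0.6,0\n"
print(filter_predict_join(sample))
```

<p>Each output row keeps all eight input fields, label included, followed by the prediction, which is what makes a predicted-versus-actual comparison possible afterward.</p>
<p>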
To do that, we pass <code>InputFilter</code> in the <code>DataProcessing</code> argument. The JSONPath expression <code>&quot;$[:-2]&quot;</code> drops the last column. The predicted output is then joined with the source data for further analysis.</p>\\n<p>In the following code, we construct the arguments for the batch transform job and then pass them to the <code>create_transform_job</code> function:</p>\\n<pre><code class=\\"lang-\\">from time import gmtime, strftime\\n\\nbatch_job_name = &quot;Batch-Transform-&quot; + strftime(&quot;%Y-%m-%d-%H-%M-%S&quot;, gmtime())\\noutput_location = &quot;s3://{}/{}/batch_output/{}&quot;.format(data_bucket, prefix, batch_job_name)\\ninput_location = &quot;s3://{}/{}/data.manifest&quot;.format(data_bucket, prefix)\\n\\nrequest = {\\n &quot;TransformJobName&quot;: batch_job_name,\\n &quot;ModelName&quot;: pipeline_model.name,\\n &quot;TransformOutput&quot;: {\\n &quot;S3OutputPath&quot;: output_location,\\n &quot;Accept&quot;: &quot;text/csv&quot;,\\n &quot;AssembleWith&quot;: &quot;Line&quot;,\\n },\\n &quot;TransformInput&quot;: {\\n &quot;DataSource&quot;: {&quot;S3DataSource&quot;: {&quot;S3DataType&quot;: &quot;ManifestFile&quot;, &quot;S3Uri&quot;: input_location}},\\n &quot;ContentType&quot;: &quot;text/csv&quot;,\\n &quot;SplitType&quot;: &quot;Line&quot;,\\n &quot;CompressionType&quot;: &quot;None&quot;,\\n },\\n &quot;TransformResources&quot;: {&quot;InstanceType&quot;: &quot;ml.m4.xlarge&quot;, &quot;InstanceCount&quot;: 1},\\n &quot;DataProcessing&quot;: {&quot;InputFilter&quot;: &quot;$[:-2]&quot;, &quot;JoinSource&quot;: &quot;Input&quot;}\\n}\\n\\n# low-level boto3 client; note that this name shadows the sagemaker SDK package\\nsagemaker = boto3.client(&quot;sagemaker&quot;)\\nsagemaker.create_transform_job(**request)\\nprint(&quot;Created Transform job with name: &quot;, batch_job_name)\\n</code></pre>\\n<p>You can monitor the status of the job on the SageMaker console.</p>\n<p><img 
src=\\"https://dev-media.amazoncloud.cn/3efb5132327a45ffb711a55c02e4f300_image.png\\" alt=\\"image.png\\" 
/></p>\n<h3><a id=\\"Visualize_model_performance_332\\"></a><strong>Visualize model performance</strong></h3>\\n<p>You can now visualize the performance of the best model on the test dataset, which consists of regions from India, Myanmar, Cuba, and Vietnam, as a confusion matrix. The model has high recall for pixels representing mangroves, but only about 75% precision. For non-mangrove (other) pixels, precision is 99% and recall is 85%. You can adjust the probability cutoff of the model predictions to trade precision against recall, depending on the particular use case.</p>\n<p><img src=\\"https://dev-media.amazoncloud.cn/3db091bc54cd4c5d80274f0f3be4fec6_image.png\\" alt=\\"image.png\\" /></p>\n<p><img src=\\"https://dev-media.amazoncloud.cn/ef2fe54f1372414ab0540cc2a0949484_image.png\\" alt=\\"image.png\\" /></p>\n<p>It’s worth noting that these results are a significant improvement over the built-in smileCart model.</p>\n<h3><a id=\\"Visualize_model_predictions_342\\"></a><strong>Visualize model predictions</strong></h3>\\n<p>Finally, it’s useful to observe the model performance in specific regions of the map. In the following image, the mangrove area along the India-Bangladesh border is depicted in red. Points sampled from the Landsat image patch belonging to the test dataset are superimposed on the region, where each point is a pixel that the model classified as mangrove. Blue points are classified correctly by the model, whereas black points are misclassifications.</p>\n<p><img src=\\"https://dev-media.amazoncloud.cn/febbdda2fb254c75893cbb18b3ff9c01_image.png\\" alt=\\"image.png\\" /></p>\n<p>The following image shows only the points that the model predicted as not mangrove, with the same color scheme as the preceding image. The gray outline is the part of the Landsat patch that doesn’t include any mangroves. 
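</p>
<p>To make the probability-cutoff trade-off from the model performance discussion concrete, the following minimal sketch computes mangrove precision and recall at two cutoffs. The labels and scores are made up, and the sketch assumes you have per-pixel probability scores; the pipeline configured earlier returns only <code>predicted_label</code>.</p>

```python
def precision_recall(y_true, scores, cutoff):
    """Precision and recall for the positive class (1 = mangrove) at a cutoff."""
    preds = [1 if s >= cutoff else 0 for s in scores]
    tp = sum(1 for t, p in zip(y_true, preds) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, preds) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, preds) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Made-up labels and mangrove probabilities for eight pixels.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.55, 0.6, 0.4, 0.3, 0.2, 0.1]

for cutoff in (0.5, 0.7):
    p, r = precision_recall(y_true, scores, cutoff)
    print(f"cutoff={cutoff}: precision={p:.2f}, recall={r:.2f}")
```

<p>On this toy data, raising the cutoff from 0.5 to 0.7 lifts precision from 0.75 to 1.00 while recall falls from 1.00 to 0.67.</p>
<p>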
As is evident from the image, the model makes no mistakes classifying points on water, but it struggles to distinguish pixels representing mangroves from those representing regular foliage.</p>\n<p><img src=\\"https://dev-media.amazoncloud.cn/eadd27e99492461a841c6f8883252e39_image.png\\" alt=\\"image.png\\" /></p>\n<p>The following image shows model performance on the Myanmar mangrove region.</p>\n<p><img src=\\"https://dev-media.amazoncloud.cn/e873879ab90646758e620ec76753221c_image.png\\" alt=\\"image.png\\" /></p>\n<p>In the following image, the model does a better job of identifying mangrove pixels.</p>\n<p><img src=\\"https://dev-media.amazoncloud.cn/4b571a7b558c49f98c59ebbd7f0aa5e1_image.png\\" alt=\\"image.png\\" /></p>\n<h3><a id=\\"Clean_up_360\\"></a><strong>Clean up</strong></h3>\\n<p>The SageMaker inference endpoint continues to incur cost if left running. Delete the endpoint as follows when you’re done:</p>\n<pre><code class=\\"lang-\\"># sagemaker is the boto3 SageMaker client created for the batch transform job\\nsagemaker.delete_endpoint(EndpointName=pipeline_model.name)\\n</code></pre>\\n<h3><a id=\\"Conclusion_368\\"></a><strong>Conclusion</strong></h3>\\n<p>This series of posts provided an end-to-end framework for data scientists to solve GIS problems. <a href=\\"https://aws.amazon.com/blogs/machine-learning/part-1-identify-mangrove-forests-using-satellite-image-features-using-amazon-sagemaker-studio-and-amazon-sagemaker-autopilot/\\" target=\\"_blank\\">Part 1</a> showed the ETL process and a convenient way to visually interact with the data. 
Part 2 showed how to use Autopilot to automate building a custom mangrove classifier.</p>\\n<p>You can use this framework to explore new satellite datasets that contain a richer set of bands useful for mangrove classification, and to experiment with feature engineering that incorporates domain knowledge.</p>\n<h3><a id=\\"About_the_Authors_374\\"></a><strong>About the Authors</strong></h3>\\n<p><img src=\\"https://dev-media.amazoncloud.cn/fc334ff00f9f490dadcb7e0d1e151882_image.png\\" alt=\\"image.png\\" /></p>\n<p><strong>Andrei Ivanovic</strong> is an incoming master’s student in Computer Science at the University of Toronto and a recent graduate of the Engineering Science program at the University of Toronto, majoring in Machine Intelligence with a Robotics/Mechatronics minor. He is interested in computer vision, deep learning, and robotics. He did the work presented in this post during his summer internship at Amazon.</p>\\n<p><img src=\\"https://dev-media.amazoncloud.cn/417517f990584edc9aafbb44c06f3e75_image.png\\" alt=\\"image.png\\" /></p>\n<p><strong>David Dong</strong> is a Data Scientist at Amazon Web Services.</p>\\n<p><img src=\\"https://dev-media.amazoncloud.cn/b1f1f318b2364153baaa488156fe4a32_image.png\\" alt=\\"image.png\\" /></p>\n<p><strong>Arkajyoti Misra</strong> is a Data Scientist at Amazon LastMile Transportation. He is passionate about applying computer vision techniques to solve problems that help the earth. He loves to work with non-profit organizations and is a founding member of <a href=\\"http://ekipi.org/\\" target=\\"_blank\\">ekipi.org</a>.</p>\n"}