Predict shipment ETA with no-code machine learning using Amazon SageMaker Canvas

{"value":"Logistics and transportation companies track ETA (estimated time of arrival), which is a key metric for their business. Their downstream supply chain activities are planned based on this metric. However, delays often occur, and the ETA might differ from the product’s or shipment’s actual time of arrival (ATA), for instance due to shipping distance or carrier-related or weather-related issues. This impacts the entire supply chain, in many instances reducing productivity and increasing waste and inefficiencies. Predicting the exact day a product arrives to a customer is challenging because it depends on various factors such as order type, carrier, origin, and distance.\n\nAnalysts working in the logistics and transportation industry have domain expertise and knowledge of shipping and logistics attributes. However, they need to be able to generate accurate shipment ETA forecasts for efficient business operations. They need an intuitive, easy-to-use, no-code capability to create machine learning (ML) models for predicting shipping ETA forecasts.\n\nTo help achieve the agility and effectiveness that business analysts seek, we launched [Amazon SageMaker Canvas](https://aws.amazon.com/sagemaker/canvas/), a no-code ML solution that helps companies accelerate solutions to business problems quickly and easily. 
SageMaker Canvas provides business analysts with a visual point-and-click interface that allows you to build models and generate accurate ML predictions on your own—without requiring any ML experience or having to write a single line of code.\n\nIn this post, we show how to use SageMaker Canvas to predict shipment ETAs.\n\n### **Solution overview**\n\nAlthough ML development is a complex and iterative process, we can generalize an ML workflow into business requirements analysis, data preparation, model development, and model deployment stages.\n\n![image.png](https://dev-media.amazoncloud.cn/5121cb993f0e45c59082e9d57d414e8e_image.png)\n\nSageMaker Canvas abstracts the complexities of data preparation and model development, so you can focus on delivering value to your business by drawing insights from your data without a deep knowledge of the data science domain. The following architecture diagram highlights the components in a no-code or low-code solution.\n\n![image.png](https://dev-media.amazoncloud.cn/d09aae960a5145ba8418cdbf1b33f16d_image.png)\n\nThe following are the steps as outlined in the architecture:\n\n1. Download the dataset to your local machine.\n2. Import the data into SageMaker Canvas.\n3. Join your datasets.\n4. Prepare the data.\n5. Build and train your model.\n6. Evaluate the model.\n7. Test the model.\n8. Share the model for deployment.\n\nLet’s assume you’re a business analyst assigned to the product shipment tracking team of a large logistics and transportation organization. Your shipment tracking team has asked you to assist in predicting the shipment ETA. They have provided you with a historical dataset that contains characteristics tied to different products and their respective ETA, and want you to predict the ETA for products that will be shipped in the future.\n\nWe use SageMaker Canvas to perform the following steps:\n\n1. Import our sample datasets.\n2. Join the datasets.\n3. 
Train and build the shipment ETA prediction model.\n4. Analyze the model results.\n5. Test predictions against the model.\n\n![image.png](https://dev-media.amazoncloud.cn/383666bcfe1b4bd38132bbf8f3a17b7a_image.png)\n\n### **Dataset overview**\n\nWe use two datasets ([shipping logs](https://static.us-east-1.prod.workshops.aws/public/6f2f7cb1-bfda-4b34-ae39-928502784393/static/datasets/ShippingLogs.csv) and [product description](https://static.us-east-1.prod.workshops.aws/public/6f2f7cb1-bfda-4b34-ae39-928502784393/static/datasets/ProductDescriptions.csv)) in CSV format, which contain shipping log information and certain characteristics of a product, respectively.\n\nThe ```ShippingLogs``` dataset contains the complete shipping data for all products delivered, including estimated time, shipping priority, carrier, and origin. It has approximately 10,000 rows and 12 feature columns. The following table summarizes the data schema.\n\n![image.png](https://dev-media.amazoncloud.cn/6439fb73c5274c0797246bc645ba95b4_image.png)\n\nThe ```ProductDescriptions``` dataset contains metadata about the product that is shipped in each order. This dataset has approximately 10,000 rows and 5 feature columns. The following table summarizes the data schema.\n\n![image.png](https://dev-media.amazoncloud.cn/2f0e1da4c7a349468bba3e55552458a6_image.png)\n\n### **Prerequisites**\n\nAn IT administrator with an [AWS account](https://aws.amazon.com/premiumsupport/knowledge-center/create-and-activate-aws-account/) with appropriate permissions must complete the following prerequisites:\n\n1. Deploy an [Amazon SageMaker](https://aws.amazon.com/sagemaker/) domain. For instructions, see [Onboard to Amazon SageMaker Domain](https://docs.aws.amazon.com/sagemaker/latest/dg/gs-studio-onboard.html).\n2. Launch SageMaker Canvas. 
For instructions, see [Setting up and managing Amazon SageMaker Canvas (for IT administrators)](https://docs.aws.amazon.com/sagemaker/latest/dg/canvas-setting-up.html).\n3. Configure cross-origin resource sharing (CORS) policies in [Amazon Simple Storage Service](http://aws.amazon.com/s3) (Amazon S3) so that SageMaker Canvas can upload files from local disk. For instructions, see [Give your users the ability to upload local files](https://docs.aws.amazon.com/sagemaker/latest/dg/canvas-set-up-local-upload.html).\n\n### **Import the dataset**\n\nFirst, download the datasets ([shipping logs](https://static.us-east-1.prod.workshops.aws/public/6f2f7cb1-bfda-4b34-ae39-928502784393/static/datasets/ShippingLogs.csv) and [product description](https://static.us-east-1.prod.workshops.aws/public/6f2f7cb1-bfda-4b34-ae39-928502784393/static/datasets/ProductDescriptions.csv)) and review the files to confirm that they downloaded completely.\n\nSageMaker Canvas provides several sample datasets in your application to help you get started. To learn more about the SageMaker-provided sample datasets you can experiment with, see [Use sample datasets](https://docs.aws.amazon.com/sagemaker/latest/dg/canvas-sample-datasets.html). If you use the sample datasets (```canvas-sample-shipping-logs.csv``` and ```canvas-sample-product-descriptions.csv```) available within SageMaker Canvas, you don’t have to import the shipping logs and product description datasets.\n\nYou can import data from different data sources into SageMaker Canvas. If you plan to use your own dataset, follow the steps in [Importing data in Amazon SageMaker Canvas](https://docs.aws.amazon.com/sagemaker/latest/dg/canvas-importing-data.html).\n\nFor this post, we use the full shipping logs and product description datasets that we downloaded.\n\n1. Sign in to the [AWS Management Console](http://aws.amazon.com/console), using an account with the appropriate permissions to access SageMaker Canvas.\n2. 
On the SageMaker Canvas console, choose **Import**.\n\n![image.png](https://dev-media.amazoncloud.cn/27def05438964d0e8979f902754429f6_image.png)\n\n3. Choose **Upload** and select the files ```ShippingLogs.csv``` and ```ProductDescriptions.csv```.\n4. Choose **Import data** to upload the files to SageMaker Canvas.\n\n![image.png](https://dev-media.amazoncloud.cn/ee13812ceaf640fb9b6243ef319f6462_image.png)\n\n### **Create a consolidated dataset**\n\nNext, let’s join the two datasets.\n\n1. Choose **Join data**.\n\n![image.png](https://dev-media.amazoncloud.cn/41ef59c8668b470c953a1a8dafb70206_image.png)\n\n2. Drag and drop ```ShippingLogs.csv``` and ```ProductDescriptions.csv``` from the left pane under **Datasets** to the right pane.\n\n![image.png](https://dev-media.amazoncloud.cn/106c9170fa9a4bad95ceea20c7896942_image.png)\n\nThe two datasets are joined with an inner join on ```ProductID```.\n\n3. Choose **Import** and enter a name for the new joined dataset.\n4. Choose **Import data**.\n\n![image.png](https://dev-media.amazoncloud.cn/aa93848a138b4213aadd86a079888e03_image.png)\n\nYou can choose the new dataset to preview its contents.\n\n![image.png](https://dev-media.amazoncloud.cn/57405a9b76274e49bc75f3a7a334e09d_image.png)\n\nAfter you review the dataset, you can create your model.\n\n### **Build and train model**\n\nTo build and train your model, complete the following steps:\n\n1. For **Model name**, enter ```ShippingForecast```.\n2. Choose **Create**.\n\n![image.png](https://dev-media.amazoncloud.cn/ce792c9d0de6409bb5ede52620e17e49_image.png)\n\nIn the Model view, you can see four tabs, which correspond to the four steps to create a model and use it to generate predictions: **Select**, **Build**, **Analyze**, and **Predict**.\n\n3. On the **Select** tab, select the ```ConsolidatedShippingData``` dataset you created earlier. You can see that this dataset comes from Amazon S3 and has 12 columns and 10,000 rows.\n4. 
Choose **Select dataset**.\n\n![image.png](https://dev-media.amazoncloud.cn/cd11da98e1644bbda2027512757b5f6b_image.png)\n\nSageMaker Canvas automatically moves to the **Build** tab.\n\n5. On the **Build** tab, choose the target column, in our case ```ActualShippingDays```.\nWe’re interested in how many days it will take for the goods to reach the customer. Because this target is a number, SageMaker Canvas automatically detects that this is a numeric prediction problem (also known as regression). Regression estimates the values of a dependent target variable based on one or more other variables or attributes that are correlated with it. Because we also have a column with time series data (```OrderDate```), SageMaker Canvas may interpret this as a time series forecast model type.\n6. Before advancing, make sure that the model type is indeed **Numeric model type**; if that’s not the case, you can select it with the **Change type** option.\n\n![image.png](https://dev-media.amazoncloud.cn/89429495b27f4d2ea449b64b6fc8af5d_image.png)\n\n#### **Data preparation**\n\nIn the bottom half of the page, you can look at some of the statistics of the dataset, including missing and mismatched values, unique values, and mean and median values.\n\nColumn view lists all columns with their data types and these basic statistics. This can help you devise a strategy to handle missing values in the datasets.\n\n![image.png](https://dev-media.amazoncloud.cn/f8da2e1b60f54d25a78b4ab95fcafb47_image.png)\n\nGrid view provides you with a graphical distribution of values for each column and the sample data. 
You can start inferring which columns are relevant for training the model.\n\n![image.png](https://dev-media.amazoncloud.cn/0ee90ca191ff4caaacd9bfbdf4e369ab_image.png)\n\nLet’s preview the model to see the estimated RMSE (root mean squared error) for this numeric prediction.\n\n![image.png](https://dev-media.amazoncloud.cn/9c51fb9233dc4df3bca539a172acb7b9_image.png)\n\nYou can also drop some of the columns, if you don’t want to use them for the prediction, by simply deselecting them. For this post, we deselect the ```order_id``` column. Because it’s a primary key, it carries no predictive information and doesn’t add value to the model training process.\n\nYou can choose **Preview model** to get insights on feature importance and iterate on the model quickly. We also see the RMSE is now 1.223, which is improved from 1.225. The lower the RMSE, the better a given model is able to fit a dataset.\n\n![image.png](https://dev-media.amazoncloud.cn/e8a5cccff6d440cd9328f72eb33c6833_image.png)\n\nFrom our exploratory data analysis, we can see that the dataset doesn’t have a lot of missing values. Therefore, we don’t have to handle missing values. If you see a lot of missing values for your features, you can filter the missing values.\n\nTo extract more insights, you can proceed with a datetime extraction. With the datetime extraction transform, you can extract values from a datetime column to a separate column.\n\nTo perform a datetime extraction, complete the following steps:\n\n1. On the **Build** tab of the SageMaker Canvas application, choose **Extract**.\n2. Choose the column from which you want to extract values (for this post, ```OrderDate```).\n3. For **Value**, choose one or more values to extract from the column. For this post, we choose **Year** and **Month**. The values you can extract from a timestamp column are **Year**, **Month**, **Day**, **Hour**, **Week of year**, **Day of year**, and **Quarter**.\n4. 
Choose **Add** to add the transform to the model recipe.\n\n![image.png](https://dev-media.amazoncloud.cn/48baeeb9208f454e89aed0fe9205efa1_image.png)\n\nSageMaker Canvas creates a new column in the dataset for each of the values you extract.\n\n![image.png](https://dev-media.amazoncloud.cn/5fca75e922f84d1c9d62b6c51cba5bde_image.png)\n\n#### **Model training**\n\nIt’s time to finally train the model! Before building a complete model, it’s a good practice to get a general idea of the performance our model will have by training a quick model. A quick model trains fewer combinations of models and hyperparameters in order to prioritize speed over accuracy. This is helpful in cases like ours where we want to prove the value of training an ML model for our use case. Note that the quick build option isn’t available for datasets with more than 50,000 rows.\n\n![image.png](https://dev-media.amazoncloud.cn/8e809982fb714a008586abcaca8f3d0c_image.png)\n\nNow we wait anywhere from 2–15 minutes for the quick build to finish training our model.\n\n![image.png](https://dev-media.amazoncloud.cn/6ee6daafbf614c689198e8c5215718ee_image.png)\n\n### **Evaluate model performance**\n\nWhen training is complete, SageMaker Canvas automatically moves to the **Analyze** tab to show us the results of our quick training, as shown in the following screenshot.\n\nYou may experience slightly different values. This is expected. Machine learning introduces some variation in the process of training models, which can lead to different results for different builds.\n\n![image.png](https://dev-media.amazoncloud.cn/0290540b819b4569903752a038956084_image.png)\n\nLet’s focus on the **Overview** tab. This tab shows you the column impact, or the estimated importance of each column in predicting the target column. 
In this example, the ```ExpectedShippingDays``` column has the most significant impact on our predictions.\n\nOn the **Scoring** tab, you can see a plot representing the best fit regression line for ```ActualShippingDays```. On average, the model prediction has a difference of +/- 0.7 from the actual value of ```ActualShippingDays```. The Scoring section for numeric prediction shows a line to indicate the model’s predicted value in relation to the data used to make predictions. The predicted values often fall within +/- the RMSE of the actual values, and the width of the purple band around the line indicates this RMSE range.\n\n![image.png](https://dev-media.amazoncloud.cn/fa3cd1bf0d22444f93ce93a7d545fbb9_image.png)\n\nAs the thickness of the RMSE band on a model increases, the accuracy of the prediction decreases. As you can see, the model predicts with high accuracy to begin with (narrow band) and as the value of ```ActualShippingDays``` increases (17–22), the band becomes thicker, indicating lower accuracy.\n\nThe **Advanced metrics** section contains information for users who want a deeper understanding of their model performance. The metrics for numeric prediction are as follows:\n\n- **R2** – The percentage of the variation in the target column that can be explained by the input columns.\n- **MAE** – Mean absolute error. On average, the prediction for the target column is +/- {MAE} from the actual value.\n- **MAPE** – Mean absolute percent error. On average, the prediction for the target column is +/- {MAPE} % from the actual value.\n- **RMSE** – Root mean square error. The standard deviation of the errors.\n\nThe following screenshot shows a graph of the residuals or errors. The horizontal line indicates an error of 0 or a perfect prediction. The blue dots are the errors. 
Their distance from the horizontal line indicates the magnitude of the errors.\n\n![image.png](https://dev-media.amazoncloud.cn/d5c194b3231e401981f5512c769114b2_image.png)\n\n**R-squared** is a statistical measure of how close the data is to the fitted regression line. The higher the percentage, the better the fit; here, the model explains about 87% of the variability of the response data around its mean.\n\nOn average, the prediction for the target column is +/- 0.709 (the MAE) from the actual value. This indicates that on average the model predicts the target to within about 0.7 days, which is useful for planning purposes.\n\nThe model has a standard deviation of errors (RMSE) of 1.223. As noted earlier, accuracy is high where the band is narrow and drops as the value of ```ActualShippingDays``` increases (17–22), where the band becomes thicker.\n\nThe following image shows an error density plot.\n\n![image.png](https://dev-media.amazoncloud.cn/b21e8a71027c4d009a533229d26b4133_image.png)\n\nYou now have two options as next steps:\n\n- You can use this model to run some predictions by choosing **Predict**.\n- You can create a new version of this model to train with the **Standard build** option. This takes much longer (about 4–6 hours) but generally produces more accurate results.\n\nBecause we feel confident about using this model given the performance we’ve seen, we opt to go ahead and use the model for predictions. If you weren’t confident, you could have a data scientist review the modeling SageMaker Canvas did and offer potential improvements.\n\nNote that training a model with the **Standard build** option is necessary to share the model with a data scientist through the [Amazon SageMaker Studio](https://docs.aws.amazon.com/sagemaker/latest/dg/studio.html) integration.\n\n### **Generate predictions**\n\nNow that the model is trained, let’s generate some predictions.\n\n1. Choose **Predict** on the **Analyze** tab, or choose the **Predict** tab.\n2. 
Choose **Batch prediction**.\n3. Choose **Select dataset**, and choose the dataset ```ConsolidatedShipping.csv```.\n\nSageMaker Canvas uses this dataset to generate our predictions. Although it’s generally not a good idea to use the same dataset for both training and testing, we’re using the same dataset for the sake of simplicity. You can also import another dataset if you desire.\n\n![image.png](https://dev-media.amazoncloud.cn/c45c3722e95644af9c3ded304e5e02ae_image.png)\n\nAfter a few seconds, the prediction is done and you can choose the eye icon to see a preview of the predictions, or choose **Download** to download a CSV file containing the full output.\n\nYou can also choose to predict values one by one by selecting **Single prediction** instead of **Batch prediction**. SageMaker Canvas then shows you a view where you can provide the values for each feature manually and generate a prediction. This is ideal for what-if scenarios. For example, how does ```ActualShippingDays``` change if the ```ShippingOrigin``` is Houston? What if we used a different carrier? What if the ```PackageWeight``` is different?\n\n![image.png](https://dev-media.amazoncloud.cn/f82f15b39d4649a9b97ffc413ea4fb28_image.png)\n\n### **Standard build**\n\nStandard build chooses accuracy over speed. 
If you want to share the artifacts of the model with your data scientist and ML engineers, you may choose to create a standard build next.\n\nFirst add a new version.\n\n![image.png](https://dev-media.amazoncloud.cn/3a96452e7fa14f1aaf4ce639116dba34_image.png)\n\nThen choose **Standard build**.\n\n![image.png](https://dev-media.amazoncloud.cn/44533c0dba4546beb818da0fdc3b2c5b_image.png)\n\nThe **Analyze** tab shows your build progress.\n\n![image.png](https://dev-media.amazoncloud.cn/6148018dbe9a458b95433af6eccef4d1_image.png)\n\nWhen the model is complete, you can observe that the RMSE value of the standard build is 1.147, compared to 1.223 with the quick build.\n\n![image.png](https://dev-media.amazoncloud.cn/dc6a593d79fd43e39cbfd4d53a55949f_image.png)\n\nAfter you create a standard build, you can share the model with data scientists and ML engineers for further evaluation and iteration.\n\n![image.png](https://dev-media.amazoncloud.cn/0b625fef28a14e42bc3b084335c5c77a_image.png)\n\n### **Clean up**\n\nTo avoid incurring future [session charges](https://aws.amazon.com/sagemaker/canvas/pricing), log out of SageMaker Canvas.\n\n![image.png](https://dev-media.amazoncloud.cn/14999636e7c7456786f3fd3ab205e930_image.png)\n\n### **Conclusion**\n\nIn this post, we showed how a business analyst can create a shipment ETA prediction model with SageMaker Canvas using sample data. SageMaker Canvas allows you to create accurate ML models and generate predictions using a no-code, visual, point-and-click interface. Analysts can take this to the next level by sharing their models with data scientist colleagues. The data scientists can view the SageMaker Canvas model in Studio, where they can explore the choices SageMaker Canvas made to generate ML models, validate model results, and even take the model to production with a few clicks. 
This can accelerate ML-based value creation and help scale improved outcomes faster.\n\n- To learn more about using SageMaker Canvas, see [Build, Share, Deploy: how business analysts and data scientists achieve faster time-to-market using no-code ML and Amazon SageMaker Canvas](https://aws.amazon.com/blogs/machine-learning/build-share-deploy-how-business-analysts-and-data-scientists-achieve-faster-time-to-market-using-no-code-ml-and-amazon-sagemaker-canvas/).\n- For more information about creating ML models with a no-code solution, see [Announcing Amazon SageMaker Canvas – a Visual, No Code Machine Learning Capability for Business Analysts](https://aws.amazon.com/blogs/aws/announcing-amazon-sagemaker-canvas-a-visual-no-code-machine-learning-capability-for-business-analysts/).\n\n### **About the authors**\n\n![image.png](https://dev-media.amazoncloud.cn/3c131c0657ed458b8b153f75cf116693_image.png)\n\n**Rajakumar Sampathkumar** is a Principal Technical Account Manager at AWS, providing customers guidance on business-technology alignment and supporting the reinvention of their cloud operation models and processes. He is passionate about the cloud and machine learning. Raj is also a machine learning specialist and works with AWS customers to design, deploy, and manage their AWS workloads and architectures.\n\n![image.png](https://dev-media.amazoncloud.cn/81f48b84a2ea405982b670e161e1d596_image.png)\n\n**Meenakshisundaram Thandavarayan** is a Senior AI/ML specialist with a passion for designing, creating, and promoting human-centered data and analytics experiences. He supports AWS strategic customers in their transformation into data-driven organizations.","render":"Logistics and transportation companies track ETA (estimated time of arrival), which is a key metric for their business. Their downstream supply chain activities are planned based on this metric. 
However, delays often occur, and the ETA might differ from the product’s or shipment’s actual time of arrival (ATA), for instance due to shipping distance or carrier-related or weather-related issues. This impacts the entire supply chain, in many instances reducing productivity and increasing waste and inefficiencies. Predicting the exact day a product arrives to a customer is challenging because it depends on various factors such as order type, carrier, origin, and distance.\nAnalysts working in the logistics and transportation industry have domain expertise and knowledge of shipping and logistics attributes. However, they need to be able to generate accurate shipment ETA forecasts for efficient business operations. They need an intuitive, easy-to-use, no-code capability to create machine learning (ML) models for predicting shipping ETA forecasts.\nTo help achieve the agility and effectiveness that business analysts seek, we launched <a href=\"https://aws.amazon.com/sagemaker/canvas/\" target=\"_blank\">Amazon SageMaker Canvas</a>, a no-code ML solution that helps companies accelerate solutions to business problems quickly and easily. 
SageMaker Canvas provides business analysts with a visual point-and-click interface that allows you to build models and generate accurate ML predictions on your own—without requiring any ML experience or having to write a single line of code.\nIn this post, we show how to use SageMaker Canvas to predict shipment ETAs.\n<h3><a id=\"Solution_overview_8\"></a>Solution overview</h3>\nAlthough ML development is a complex and iterative process, we can generalize an ML workflow into business requirements analysis, data preparation, model development, and model deployment stages.\n<img src=\"https://dev-media.amazoncloud.cn/5121cb993f0e45c59082e9d57d414e8e_image.png\" alt=\"image.png\" />\nSageMaker Canvas abstracts the complexities of data preparation and model development, so you can focus on delivering value to your business by drawing insights from your data without a deep knowledge of the data science domain. The following architecture diagram highlights the components in a no-code or low-code solution.\n<img src=\"https://dev-media.amazoncloud.cn/d09aae960a5145ba8418cdbf1b33f16d_image.png\" alt=\"image.png\" />\nThe following are the steps as outlined in the architecture:\n<ol>\n<li>Download the dataset to your local machine.</li>\n<li>Import the data into SageMaker Canvas.</li>\n<li>Join your datasets.</li>\n<li>Prepare the data.</li>\n<li>Build and train your model.</li>\n<li>Evaluate the model.</li>\n<li>Test the model.</li>\n<li>Share the model for deployment.</li>\n</ol>\nLet’s assume you’re a business analyst assigned to the product shipment tracking team of a large logistics and transportation organization. Your shipment tracking team has asked you to assist in predicting the shipment ETA. 
They have provided you with a historical dataset that contains characteristics tied to different products and their respective ETA, and want you to predict the ETA for products that will be shipped in the future.\nWe use SageMaker Canvas to perform the following steps:\n<ol>\n<li>Import our sample datasets.</li>\n<li>Join the datasets.</li>\n<li>Train and build the shipment ETA prediction model.</li>\n<li>Analyze the model results.</li>\n<li>Test predictions against the model.</li>\n</ol>\n<img src=\"https://dev-media.amazoncloud.cn/383666bcfe1b4bd38132bbf8f3a17b7a_image.png\" alt=\"image.png\" />\n<h3><a id=\"Dataset_overview_39\"></a>Dataset overview</h3>\nWe use two datasets (<a href=\"https://static.us-east-1.prod.workshops.aws/public/6f2f7cb1-bfda-4b34-ae39-928502784393/static/datasets/ShippingLogs.csv\" target=\"_blank\">shipping logs</a> and <a href=\"https://static.us-east-1.prod.workshops.aws/public/6f2f7cb1-bfda-4b34-ae39-928502784393/static/datasets/ProductDescriptions.csv\" target=\"_blank\">product description</a>) in CSV format, which contain shipping log information and certain characteristics of a product, respectively.\nThe <code>ShippingLogs</code> dataset contains the complete shipping data for all products delivered, including estimated time, shipping priority, carrier, and origin. It has approximately 10,000 rows and 12 feature columns. The following table summarizes the data schema.\n<img src=\"https://dev-media.amazoncloud.cn/6439fb73c5274c0797246bc645ba95b4_image.png\" alt=\"image.png\" />\nThe <code>ProductDescriptions</code> dataset contains metadata about the product that is shipped in each order. This dataset has approximately 10,000 rows and 5 feature columns. 
The following table summarizes the data schema.\n<img src=\"https://dev-media.amazoncloud.cn/2f0e1da4c7a349468bba3e55552458a6_image.png\" alt=\"image.png\" />\n<h3><a id=\"Prerequisites_51\"></a>Prerequisites</h3>\nAn IT administrator with an <a href=\"https://aws.amazon.com/premiumsupport/knowledge-center/create-and-activate-aws-account/\" target=\"_blank\">AWS account</a> with appropriate permissions must complete the following prerequisites:\n<ol>\n<li>Deploy an <a href=\"https://aws.amazon.com/sagemaker/\" target=\"_blank\">Amazon SageMaker</a> domain. For instructions, see <a href=\"https://docs.aws.amazon.com/sagemaker/latest/dg/gs-studio-onboard.html\" target=\"_blank\">Onboard to Amazon SageMaker Domain</a>.</li>\n<li>Launch SageMaker Canvas. For instructions, <a href=\"https://docs.aws.amazon.com/sagemaker/latest/dg/canvas-setting-up.html\" target=\"_blank\">see Setting up and managing Amazon SageMaker Canvas (for IT administrators)</a>.</li>\n<li>Configure cross-origin resource sharing (CORS) policies in <a href=\"http://aws.amazon.com/s3\" target=\"_blank\">Amazon Simple Storage Service</a> (Amazon S3) for SageMaker Canvas to enable the upload option from local disk. 
For instructions, see <a href=\"https://docs.aws.amazon.com/sagemaker/latest/dg/canvas-set-up-local-upload.html\" target=\"_blank\">Give your users the ability to upload local files</a>.</li>\n</ol>\n<h3><a id=\"Import_the_dataset_59\"></a>Import the dataset</h3>\nFirst, download the datasets (<a href=\"https://static.us-east-1.prod.workshops.aws/public/6f2f7cb1-bfda-4b34-ae39-928502784393/static/datasets/ShippingLogs.csv\" target=\"_blank\">shipping logs</a> and <a href=\"https://static.us-east-1.prod.workshops.aws/public/6f2f7cb1-bfda-4b34-ae39-928502784393/static/datasets/ProductDescriptions.csv\" target=\"_blank\">product description</a>) and review the files to make sure all the data is there.\nSageMaker Canvas provides several sample datasets in your application to help you get started. To learn more about the SageMaker-provided sample datasets you can experiment with, see <a href=\"https://docs.aws.amazon.com/sagemaker/latest/dg/canvas-sample-datasets.html\" target=\"_blank\">Use sample datasets</a>. If you use the sample datasets (<code>canvas-sample-shipping-logs.csv</code> and <code>canvas-sample-product-descriptions.csv</code>) available within SageMaker Canvas, you don’t have to import the shipping logs and product description datasets.\nYou can import data from different data sources into SageMaker Canvas. 
If you plan to use your own dataset, follow the steps in <a href=\"https://docs.aws.amazon.com/sagemaker/latest/dg/canvas-importing-data.html\" target=\"_blank\">Importing data in Amazon SageMaker Canvas</a>.\nFor this post, we use the full shipping logs and product description datasets that we downloaded.\n<ol>\n<li>Sign in to the <a href=\"http://aws.amazon.com/console\" target=\"_blank\">AWS Management Console</a>, using an account with the appropriate permissions to access SageMaker Canvas.</li>\n<li>On the SageMaker Canvas console, choose Import.</li>\n</ol>\n<img src=\"https://dev-media.amazoncloud.cn/27def05438964d0e8979f902754429f6_image.png\" alt=\"image.png\" />\n<ol start=\"3\">\n<li>Choose Upload and select the files <code>ShippingLogs.csv</code> and <code>ProductDescriptions.csv</code>.</li>\n<li>Choose Import data to upload the files to SageMaker Canvas.</li>\n</ol>\n<img src=\"https://dev-media.amazoncloud.cn/ee13812ceaf640fb9b6243ef319f6462_image.png\" alt=\"image.png\" />\n<h3><a id=\"Create_a_consolidated_dataset_79\"></a>Create a consolidated dataset</h3>\nNext, let’s join the two datasets.\n<ol>\n<li>Choose Join data.</li>\n</ol>\n<img src=\"https://dev-media.amazoncloud.cn/41ef59c8668b470c953a1a8dafb70206_image.png\" alt=\"image.png\" />\n<ol start=\"2\">\n<li>Drag and drop <code>ShippingLogs.csv</code> and <code>ProductDescriptions.csv</code> from the left pane under Datasets to the right pane.</li>\n</ol>\n<img src=\"https://dev-media.amazoncloud.cn/106c9170fa9a4bad95ceea20c7896942_image.png\" alt=\"image.png\" />\nThe two datasets are joined with an inner join on <code>ProductID</code>.\n<ol start=\"3\">\n<li>Choose Import and enter a name for the new joined dataset.</li>\n<li>Choose Import data.</li>\n</ol>\n<img src=\"https://dev-media.amazoncloud.cn/aa93848a138b4213aadd86a079888e03_image.png\" alt=\"image.png\" />\nYou can choose the new dataset to preview its contents.\n<img 
src=\"https://dev-media.amazoncloud.cn/57405a9b76274e49bc75f3a7a334e09d_image.png\" alt=\"image.png\" />\nAfter you review the dataset, you can create your model.\n<h3><a id=\"Build_and_train_model_104\"></a>Build and train model</h3>\nTo build and train your model, complete the following steps:\n<ol>\n<li>For Model name, enter <code>ShippingForecast</code>.</li>\n<li>Choose Create.</li>\n</ol>\n<img src=\"https://dev-media.amazoncloud.cn/ce792c9d0de6409bb5ede52620e17e49_image.png\" alt=\"image.png\" />\nIn the Model view, you can see four tabs, which correspond to the four steps to create a model and use it to generate predictions: Select, Build, Analyze, and Predict.\n<ol start=\"3\">\n<li>On the Select tab, select the <code>ConsolidatedShippingData</code> dataset you created earlier. You can see that this dataset comes from Amazon S3 and has 12 columns and 10,000 rows.</li>\n<li>Choose Select dataset.</li>\n</ol>\n<img src=\"https://dev-media.amazoncloud.cn/cd11da98e1644bbda2027512757b5f6b_image.png\" alt=\"image.png\" />\nSageMaker Canvas automatically moves to the Build tab.\n<ol start=\"5\">\n<li>On the Build tab, choose the target column, in our case <code>ActualShippingDays</code>. \nWe’re interested in how many days it will take for the goods to reach the customer. Because this target is numeric, SageMaker Canvas automatically detects that this is a numeric prediction problem (also known as regression). 
Regression estimates the values of a dependent target variable based on one or more other variables or attributes that are correlated with it. Because we also have a column with time series data (<code>OrderDate</code>), SageMaker Canvas may interpret this as a time series forecast model type.</li>\n<li>Before advancing, make sure that the model type is Numeric model type; if it isn’t, you can select it with the Change type option.</li>\n</ol>\n<img src=\"https://dev-media.amazoncloud.cn/89429495b27f4d2ea449b64b6fc8af5d_image.png\" alt=\"image.png\" />\n<h4><a id=\"Data_preparation_128\"></a>Data preparation</h4>\nIn the bottom half of the page, you can look at some of the statistics of the dataset, including missing and mismatched values, unique values, and mean and median values.\nColumn view lists all columns with their data types and these basic statistics. This can help you devise a strategy to handle missing values in the datasets.\n<img src=\"https://dev-media.amazoncloud.cn/f8da2e1b60f54d25a78b4ab95fcafb47_image.png\" alt=\"image.png\" />\nGrid view shows a graphical distribution of values for each column, along with the sample data. You can start inferring which columns are relevant for training the model.\n<img src=\"https://dev-media.amazoncloud.cn/0ee90ca191ff4caaacd9bfbdf4e369ab_image.png\" alt=\"image.png\" />\nLet’s preview the model to see the estimated RMSE (root mean squared error) for this numeric prediction.\n<img src=\"https://dev-media.amazoncloud.cn/9c51fb9233dc4df3bca539a172acb7b9_image.png\" alt=\"image.png\" />\nIf you don’t want to use some of the columns for the prediction, you can drop them by deselecting them. For this post, we deselect the <code>order_id</code> column. 
Because it’s a primary key, it carries no predictive information and doesn’t add value to the model training process.\nYou can choose Preview model to get insights on feature importance and iterate on the model quickly. The RMSE is now 1.223, a slight improvement from 1.225. The lower the RMSE, the better a given model is able to fit a dataset.\n<img src=\"https://dev-media.amazoncloud.cn/e8a5cccff6d440cd9328f72eb33c6833_image.png\" alt=\"image.png\" />\nFrom our exploratory data analysis, we can see that the dataset doesn’t have a lot of missing values, so we don’t have to handle them. If you see a lot of missing values for your features, you can filter them out.\nTo extract more insights, you can proceed with a datetime extraction. With the datetime extraction transform, you can extract values from a datetime column to a separate column.\nTo perform a datetime extraction, complete the following steps:\n<ol>\n<li>On the Build tab of the SageMaker Canvas application, choose Extract.</li>\n<li>Choose the column from which you want to extract values (for this post, <code>OrderDate</code>).</li>\n<li>For Value, choose one or more values to extract from the column. For this post, we choose Year and Month. The values you can extract from a timestamp column are Year, Month, Day, Hour, Week of year, Day of year, and Quarter.</li>\n<li>Choose Add to add the transform to the model recipe.</li>\n</ol>\n<img src=\"https://dev-media.amazoncloud.cn/48baeeb9208f454e89aed0fe9205efa1_image.png\" alt=\"image.png\" />\nSageMaker Canvas creates a new column in the dataset for each of the values you extract.\n<img src=\"https://dev-media.amazoncloud.cn/5fca75e922f84d1c9d62b6c51cba5bde_image.png\" alt=\"image.png\" />\n<h4><a id=\"Model_training_167\"></a>Model training</h4>\nIt’s finally time to train the model! 
Before building a complete model, it’s good practice to train a quick model first to get a general idea of how the model will perform. A quick model trains fewer combinations of models and hyperparameters in order to prioritize speed over accuracy. This is helpful in cases like ours where we want to prove the value of training an ML model for our use case. Note that the quick build option isn’t available for datasets larger than 50,000 rows.\n<img src=\"https://dev-media.amazoncloud.cn/8e809982fb714a008586abcaca8f3d0c_image.png\" alt=\"image.png\" />\nNow we wait anywhere from 2–15 minutes for the quick build to finish training our model.\n<img src=\"https://dev-media.amazoncloud.cn/6ee6daafbf614c689198e8c5215718ee_image.png\" alt=\"image.png\" />\n<h3><a id=\"Evaluate_model_performance_177\"></a>Evaluate model performance</h3>\nWhen training is complete, SageMaker Canvas automatically moves to the Analyze tab to show us the results of our quick training, as shown in the following screenshot.\nYou may see slightly different values. This is expected. Machine learning introduces some variation in the process of training models, which can lead to different results for different builds.\n<img src=\"https://dev-media.amazoncloud.cn/0290540b819b4569903752a038956084_image.png\" alt=\"image.png\" />\nLet’s focus on the Overview tab. This tab shows you the column impact, or the estimated importance of each column in predicting the target column. In this example, the <code>ExpectedShippingDays</code> column has the most significant impact on our predictions.\nOn the Scoring tab, you can see a plot representing the best fit regression line for <code>ActualShippingDays</code>. On average, the model prediction has a difference of +/- 0.7 from the actual value of <code>ActualShippingDays</code>. The Scoring section for numeric prediction shows a line to indicate the model’s predicted value in relation to the data used to make predictions. 
The predicted values often fall within +/- the RMSE value of the actual value. The width of the purple band around the line indicates the RMSE range.\n<img src=\"https://dev-media.amazoncloud.cn/fa3cd1bf0d22444f93ce93a7d545fbb9_image.png\" alt=\"image.png\" />\nAs the thickness of the RMSE band on a model increases, the accuracy of the prediction decreases. As you can see, the model predicts with high accuracy to begin with (lean band), and as the value of <code>ActualShippingDays</code> increases (17–22), the band becomes thicker, indicating lower accuracy.\nThe Advanced metrics section contains information for users who want a deeper understanding of their model performance. The metrics for numeric prediction are as follows:\n<ul>\n<li>R2 – The percentage of the variance in the target column that can be explained by the input columns.</li>\n<li>MAE – Mean absolute error. On average, the prediction for the target column is +/- {MAE} from the actual value.</li>\n<li>MAPE – Mean absolute percent error. On average, the prediction for the target column is +/- {MAPE} % from the actual value.</li>\n<li>RMSE – Root mean square error. The standard deviation of the errors.</li>\n</ul>\nThe following screenshot shows a graph of the residuals or errors. The horizontal line indicates an error of 0 or a perfect prediction. The blue dots are the errors. Their distance from the horizontal line indicates the magnitude of the errors.\n<img src=\"https://dev-media.amazoncloud.cn/d5c194b3231e401981f5512c769114b2_image.png\" alt=\"image.png\" />\nR-squared is a statistical measure of how close the data is to the fitted regression line. A higher percentage indicates a better fit; here, the model explains about 87% of the variability of the response data around its mean.\nOn average, the prediction for the target column is +/- 0.709 (the MAE) from the actual value. 
This indicates that, on average, the model predicts the target to within less than one day. This is useful for planning purposes.\nThe model’s RMSE (the standard deviation of the errors) is 1.223.\nThe following image shows an error density plot.\n<img src=\"https://dev-media.amazoncloud.cn/b21e8a71027c4d009a533229d26b4133_image.png\" alt=\"image.png\" />\nYou now have two options as next steps:\n<ul>\n<li>You can use this model to run some predictions by choosing Predict.</li>\n<li>You can create a new version of this model to train with the Standard build option. This takes much longer (about 4–6 hours) but produces more accurate results.</li>\n</ul>\nBecause we feel confident about using this model given the performance we’ve seen, we opt to go ahead and use the model for predictions. If you weren’t confident, you could have a data scientist review the modeling SageMaker Canvas did and offer potential improvements.\nNote that you must train a model with the Standard build option before you can share it with a data scientist through the <a href=\"https://docs.aws.amazon.com/sagemaker/latest/dg/studio.html\" target=\"_blank\">Amazon SageMaker Studio</a> integration.\n<h3><a id=\"Generate_predictions_223\"></a>Generate predictions</h3>\nNow that the model is trained, let’s generate some predictions.\n<ol>\n<li>Choose Predict on the Analyze tab, or choose the Predict tab.</li>\n<li>Choose Batch prediction.</li>\n<li>Choose Select dataset, and choose the dataset <code>ConsolidatedShipping.csv</code>.</li>\n</ol>\nSageMaker Canvas uses this dataset to generate our predictions. Although it’s generally not a good idea to use the same dataset for both training and testing, we’re using the same dataset for the sake of simplicity. 
You can also import another dataset if you prefer.\n<img src=\"https://dev-media.amazoncloud.cn/c45c3722e95644af9c3ded304e5e02ae_image.png\" alt=\"image.png\" />\nAfter a few seconds, the prediction is done and you can choose the eye icon to see a preview of the predictions, or choose Download to download a CSV file containing the full output.\nYou can also choose to predict values one by one by selecting Single prediction instead of Batch prediction. SageMaker Canvas then shows you a view where you can provide the values for each feature manually and generate a prediction. This is ideal for what-if scenarios. For example, how does <code>ActualShippingDays</code> change if the <code>ShippingOrigin</code> is Houston? What if we used a different carrier? What if the <code>PackageWeight</code> is different?\n<img src=\"https://dev-media.amazoncloud.cn/f82f15b39d4649a9b97ffc413ea4fb28_image.png\" alt=\"image.png\" />\n<h3><a id=\"Standard_build_241\"></a>Standard build</h3>\nStandard build chooses accuracy over speed. 
If you want to share the artifacts of the model with your data scientists and ML engineers, you may choose to create a standard build next.\nFirst, add a new version.\n<img src=\"https://dev-media.amazoncloud.cn/3a96452e7fa14f1aaf4ce639116dba34_image.png\" alt=\"image.png\" />\nThen choose Standard build.\n<img src=\"https://dev-media.amazoncloud.cn/44533c0dba4546beb818da0fdc3b2c5b_image.png\" alt=\"image.png\" />\nThe Analyze tab shows your build progress.\n<img src=\"https://dev-media.amazoncloud.cn/6148018dbe9a458b95433af6eccef4d1_image.png\" alt=\"image.png\" />\nWhen the build is complete, you can see that the RMSE of the standard build is 1.147, compared to 1.223 with the quick build.\n<img src=\"https://dev-media.amazoncloud.cn/dc6a593d79fd43e39cbfd4d53a55949f_image.png\" alt=\"image.png\" />\nAfter you create a standard build, you can share the model with data scientists and ML engineers for further evaluation and iteration.\n<img src=\"https://dev-media.amazoncloud.cn/0b625fef28a14e42bc3b084335c5c77a_image.png\" alt=\"image.png\" />\n<h3><a id=\"Clean_up_265\"></a>Clean up</h3>\nTo avoid incurring future <a href=\"https://aws.amazon.com/sagemaker/canvas/pricing\" target=\"_blank\">session charges</a>, log out of SageMaker Canvas.\n<img src=\"https://dev-media.amazoncloud.cn/14999636e7c7456786f3fd3ab205e930_image.png\" alt=\"image.png\" />\n<h3><a id=\"Conclusion_271\"></a>Conclusion</h3>\nIn this post, we showed how a business analyst can create a shipment ETA prediction model with SageMaker Canvas using sample data. SageMaker Canvas allows you to create accurate ML models and generate predictions using a no-code, visual, point-and-click interface. Analysts can take this to the next level by sharing their models with data scientist colleagues. 
The data scientists can view the SageMaker Canvas model in Studio, where they can explore the choices SageMaker Canvas made to generate ML models, validate model results, and even take the model to production with a few clicks. This can accelerate ML-based value creation and help scale improved outcomes faster.\n<ul>\n<li>To learn more about using SageMaker Canvas, see <a href=\"https://aws.amazon.com/blogs/machine-learning/build-share-deploy-how-business-analysts-and-data-scientists-achieve-faster-time-to-market-using-no-code-ml-and-amazon-sagemaker-canvas/\" target=\"_blank\">Build, Share, Deploy: how business analysts and data scientists achieve faster time-to-market using no-code ML and Amazon SageMaker Canvas</a>.</li>\n<li>For more information about creating ML models with a no-code solution, see <a href=\"https://aws.amazon.com/blogs/aws/announcing-amazon-sagemaker-canvas-a-visual-no-code-machine-learning-capability-for-business-analysts/\" target=\"_blank\">Announcing Amazon SageMaker Canvas – a Visual, No Code Machine Learning Capability for Business Analysts</a>.</li>\n</ul>\n<h3><a id=\"About_the_authors_278\"></a>About the authors</h3>\n<img src=\"https://dev-media.amazoncloud.cn/3c131c0657ed458b8b153f75cf116693_image.png\" alt=\"image.png\" />\nRajakumar Sampathkumar is a Principal Technical Account Manager at AWS, providing customers guidance on business-technology alignment and supporting the reinvention of their cloud operation models and processes. He is passionate about the cloud and machine learning. Raj is also a machine learning specialist and works with AWS customers to design, deploy, and manage their AWS workloads and architectures.\n<img src=\"https://dev-media.amazoncloud.cn/81f48b84a2ea405982b670e161e1d596_image.png\" alt=\"image.png\" />\nMeenakshisundaram Thandavarayan is a Senior AI/ML specialist with a passion for designing, creating, and promoting human-centered Data and Analytics experiences. 
He supports AWS Strategic customers in their transformation into data-driven organizations.\n"}
