{"value":"Predicting common machine failure types is critical in manufacturing industries. Given a set of characteristics of a product that is tied to a given type of failure, you can develop a model that can predict the failure type when you feed those attributes to a machine learning (ML) model. ML can help with insights, but up until now you needed ML experts to build models to predict machine failure types, the lack of which could delay any corrective actions that businesses need for efficiencies or improvement.\n\nIn this post, we show you how business analysts can build a machine failure type prediction ML model with [Amazon SageMaker Canvas](https://aws.amazon.com/sagemaker/canvas/). SageMaker Canvas provides you with a visual point-and-click interface that allows you to build models and generate accurate ML predictions on your own—without requiring any ML experience or having to write a single line of code.\n\n### **Solution overview**\n\nLet’s assume you’re a business analyst assigned to a maintenance team of a large manufacturing organization. Your maintenance team has asked you to assist in predicting common failures. They have provided you with a historical dataset that contains characteristics tied to a given type of failure and would like you to predict which failure will occur in the future. The failure types include No Failure, Overstrain, and Power Failures. The data schema is listed in the following table.\n\n![image.png](https://dev-media.amazoncloud.cn/c8e004f92a3c4ab4b8c9f01fa9ec6de8_image.png)\n\nAfter the failure type is identified, businesses can take any corrective actions. To do this, you use the data you have in a CSV file, which contains certain characteristics of a product as outlined in the table. You use SageMaker Canvas to perform the following steps:\n\n1. Import the maintenance dataset.\n2. Train and build the predictive machine maintenance model.\n3. Analyze the model results.\n4. 
Test predictions against the model.\n\n### **Prerequisites**\n\nA cloud admin with an [AWS account](https://aws.amazon.com/premiumsupport/knowledge-center/create-and-activate-aws-account/) with appropriate permissions is required to complete the following prerequisites:\n\n1. Deploy an [Amazon SageMaker](https://aws.amazon.com/sagemaker/) domain. For instructions, see [Onboard to Amazon SageMaker Domain](https://docs.aws.amazon.com/sagemaker/latest/dg/gs-studio-onboard.html).\n2. Launch SageMaker Canvas. For instructions, see [Setting up and managing Amazon SageMaker Canvas (for IT administrators)](https://docs.aws.amazon.com/sagemaker/latest/dg/canvas-setting-up.html).\n3. Configure cross-origin resource sharing (CORS) policies for SageMaker Canvas. For instructions, see [Give your users the ability to upload local files](https://docs.aws.amazon.com/sagemaker/latest/dg/canvas-set-up-local-upload.html).\n\n![image.png](https://dev-media.amazoncloud.cn/c69defa6b52841309fdd9caaddd8f13a_image.png)\n\n### **Import the dataset**\n\nFirst, download the [maintenance dataset](https://static.us-east-1.prod.workshops.aws/public/6f2f7cb1-bfda-4b34-ae39-928502784393/static/datasets/maintenance_dataset.csv) and review the file to make sure all the data is there.\n\nSageMaker Canvas provides several sample datasets in your application to help you get started. To learn more about the SageMaker-provided sample datasets you can experiment with, see [Use sample datasets](https://docs.aws.amazon.com/sagemaker/latest/dg/canvas-sample-datasets.html). If you use the sample dataset (```canvas-sample-maintenance.csv```) available within SageMaker Canvas, you don’t have to import the maintenance dataset.\n\n![image.png](https://dev-media.amazoncloud.cn/62ee71e524924cb4a82af7d084123733_image.png)\n\nYou can import data from different data sources into SageMaker Canvas. 
If you plan to use your own dataset, follow the steps in [Importing data in Amazon SageMaker Canvas](https://docs.aws.amazon.com/sagemaker/latest/dg/canvas-importing-data.html).\n\nFor this post, we use the full maintenance dataset that we downloaded.\n\n1. Sign in to the [AWS Management Console](http://aws.amazon.com/console), using an account with the appropriate permissions to access SageMaker Canvas.\n2. Log in to the SageMaker Canvas console.\n3. Choose **Import**.\n\n![image.png](https://dev-media.amazoncloud.cn/237dfd7f66534dd5a4dadac9dd587b83_image.png)\n\n4. Choose **Upload** and select the ```maintenance_dataset.csv``` file.\n5. Choose **Import data** to upload it to SageMaker Canvas.\n\n![image.png](https://dev-media.amazoncloud.cn/76294cb1bf7c404b8296809451ea4199_image.png)\n\nThe import process takes approximately 10 seconds (this can vary depending on dataset size). When it’s complete, you can see the dataset is in ```Ready``` status.\n\nAfter you confirm that the imported dataset is ```Ready```, you can create your model.\n\n### **Build and train the model**\n\nTo create and train your model, complete the following steps:\n\n1. Choose **New model**, and provide a name for your model.\n2. Choose **Create**.\n\n![image.png](https://dev-media.amazoncloud.cn/ca83c2c9526f46c5a8bf292b9b871917_image.png)\n\n3. Select the ```maintenance_dataset.csv``` dataset and choose **Select dataset**.\nIn the model view, you can see four tabs, which correspond to the four steps to create a model and use it to generate predictions: **Select**, **Build**, **Analyze**, and **Predict**.\n4. On the Select tab, select the ```maintenance_dataset.csv``` dataset you uploaded previously and choose **Select dataset**.\n\n![image.png](https://dev-media.amazoncloud.cn/e78692bbea0540f49174a2a759368436_image.png)\n\nThis dataset includes 9 columns and 10,000 rows.\n\nSageMaker Canvas automatically moves to the Build phase.\n\n5. 
On this tab, choose the target column, in our case **Failure Type**. The maintenance team has informed you that this column indicates the type of failures typically seen based on historical data from their existing machines. This is what you want to train your model to predict. SageMaker Canvas automatically detects that this is a **3 Category** problem (also known as multi-class classification). If the wrong model type is detected, you can change it manually with the Change type option.\n\n![image.png](https://dev-media.amazoncloud.cn/9aedae35aa1d4b149100138de65b8a7c_image.png)\n\nNote that this dataset is highly imbalanced toward the No Failure class, which you can see by viewing the **Failure Type** column. Although SageMaker Canvas and the underlying AutoML capabilities can partly handle dataset imbalance, the imbalance may still skew model performance. As an additional next step, refer to [Balance your data for machine learning with Amazon SageMaker Data Wrangler](https://aws.amazon.com/blogs/machine-learning/balance-your-data-for-machine-learning-with-amazon-sagemaker-data-wrangler/). Following the steps in the shared link, you can launch an [Amazon SageMaker Studio](https://docs.aws.amazon.com/sagemaker/latest/dg/studio.html) app from the SageMaker console, import this dataset within [Amazon SageMaker Data Wrangler](https://aws.amazon.com/sagemaker/data-wrangler/), use the Balance data transformation, and then take the balanced dataset back to SageMaker Canvas and continue with the following steps. We are proceeding with the imbalanced dataset in this post to show that SageMaker Canvas can handle imbalanced datasets as well.\nIn the bottom half of the page, you can look at some of the statistics of the dataset, including missing and mismatched values, unique values, and mean and median values. 
You can also drop some of the columns if you don’t want to use them for the prediction by simply deselecting them.\nAfter you’ve explored this section, it’s time to train the model! Before building a complete model, it’s a good practice to have a general idea about the model performance by training a Quick Model. A quick model trains fewer combinations of models and hyperparameters in order to prioritize speed over accuracy, especially in cases where you want to prove the value of training an ML model for your use case. Note that the quick build option isn’t available for datasets larger than 50,000 rows.\n\n6. Choose **Quick build**.\n\n![image.png](https://dev-media.amazoncloud.cn/6387dc5c10b1420397c65fa7a789eb30_image.png)\n\nNow you wait anywhere from 2–15 minutes. Once done, SageMaker Canvas automatically moves to the Analyze tab to show you the results of quick training. The analysis performed using quick build estimates that your model is able to predict the right failure type (outcome) 99.2% of the time. You may experience slightly different values. This is expected.\n\nLet’s focus on the first tab, **Overview**. This is the tab that shows you the **Column impact**, or the estimated importance of each column in predicting the target column. In this example, the Torque [Nm] and Rotational speed [rpm] columns have the most significant impact in predicting what type of failure will occur.\n\n![image.png](https://dev-media.amazoncloud.cn/71d9b59bba4e4f2c84507b49e19c330f_image.png)\n\n### **Evaluate model performance**\n\nWhen you move to the Scoring portion of your analysis, you can see a plot representing the distribution of our predicted values with respect to the actual values. Notice that most records fall within the No Failure category. 
To learn more about how SageMaker Canvas uses SHAP baselines to bring explainability to ML, refer to [Evaluating Your Model’s Performance in Amazon SageMaker Canvas](https://docs.aws.amazon.com/sagemaker/latest/dg/canvas-evaluate-model.html), as well as [SHAP Baselines for Explainability](https://docs.aws.amazon.com/sagemaker/latest/dg/clarify-feature-attribute-shap-baselines.html).\n\n![image.png](https://dev-media.amazoncloud.cn/a746d9e6a2ac4e46bb6e228b91ea3921_image.png)\n\nSageMaker Canvas splits the original dataset into train and validation sets before the training. The scoring is a result of SageMaker Canvas running the validation set against the model. This is an interactive interface where you can select the failure type. If you choose **Overstrain Failure** in the graphic, you can see that the model correctly identifies these failures 84% of the time. This is good enough to take action on, for example by having an operator or engineer check further. You can choose **Power Failure** in the graphic to see the respective scoring for further interpretation and actions.\n\nYou may be interested in failure types and how well the model predicts failure types based on a series of inputs. To take a closer look at the results, choose **Advanced metrics**. This displays a matrix that allows you to more closely examine the results. In ML, this is referred to as a confusion matrix.\n\n![image.png](https://dev-media.amazoncloud.cn/c8d59bedc28c409b93901cdb4dd4973e_image.png)\n\nThis matrix defaults to the dominant class, No Failure. On the **Class** menu, you can choose to view advanced metrics of the other two failure types, Overstrain Failure and Power Failure.\n\nIn ML, the accuracy of the model is defined as the number of correct predictions divided by the total number of predictions. The blue boxes represent correct predictions that the model made against a subset of test data where there was a known outcome. 
Here we are interested in what percentage of the time the model predicted a particular machine failure type (let’s say No Failure) when it’s actually that failure type (No Failure). In ML, the ratio used to measure this is TP / (TP + FN), where TP is the number of true positives and FN is the number of false negatives. This is referred to as recall. In the default case, No Failure, there were 1,923 correct predictions out of 1,926 overall records, which results in over 99% recall. Alternatively, in the class of Overstrain Failure, there were 32 out of 38, which results in 84% recall. Lastly, in the class of Power Failure, there were 16 out of 19, which also results in 84% recall.\n\nNow, you have two options:\n\n1. You can use this model to run some predictions by choosing **Predict**.\n2. You can create a new version of this model to train with the **Standard build** option. This takes much longer (about 1–2 hours) but provides a more robust model because it goes through a full AutoML review of data, algorithms, and tuning iterations.\n\nBecause you’re trying to predict failures, and the model predicts failures correctly 84% of the time, you can confidently use the model to identify possible failures. So, you can proceed to option 1. If you weren’t confident, then you could have a data scientist review the modeling SageMaker Canvas did and offer potential improvements via option 2.\n\n### **Generate predictions**\n\nNow that the model is trained, you can start generating predictions.\n\n1. Choose **Predict** at the bottom of the **Analyze** page, or choose the **Predict** tab.\n2. Choose **Select dataset**, and choose the ```maintenance_dataset.csv``` file.\n3. Choose **Generate predictions**.\n\nSageMaker Canvas uses this dataset to generate our predictions. Although it’s generally a good idea to not use the same dataset for both training and testing, you can use the same dataset for the sake of simplicity in this case. 
Alternatively, you can hold out some records from the original dataset before training, save them in a separate CSV file, and feed that file to the batch prediction here so that you don’t test on the data the model was trained on.\n\n![image.png](https://dev-media.amazoncloud.cn/bc4d379e46d44594b3f3aa7aebeb6669_image.png)\n\nAfter a few seconds, the prediction is complete. SageMaker Canvas returns a prediction for each row of data and the probability of the prediction being correct. You can choose Preview to view the predictions, or choose Download to download a CSV file containing the full output.\n\n![image.png](https://dev-media.amazoncloud.cn/d65479ffd3e94dd1835b43d365c86cc9_image.png)\n\nYou can also predict values one at a time by choosing **Single prediction** instead of **Batch prediction**. SageMaker Canvas shows you a view where you can provide the values for each feature manually and generate a prediction. This is ideal for situations like what-if scenarios, for example: How does the tool wear impact the failure type? What if process temperature increases or decreases? What if rotational speed changes?\n\n![image.png](https://dev-media.amazoncloud.cn/214d8c3aed494911a2e6c7ca9277276a_image.png)\n\n### **Standard build**\n\nThe **Standard build** option chooses accuracy over speed. If you want to share the artifacts of the model with your data scientists and ML engineers, you can create a standard build next.\n\n1. Choose **Add version**.\n\n![image.png](https://dev-media.amazoncloud.cn/22e2ac0bb6da4d2895e0bb50158900d3_image.png)\n\n2. Choose a new version and choose **Standard build**.\n\n![image.png](https://dev-media.amazoncloud.cn/b9e8f3852ff44d85b3afec53bd9c23af_image.png)\n\n3. 
After you create a standard build, you can share the model with data scientists and ML engineers for further evaluation and iteration.\n\n![image.png](https://dev-media.amazoncloud.cn/8450571e74fc4c02abf6c60eb2dbff18_image.png)\n\n### **Clean up**\n\nTo avoid incurring future [session charges](https://aws.amazon.com/sagemaker/canvas/pricing), log out of SageMaker Canvas.\n\n![image.png](https://dev-media.amazoncloud.cn/9e5fcd1661ad43899851774f5e6e7f95_image.png)\n\n### **Conclusion**\n\nIn this post, we showed how a business analyst can create a machine failure type prediction model with SageMaker Canvas using maintenance data. SageMaker Canvas allows business analysts such as reliability engineers to create accurate ML models and generate predictions using a no-code, visual, point-and-click interface. Analysts can take this to the next level by sharing their models with data scientist colleagues. Data scientists can view the SageMaker Canvas model in Studio, where they can explore the choices SageMaker Canvas made, validate model results, and even take the model to production with a few clicks. This can accelerate ML-based value creation and help scale improved outcomes faster.\n\nTo learn more about using SageMaker Canvas, see [Build, Share, Deploy: how business analysts and data scientists achieve faster time-to-market using no-code ML and Amazon SageMaker Canvas](https://aws.amazon.com/blogs/machine-learning/build-share-deploy-how-business-analysts-and-data-scientists-achieve-faster-time-to-market-using-no-code-ml-and-amazon-sagemaker-canvas/). 
For more information about creating ML models with a no-code solution, see [Announcing Amazon SageMaker Canvas – a Visual, No Code Machine Learning Capability for Business Analysts](https://aws.amazon.com/blogs/aws/announcing-amazon-sagemaker-canvas-a-visual-no-code-machine-learning-capability-for-business-analysts/).\n\n### **About the Authors**\n\n![image.png](https://dev-media.amazoncloud.cn/2c4da404a0f8438daadd8b5b42b5cef5_image.png)\n\n**Rajakumar Sampathkumar** is a Principal Technical Account Manager at AWS, providing customers guidance on business-technology alignment and supporting the reinvention of their cloud operation models and processes. He is passionate about cloud and machine learning. Raj is also a machine learning specialist and works with AWS customers to design, deploy, and manage their AWS workloads and architectures.\n\n![image.png](https://dev-media.amazoncloud.cn/60876a8d2d3d43bdac47027d0004922f_image.png)\n\n**Twann Atkins** is a Senior Solutions Architect for Amazon Web Services. He is responsible for working with Agriculture, Retail, and Manufacturing customers to identify business problems and working backwards to identify viable and scalable technical solutions. Twann has been helping customers plan and migrate critical workloads for more than 10 years with a recent focus on democratizing analytics, artificial intelligence and machine learning for customers and builders of tomorrow.\n\n![image.png](https://dev-media.amazoncloud.cn/e8b7837796ea4ecba6c5a126d91b4f87_image.png)\n\n**Omkar Mukadam** is an Edge Specialist Solutions Architect at Amazon Web Services. He currently focuses on solutions that enable commercial customers to effectively design, build, and scale with AWS Edge service offerings, including but not limited to the AWS Snow Family.","render":"<p>Predicting common machine failure types is critical in manufacturing industries. 
Given a set of characteristics of a product that is tied to a given type of failure, you can develop a model that can predict the failure type when you feed those attributes to a machine learning (ML) model. ML can help with insights, but up until now you needed ML experts to build models to predict machine failure types, the lack of which could delay any corrective actions that businesses need for efficiencies or improvement.</p>\n<p>In this post, we show you how business analysts can build a machine failure type prediction ML model with <a href=\\"https://aws.amazon.com/sagemaker/canvas/\\" target=\\"_blank\\">Amazon SageMaker Canvas</a>. SageMaker Canvas provides you with a visual point-and-click interface that allows you to build models and generate accurate ML predictions on your own—without requiring any ML experience or having to write a single line of code.</p>\\n<h3><a id=\\"Solution_overview_4\\"></a><strong>Solution overview</strong></h3>\\n<p>Let’s assume you’re a business analyst assigned to a maintenance team of a large manufacturing organization. Your maintenance team has asked you to assist in predicting common failures. They have provided you with a historical dataset that contains characteristics tied to a given type of failure and would like you to predict which failure will occur in the future. The failure types include No Failure, Overstrain, and Power Failures. The data schema is listed in the following table.</p>\n<p><img src=\\"https://dev-media.amazoncloud.cn/c8e004f92a3c4ab4b8c9f01fa9ec6de8_image.png\\" alt=\\"image.png\\" /></p>\n<p>After the failure type is identified, businesses can take any corrective actions. To do this, you use the data you have in a CSV file, which contains certain characteristics of a product as outlined in the table. 
You use SageMaker Canvas to perform the following steps:</p>\n<ol>\\n<li>Import the maintenance dataset.</li>\n<li>Train and build the predictive machine maintenance model.</li>\n<li>Analyze the model results.</li>\n<li>Test predictions against the model.</li>\n</ol>\\n<h3><a id=\\"Prerequisites_17\\"></a><strong>Prerequisites</strong></h3>\\n<p>A cloud admin with an <a href=\\"https://aws.amazon.com/premiumsupport/knowledge-center/create-and-activate-aws-account/\\" target=\\"_blank\\">AWS account</a> with appropriate permissions is required to complete the following prerequisites:</p>\\n<ol>\\n<li>Deploy an <a href=\\"https://aws.amazon.com/sagemaker/\\" target=\\"_blank\\">Amazon SageMaker</a> domain. For instructions, see <a href=\\"https://docs.aws.amazon.com/sagemaker/latest/dg/gs-studio-onboard.html\\" target=\\"_blank\\">Onboard to Amazon SageMaker Domain</a>.</li>\\n<li>Launch SageMaker Canvas. For instructions, see <a href=\\"https://docs.aws.amazon.com/sagemaker/latest/dg/canvas-setting-up.html\\" target=\\"_blank\\">Setting up and managing Amazon SageMaker Canvas (for IT administrators)</a>.</li>\\n<li>Configure cross-origin resource sharing (CORS) policies for SageMaker Canvas. For instructions, see <a href=\\"https://docs.aws.amazon.com/sagemaker/latest/dg/canvas-set-up-local-upload.html\\" target=\\"_blank\\">Give your users the ability to upload local files</a>.</li>\\n</ol>\n<p><img src=\\"https://dev-media.amazoncloud.cn/c69defa6b52841309fdd9caaddd8f13a_image.png\\" alt=\\"image.png\\" /></p>\n<h3><a id=\\"Import_the_dataset_25\\"></a><strong>Import the dataset</strong></h3>\\n<p>First, download the <a href=\\"https://static.us-east-1.prod.workshops.aws/public/6f2f7cb1-bfda-4b34-ae39-928502784393/static/datasets/maintenance_dataset.csv\\" target=\\"_blank\\">maintenance dataset</a> and review the file to make sure all the data is there.</p>\\n<p>SageMaker Canvas provides several sample datasets in your application to help you get started. 
To learn more about the SageMaker-provided sample datasets you can experiment with, see <a href=\\"https://docs.aws.amazon.com/sagemaker/latest/dg/canvas-sample-datasets.html\\" target=\\"_blank\\">Use sample datasets</a>. If you use the sample dataset (<code>canvas-sample-maintenance.csv</code>) available within SageMaker Canvas, you don’t have to import the maintenance dataset.</p>\\n<p><img src=\\"https://dev-media.amazoncloud.cn/62ee71e524924cb4a82af7d084123733_image.png\\" alt=\\"image.png\\" /></p>\n<p>You can import data from different data sources into SageMaker Canvas. If you plan to use your own dataset, follow the steps in <a href=\\"https://docs.aws.amazon.com/sagemaker/latest/dg/canvas-importing-data.html\\" target=\\"_blank\\">Importing data in Amazon SageMaker Canvas</a>.</p>\\n<p>For this post, we use the full maintenance dataset that we downloaded.</p>\n<ol>\\n<li>Sign in to the <a href=\\"http://aws.amazon.com/console\\" target=\\"_blank\\">AWS Management Console</a>, using an account with the appropriate permissions to access SageMaker Canvas.</li>\\n<li>Log in to the SageMaker Canvas console.</li>\n<li>Choose <strong>Import</strong>.</li>\\n</ol>\n<p><img src=\\"https://dev-media.amazoncloud.cn/237dfd7f66534dd5a4dadac9dd587b83_image.png\\" alt=\\"image.png\\" /></p>\n<ol start=\\"4\\">\\n<li>Choose <strong>Upload</strong> and select the <code>maintenance_dataset.csv</code> file.</li>\\n<li>Choose <strong>Import data</strong> to upload it to SageMaker Canvas.</li>\\n</ol>\n<p><img src=\\"https://dev-media.amazoncloud.cn/76294cb1bf7c404b8296809451ea4199_image.png\\" alt=\\"image.png\\" /></p>\n<p>The import process takes approximately 10 seconds (this can vary depending on dataset size). 
When it’s complete, you can see the dataset is in <code>Ready</code> status.</p>\\n<p>After you confirm that the imported dataset is <code>Ready</code>, you can create your model.</p>\\n<h3><a id=\\"Build_and_train_the_model_53\\"></a><strong>Build and train the model</strong></h3>\\n<p>To create and train your model, complete the following steps:</p>\n<ol>\\n<li>Choose <strong>New model</strong>, and provide a name for your model.</li>\\n<li>Choose <strong>Create</strong>.</li>\\n</ol>\n<p><img src=\\"https://dev-media.amazoncloud.cn/ca83c2c9526f46c5a8bf292b9b871917_image.png\\" alt=\\"image.png\\" /></p>\n<ol start=\\"3\\">\\n<li>Select the <code>maintenance_dataset.csv</code> dataset and choose <strong>Select dataset</strong>.<br />\\nIn the model view, you can see four tabs, which correspond to the four steps to create a model and use it to generate predictions: <strong>Select</strong>, <strong>Build</strong>, <strong>Analyze</strong>, and <strong>Predict</strong>.</li>\\n<li>On the Select tab, select the <code>maintenance_dataset.csv</code> dataset you uploaded previously and choose <strong>Select dataset</strong>.</li>\\n</ol>\n<p><img src=\\"https://dev-media.amazoncloud.cn/e78692bbea0540f49174a2a759368436_image.png\\" alt=\\"image.png\\" /></p>\n<p>This dataset includes 9 columns and 10,000 rows.</p>\n<p>SageMaker Canvas automatically moves to the Build phase.</p>\n<ol start=\\"5\\">\\n<li>On this tab, choose the target column, in our case <strong>Failure Type</strong>. The maintenance team has informed you that this column indicates the type of failures typically seen based on historical data from their existing machines. This is what you want to train your model to predict. SageMaker Canvas automatically detects that this is a <strong>3 Category</strong> problem (also known as multi-class classification). 
If the wrong model type is detected, you can change it manually with the Change type option.</li>\\n</ol>\n<p><img src=\\"https://dev-media.amazoncloud.cn/9aedae35aa1d4b149100138de65b8a7c_image.png\\" alt=\\"image.png\\" /></p>\n<p>Note that this dataset is highly imbalanced toward the No Failure class, which you can see by viewing the <strong>Failure Type</strong> column. Although SageMaker Canvas and the underlying AutoML capabilities can partly handle dataset imbalance, the imbalance may still skew model performance. As an additional next step, refer to <a href=\\"https://aws.amazon.com/blogs/machine-learning/balance-your-data-for-machine-learning-with-amazon-sagemaker-data-wrangler/\\" target=\\"_blank\\">Balance your data for machine learning with Amazon SageMaker Data Wrangler</a>. Following the steps in the shared link, you can launch an <a href=\\"https://docs.aws.amazon.com/sagemaker/latest/dg/studio.html\\" target=\\"_blank\\">Amazon SageMaker Studio</a> app from the SageMaker console, import this dataset within <a href=\\"https://aws.amazon.com/sagemaker/data-wrangler/\\" target=\\"_blank\\">Amazon SageMaker Data Wrangler</a>, use the Balance data transformation, and then take the balanced dataset back to SageMaker Canvas and continue with the following steps. We are proceeding with the imbalanced dataset in this post to show that SageMaker Canvas can handle imbalanced datasets as well.<br />\\nIn the bottom half of the page, you can look at some of the statistics of the dataset, including missing and mismatched values, unique values, and mean and median values. You can also drop some of the columns if you don’t want to use them for the prediction by simply deselecting them.<br />\\nAfter you’ve explored this section, it’s time to train the model! Before building a complete model, it’s a good practice to have a general idea about the model performance by training a Quick Model. 
A quick model trains fewer combinations of models and hyperparameters in order to prioritize speed over accuracy, especially in cases where you want to prove the value of training an ML model for your use case. Note that the quick build option isn’t available for datasets larger than 50,000 rows.</p>\n<ol start=\\"6\\">\\n<li>Choose <strong>Quick build</strong>.</li>\\n</ol>\n<p><img src=\\"https://dev-media.amazoncloud.cn/6387dc5c10b1420397c65fa7a789eb30_image.png\\" alt=\\"image.png\\" /></p>\n<p>Now you wait anywhere from 2–15 minutes. Once done, SageMaker Canvas automatically moves to the Analyze tab to show you the results of quick training. The analysis performed using quick build estimates that your model is able to predict the right failure type (outcome) 99.2% of the time. You may experience slightly different values. This is expected.</p>\n<p>Let’s focus on the first tab, <strong>Overview</strong>. This is the tab that shows you the <strong>Column impact</strong>, or the estimated importance of each column in predicting the target column. In this example, the Torque [Nm] and Rotational speed [rpm] columns have the most significant impact in predicting what type of failure will occur.</p>\\n<p><img src=\\"https://dev-media.amazoncloud.cn/71d9b59bba4e4f2c84507b49e19c330f_image.png\\" alt=\\"image.png\\" /></p>\n<h3><a id=\\"Evaluate_model_performance_90\\"></a><strong>Evaluate model performance</strong></h3>\\n<p>When you move to the Scoring portion of your analysis, you can see a plot representing the distribution of our predicted values with respect to the actual values. Notice that most records fall within the No Failure category. 
To learn more about how SageMaker Canvas uses SHAP baselines to bring explainability to ML, refer to <a href=\\"https://docs.aws.amazon.com/sagemaker/latest/dg/canvas-evaluate-model.html\\" target=\\"_blank\\">Evaluating Your Model’s Performance in Amazon SageMaker Canvas</a>, as well as <a href=\\"https://docs.aws.amazon.com/sagemaker/latest/dg/clarify-feature-attribute-shap-baselines.html\\" target=\\"_blank\\">SHAP Baselines for Explainability</a>.</p>\\n<p><img src=\\"https://dev-media.amazoncloud.cn/a746d9e6a2ac4e46bb6e228b91ea3921_image.png\\" alt=\\"image.png\\" /></p>\n<p>SageMaker Canvas splits the original dataset into train and validation sets before the training. The scoring is a result of SageMaker Canvas running the validation set against the model. This is an interactive interface where you can select the failure type. If you choose <strong>Overstrain Failure</strong> in the graphic, you can see that the model correctly identifies these failures 84% of the time. This is good enough to take action on, for example by having an operator or engineer check further. You can choose <strong>Power Failure</strong> in the graphic to see the respective scoring for further interpretation and actions.</p>\\n<p>You may be interested in failure types and how well the model predicts failure types based on a series of inputs. To take a closer look at the results, choose <strong>Advanced metrics</strong>. This displays a matrix that allows you to more closely examine the results. In ML, this is referred to as a confusion matrix.</p>\\n<p><img src=\\"https://dev-media.amazoncloud.cn/c8d59bedc28c409b93901cdb4dd4973e_image.png\\" alt=\\"image.png\\" /></p>\n<p>This matrix defaults to the dominant class, No Failure. On the <strong>Class</strong> menu, you can choose to view advanced metrics of the other two failure types, Overstrain Failure and Power Failure.</p>\n<p>In ML, the accuracy of the model is defined as the number of correct predictions divided by the total number of predictions. 
The blue boxes represent correct predictions that the model made against a subset of test data where there was a known outcome. Here we are interested in what percentage of the time the model predicted a particular machine failure type (let’s say No Failure) when it’s actually that failure type (No Failure). In ML, the ratio used to measure this is TP / (TP + FN), where TP is the number of true positives and FN is the number of false negatives. This is referred to as recall. In the default case, No Failure, there were 1,923 correct predictions out of 1,926 overall records, which results in over 99% recall. Alternatively, in the class of Overstrain Failure, there were 32 out of 38, which results in 84% recall. Lastly, in the class of Power Failure, there were 16 out of 19, which also results in 84% recall.</p>\n<p>Now, you have two options:</p>\n<ol>\\n<li>You can use this model to run some predictions by choosing <strong>Predict</strong>.</li>\\n<li>You can create a new version of this model to train with the <strong>Standard build</strong> option. This takes much longer (about 1–2 hours) but provides a more robust model because it goes through a full AutoML review of data, algorithms, and tuning iterations.</li>\\n</ol>\n<p>Because you’re trying to predict failures, and the model predicts failures correctly 84% of the time, you can confidently use the model to identify possible failures. So, you can proceed to option 1. 
If you weren’t confident, then you could have a data scientist review the modeling SageMaker Canvas did and offer potential improvements via option 2.</p>\n<h3><a id=\\"Generate_predictions_113\\"></a><strong>Generate predictions</strong></h3>\\n<p>Now that the model is trained, you can start generating predictions.</p>\n<ol>\\n<li>Choose <strong>Predict</strong> at the bottom of the <strong>Analyze</strong> page, or choose the <strong>Predict</strong> tab.</li>\\n<li>Choose <strong>Select dataset</strong>, and choose the <code>maintenance_dataset.csv</code> file.</li>\\n<li>Choose <strong>Generate predictions</strong>.</li>\\n</ol>\n<p>SageMaker Canvas uses this dataset to generate the predictions. Although it’s generally a good idea not to use the same dataset for both training and testing, you can use the same dataset here for the sake of simplicity. Alternatively, you can hold out some records from the original dataset before training, save them in a separate CSV file, and use that file for batch prediction here so that you don’t test on the data used for training.</p>\n<p><img src=\\"https://dev-media.amazoncloud.cn/bc4d379e46d44594b3f3aa7aebeb6669_image.png\\" alt=\\"image.png\\" /></p>\n<p>After a few seconds, the prediction is complete. SageMaker Canvas returns a prediction for each row of data and the probability of the prediction being correct. You can choose <strong>Preview</strong> to view the predictions, or choose <strong>Download</strong> to download a CSV file containing the full output.</p>\n<p><img src=\\"https://dev-media.amazoncloud.cn/d65479ffd3e94dd1835b43d365c86cc9_image.png\\" alt=\\"image.png\\" /></p>\n<p>You can also predict values one at a time by choosing <strong>Single prediction</strong> instead of <strong>Batch prediction</strong>. SageMaker Canvas shows you a view where you can provide the values for each feature manually and generate a prediction. This is ideal for what-if scenarios, for example: How does the tool wear impact the failure type? 
What if process temperature increases or decreases? What if rotational speed changes?</p>\\n<p><img src=\\"https://dev-media.amazoncloud.cn/214d8c3aed494911a2e6c7ca9277276a_image.png\\" alt=\\"image.png\\" /></p>\n<h3><a id=\\"Standard_build_133\\"></a><strong>Standard build</strong></h3>\\n<p>The <strong>Standard build</strong> option prioritizes accuracy over speed. If you want to share the artifacts of the model with your data scientists and ML engineers, you can create a standard build next.</p>\\n<ol>\\n<li>Choose <strong>Add version</strong>.</li>\\n</ol>\n<p><img src=\\"https://dev-media.amazoncloud.cn/22e2ac0bb6da4d2895e0bb50158900d3_image.png\\" alt=\\"image.png\\" /></p>\n<ol start=\\"2\\">\\n<li>Choose a new version and choose <strong>Standard build</strong>.</li>\\n</ol>\n<p><img src=\\"https://dev-media.amazoncloud.cn/b9e8f3852ff44d85b3afec53bd9c23af_image.png\\" alt=\\"image.png\\" /></p>\n<ol start=\\"3\\">\\n<li>After you create a standard build, you can share the model with data scientists and ML engineers for further evaluation and iteration.</li>\n</ol>\\n<p><img src=\\"https://dev-media.amazoncloud.cn/8450571e74fc4c02abf6c60eb2dbff18_image.png\\" alt=\\"image.png\\" /></p>\n<h3><a id=\\"Clean_up_149\\"></a><strong>Clean up</strong></h3>\\n<p>To avoid incurring future <a href=\\"https://aws.amazon.com/sagemaker/canvas/pricing\\" target=\\"_blank\\">session charges</a>, log out of SageMaker Canvas.</p>\\n<p><img src=\\"https://dev-media.amazoncloud.cn/9e5fcd1661ad43899851774f5e6e7f95_image.png\\" alt=\\"image.png\\" /></p>\n<h3><a id=\\"Conclusion_155\\"></a><strong>Conclusion</strong></h3>\\n<p>In this post, we showed how a business analyst can create a machine failure type prediction model with SageMaker Canvas using maintenance data. SageMaker Canvas allows business analysts such as reliability engineers to create accurate ML models and generate predictions using a no-code, visual, point-and-click interface. 
Analysts can take this to the next level by sharing their models with data scientist colleagues. Data scientists can view the SageMaker Canvas model in Studio, where they can explore the choices SageMaker Canvas made, validate model results, and even take the model to production with a few clicks. This can accelerate ML-based value creation and help scale improved outcomes faster.</p>\n<p>To learn more about using SageMaker Canvas, see <a href=\\"https://aws.amazon.com/blogs/machine-learning/build-share-deploy-how-business-analysts-and-data-scientists-achieve-faster-time-to-market-using-no-code-ml-and-amazon-sagemaker-canvas/\\" target=\\"_blank\\">Build, Share, Deploy: how business analysts and data scientists achieve faster time-to-market using no-code ML and Amazon SageMaker Canvas</a>. For more information about creating ML models with a no-code solution, see <a href=\\"https://aws.amazon.com/blogs/aws/announcing-amazon-sagemaker-canvas-a-visual-no-code-machine-learning-capability-for-business-analysts/\\" target=\\"_blank\\">Announcing Amazon SageMaker Canvas – a Visual, No Code Machine Learning Capability for Business Analysts</a>.</p>\\n<h3><a id=\\"About_the_Authors_161\\"></a><strong>About the Authors</strong></h3>\\n<p><img src=\\"https://dev-media.amazoncloud.cn/2c4da404a0f8438daadd8b5b42b5cef5_image.png\\" alt=\\"image.png\\" /></p>\n<p><strong>Rajakumar Sampathkumar</strong> is a Principal Technical Account Manager at AWS, providing customers guidance on business-technology alignment and supporting the reinvention of their cloud operation models and processes. He is passionate about cloud and machine learning. 
Raj is also a machine learning specialist and works with AWS customers to design, deploy, and manage their AWS workloads and architectures.</p>\\n<p><img src=\\"https://dev-media.amazoncloud.cn/60876a8d2d3d43bdac47027d0004922f_image.png\\" alt=\\"image.png\\" /></p>\n<p><strong>Twann Atkins</strong> is a Senior Solutions Architect for Amazon Web Services. He is responsible for working with Agriculture, Retail, and Manufacturing customers to identify business problems and working backwards to identify viable and scalable technical solutions. Twann has been helping customers plan and migrate critical workloads for more than 10 years, with a recent focus on democratizing analytics, artificial intelligence, and machine learning for customers and builders of tomorrow.</p>\\n<p><img src=\\"https://dev-media.amazoncloud.cn/e8b7837796ea4ecba6c5a126d91b4f87_image.png\\" alt=\\"image.png\\" /></p>\n<p><strong>Omkar Mukadam</strong> is an Edge Specialist Solutions Architect at Amazon Web Services. He currently focuses on solutions that enable commercial customers to effectively design, build, and scale with AWS Edge service offerings, which include, but are not limited to, AWS Snow Family.</p>\n"}