Accelerate machine learning with AWS Data Exchange and Amazon Redshift ML

[Amazon Redshift ML](https://aws.amazon.com/redshift/features/redshift-ml/) makes it easy for SQL users to create, train, and deploy ML models using familiar SQL commands. Redshift ML allows you to use your data in Amazon Redshift with [Amazon SageMaker](https://aws.amazon.com/sagemaker/), a fully managed ML service, without requiring you to become an expert in ML.

[AWS Data Exchange](https://aws.amazon.com/data-exchange/) makes it easy to find, subscribe to, and use third-party data in the cloud. [AWS Data Exchange for Amazon Redshift](https://aws.amazon.com/redshift/features/aws-data-exchange-for-amazon-redshift/) enables you to access and query tables in Amazon Redshift without an extract, transform, and load (ETL) process. As a subscriber, you can browse the AWS Data Exchange catalog, find data products relevant to your business whose data is stored in Amazon Redshift, and [subscribe](https://aws.amazon.com/blogs/aws/new-aws-data-exchange-for-amazon-redshift/) to the data directly from the providers, with no further processing or ETL required.

If the provider data is not already available in Amazon Redshift, many providers will add it to Amazon Redshift upon request.

In this post, we show you how to subscribe to datasets through AWS Data Exchange without ETL, run ML algorithms on an Amazon Redshift cluster, and perform local inference and prediction.

#### **Solution overview**

The use case for this solution is to predict ticket sales for worldwide events based on historical ticket sales data, using a regression model. The data or ETL engineer can build the data pipeline by subscribing to the [Worldwide Event Attendance](https://aws.amazon.com/marketplace/pp/prodview-4ozlpl4r3k7cg?sr=0-3&ref_=beagle&applicationId=AWSMPContessa) product on AWS Data Exchange without ETL.
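The regression model we create later optimizes mean squared error (MSE). As a quick illustration of this metric, the following stdlib Python sketch uses hypothetical actual and predicted ticket quantities (the values are made up, not from the dataset):

```python
# Illustrative only: Redshift ML's OBJECTIVE 'mse' scores a regression model
# by mean squared error. The quantities below are hypothetical.
def mean_squared_error(actual, predicted):
    """Average of squared differences between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical actual vs. predicted ticket quantities for a few sales records.
actual_qty = [2, 4, 1, 3]
predicted_qty = [2.5, 3.5, 1.0, 4.0]

print(mean_squared_error(actual_qty, predicted_qty))  # 0.375
```

A lower MSE means the predicted quantities are closer to the actual quantities sold.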
You can then create the ML model in Redshift ML using the time series ticket sales data and predict future ticket sales.

To implement this solution, you complete the following high-level steps:

1. Subscribe to datasets using AWS Data Exchange for Amazon Redshift.
2. Connect to the datashare in Amazon Redshift.
3. Create the ML model using the SQL notebook feature of the Amazon Redshift query editor v2.

The following diagram illustrates the solution architecture.

![image.png](https://dev-media.amazoncloud.cn/81b6764d7bf3412db64709a89a48effd_image.png)

#### **Prerequisites**

Before starting this walkthrough, you must complete the following prerequisites:

1. Make sure you have an existing Amazon Redshift cluster with the RA3 node type. If not, you can [create a provisioned Amazon Redshift cluster](https://docs.aws.amazon.com/redshift/latest/gsg/rs-gsg-launch-sample-cluster.html).
2. Make sure the Amazon Redshift cluster is encrypted, because the data provider's cluster is encrypted and Amazon Redshift data sharing requires homogeneous encryption configurations. For more details on homogeneous encryption, refer to [Data sharing considerations in Amazon Redshift](https://docs.aws.amazon.com/redshift/latest/dg/considerations.html).
3. Create an AWS Identity and Access Management (IAM) role with access to SageMaker and [Amazon Simple Storage Service](http://aws.amazon.com/s3) (Amazon S3), and attach it to the Amazon Redshift cluster. Refer to [Cluster setup for using Amazon Redshift ML](https://docs.aws.amazon.com/redshift/latest/dg/admin-setup.html#cluster-setup) for more details.
4. Create an S3 bucket for storing the training data and model output.

Note that some of the AWS resources in this walkthrough incur charges.
Remember to delete the resources when you're finished.

#### **Subscribe to an AWS Data Exchange product with Amazon Redshift data**

To subscribe to an AWS Data Exchange public dataset, complete the following steps:

1. On the AWS Data Exchange console, choose **Explore available data products**.
2. In the navigation pane, under **Data available through**, select **Amazon Redshift** to filter products with Amazon Redshift data.
3. Choose **Worldwide Event Attendance (Test Product)**.

![image.png](https://dev-media.amazoncloud.cn/935b7363db784c11ac5997df63935d46_image.png)

4. Choose **Continue to subscribe**.
5. Confirm the subscription by checking that the product is listed on the **Subscriptions** page.

![image.png](https://dev-media.amazoncloud.cn/2457b60b1a7d41ac9340a775a05d33b9_image.png)

#### **Predict tickets sold using Redshift ML**

To set up prediction using Redshift ML, complete the following steps:

1. On the Amazon Redshift console, choose **Datashares** in the navigation pane.
2. On the **Subscriptions** tab, confirm that the AWS Data Exchange datashare is available.

![image.png](https://dev-media.amazoncloud.cn/1993c79f3c964435b674878683359e3f_image.png)

![image.png](https://dev-media.amazoncloud.cn/ccbcffee4e3f4d3aabb5a487428ef310_image.png)

3. In the navigation pane, choose **Query editor v2**.
4. Connect to your Amazon Redshift cluster in the navigation pane.

Amazon Redshift provides a feature to create notebooks and run your queries; the notebook feature is currently in preview and available in all Regions.
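Before working through the notebook steps, it may help to see the join-and-split pattern the tutorial uses on a small scale. The following self-contained sketch uses Python's stdlib sqlite3 with toy tables (names such as date_dim and the row values are hypothetical stand-ins for the datashare tables, and SQLite syntax differs slightly from Amazon Redshift):

```python
# Local sketch (illustrative, not the AWS Data Exchange data) of the pattern
# used below: join sales to its dimension tables, attach a random row number,
# then split ~80/20 into training and validation sets.
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.executescript("""
CREATE TABLE sales (eventid INT, dateid INT, qtysold INT);
CREATE TABLE event (eventid INT, eventname TEXT);
CREATE TABLE date_dim (dateid INT, month INT, holiday INT);
""")
cur.executemany("INSERT INTO sales VALUES (?, ?, ?)",
                [(i % 3, i % 5, 1 + i % 4) for i in range(100)])
cur.executemany("INSERT INTO event VALUES (?, ?)", [(0, 'A'), (1, 'B'), (2, 'C')])
cur.executemany("INSERT INTO date_dim VALUES (?, ?, ?)",
                [(d, 1 + d % 12, d % 2) for d in range(5)])

# Assign a shuffled row number, mirroring ROW_NUMBER() OVER (ORDER BY RANDOM()).
cur.execute("""
CREATE TABLE joined AS
SELECT e.eventname, s.qtysold, d.month, d.holiday,
       ROW_NUMBER() OVER (ORDER BY RANDOM()) AS r
FROM sales s
JOIN event e ON s.eventid = e.eventid
JOIN date_dim d ON s.dateid = d.dateid;
""")

# 80% of rows go to training, 20% to validation, as in the Redshift WHERE clauses.
cur.execute("CREATE TABLE training_data AS SELECT * FROM joined "
            "WHERE r > (SELECT COUNT(1) * 0.2 FROM joined)")
cur.execute("CREATE TABLE validation_data AS SELECT * FROM joined "
            "WHERE r <= (SELECT COUNT(1) * 0.2 FROM joined)")

train_n = cur.execute("SELECT COUNT(*) FROM training_data").fetchone()[0]
valid_n = cur.execute("SELECT COUNT(*) FROM validation_data").fetchone()[0]
print(train_n, valid_n)  # 80 20
```

Because the random row number is dense (1 through the row count), the threshold `COUNT(1) * 0.2` yields an exact 80/20 split regardless of which rows land in each set.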
For the remainder of this tutorial, we run queries in the notebook and add a markdown cell describing each step.

5. Choose the plus sign and choose **Notebook**.

![image.png](https://dev-media.amazoncloud.cn/0af2f11cea024edebefdac01a9923689_image.png)

6. Choose **Add markdown**.

![image.png](https://dev-media.amazoncloud.cn/99c81aee3bd84736ae6b4899bc67e4de_image.png)

7. Enter `Show available data share` in the cell.

8. Choose **Add SQL**.

![image.png](https://dev-media.amazoncloud.cn/a69c3f369a0d45ae84e71e89040b2405_image.png)

9. Enter and run the following command to see the available datashares for the cluster:

```sql
SHOW datashares;
```

You should see worldwide_event_test_data, as shown in the following screenshot.

10. Note the producer_namespace and producer_account values from the output; we use them in the next step.

![image.png](https://dev-media.amazoncloud.cn/c9b4295d333c48a2a4d179116528c2fa_image.png)

11. Choose **Add markdown** and enter `Create database from datashare with producer_namespace and producer_account`.

12. Choose **Add SQL** and enter the following code to [create a database](https://docs.aws.amazon.com/redshift/latest/dg/r_CREATE_DATABASE.html) to access the datashare. Use the producer_namespace and producer_account values you copied earlier.

```sql
CREATE DATABASE ml_blog_db FROM DATASHARE worldwide_event_test_data OF ACCOUNT 'producer_account' NAMESPACE 'producer_namespace';
```

13. Choose **Add markdown** and enter `Create new table to consolidate features`.

14. Choose **Add SQL** and enter the following code to create a new table called event, which consolidates the event sales by date and assigns a running serial number to each row for splitting into training and validation datasets:

```sql
CREATE TABLE event AS
	SELECT eventname, qtysold, saletime, day, week, month, qtr, year, holiday, ROW_NUMBER() OVER (ORDER BY RANDOM()) r
	FROM "ml_blog_db"."public"."sales" s
	INNER JOIN "ml_blog_db"."public"."event" e
	ON s.eventid = e.eventid
	INNER JOIN "ml_blog_db"."public"."date" d
	ON s.dateid = d.dateid;
```

Event name, quantity sold, sale time, day, week, month, quarter, year, and holiday are columns in the dataset from AWS Data Exchange that are used as features when creating the ML model.

15. Choose **Add markdown** and enter `Split the dataset into training dataset and validation dataset`.

16. Choose **Add SQL** and enter the following code to split the dataset into training (80%) and validation (20%) datasets:

```sql
CREATE TABLE training_data AS 
	SELECT eventname, qtysold, saletime, day, week, month, qtr, year, holiday
	FROM event
	WHERE r >
	(SELECT COUNT(1) * 0.2 FROM event);

CREATE TABLE validation_data AS 
	SELECT eventname, qtysold, saletime, day, week, month, qtr, year, holiday
	FROM event
	WHERE r <=
	(SELECT COUNT(1) * 0.2 FROM event);
```

17. Choose **Add markdown** and enter `Create ML model`.

18. Choose **Add SQL** and enter the following command to create the model.
Replace the your_s3_bucket parameter with your bucket name.

```sql
CREATE MODEL predict_ticket_sold
	FROM training_data
	TARGET qtysold
	FUNCTION predict_ticket_sold
	IAM_ROLE 'default'
	PROBLEM_TYPE regression
	OBJECTIVE 'mse'
	SETTINGS (s3_bucket 'your_s3_bucket',
	s3_garbage_collect off,
	max_runtime 5000);
```

**Note:** It can take up to 2 hours to create and train the model.

The following screenshot shows the example output from adding our markdown and SQL.

![image.png](https://dev-media.amazoncloud.cn/9313271bf9bd4a14b5897385e07845b5_image.png)

19. Choose **Add markdown** and enter `Show model creation status`.
20. Choose **Add SQL** and enter the following command to get the status of the model creation:

```sql
SHOW MODEL predict_ticket_sold;
```

![image.png](https://dev-media.amazoncloud.cn/b39ae474a5544b1d95880c714fcf6aa8_image.png)

Move to the next step after the **Model State** has changed to READY.

21. Choose **Add markdown** and enter `Run the inference for eventname Jason Mraz`.
22. When the model is ready, you can use the SQL function to apply the ML model to your data. The following is sample SQL code to predict the tickets sold for a particular event using the predict_ticket_sold function created in the previous step:

```sql
SELECT eventname,
	predict_ticket_sold(
	eventname, saletime, day, week, month, qtr, year, holiday) AS predicted_qty_sold,
	day, week, month
	FROM event
	WHERE eventname = 'Jason Mraz';
```

The following is the output received by applying the ML function predict_ticket_sold to the original dataset. The output of the function is captured in the field predicted_qty_sold, which is the predicted quantity of tickets sold.

![image.png](https://dev-media.amazoncloud.cn/b9443aed3cd841a088bf229e0e8230bc_image.png)

#### **Share notebooks**

To share the notebooks, complete the following steps:

1. Create an IAM role with the managed policy AmazonRedshiftQueryEditorV2FullAccess attached.
2. Add a principal tag with the tag name sqlworkbench-team to the role.
3. Set the value of this tag to the principal (user, group, or role) you're granting access to.
4. After you configure these permissions, navigate to the Amazon Redshift console and choose **Query editor v2** in the navigation pane. If you haven't used the [query editor v2](https://docs.aws.amazon.com/redshift/latest/mgmt/query-editor-v2-using.html) before, [configure](https://docs.aws.amazon.com/redshift/latest/mgmt/query-editor-v2-getting-started.html) your account to use it.
5. Choose **Notebooks** in the left pane and navigate to **My notebooks**.

![image.png](https://dev-media.amazoncloud.cn/5b274c670be848a4921a5ff607d79e5f_image.png)

6. Right-click the notebook you want to share and choose **Share with my team**.

![image.png](https://dev-media.amazoncloud.cn/5197016619d448048a79f3b337d40c3f_image.png)

7. Confirm that the notebook is shared by choosing **Shared to my team** and checking that the notebook is listed.

![image.png](https://dev-media.amazoncloud.cn/eb4d8c7ed2f542e39b721615c14b56d6_image.png)

#### **Summary**

In this post, we showed you how to build an end-to-end pipeline by subscribing to a public dataset through AWS Data Exchange, simplifying data integration and processing, and then running prediction on the data using Redshift ML.

We look forward to hearing about your experience. If you have questions or suggestions, please leave a comment.

#### **About the Authors**

![image.png](https://dev-media.amazoncloud.cn/f69d1b6b5c864f018a084a5391174c9c_image.png)

**Yadgiri Pottabhathini** is a Sr. Analytics Specialist Solutions Architect. His role is to assist customers in their cloud data warehouse journey and help them evaluate and align their data analytics business objectives with Amazon Redshift capabilities.

![image.png](https://dev-media.amazoncloud.cn/56c28abb264445d289120ba88e1c310c_image.png)

**Ekta Ahuja** is an Analytics Specialist Solutions Architect at AWS. She is passionate about helping customers build scalable and robust data and analytics solutions. Before AWS, she worked in several different data engineering and analytics roles. Outside of work, she enjoys baking, traveling, and board games.

![image.png](https://dev-media.amazoncloud.cn/f531b442bf7b450f80b0a3807f0d355f_image.png)

**BP Yau** is a Sr. Product Manager at AWS. He is passionate about helping customers architect big data solutions to process data at scale. Before AWS, he helped Amazon.com Supply Chain Optimization Technologies migrate its Oracle data warehouse to Amazon Redshift and build its next generation big data analytics platform using AWS technologies.

![image.png](https://dev-media.amazoncloud.cn/ffdba6a9f5a64f0fad5e65978cd8fcb9_image.png)

**Srikanth Sopirala** is a Principal Analytics Specialist Solutions Architect at AWS. He is a seasoned leader with over 20 years of experience, who is passionate about helping customers build scalable data and analytics solutions to gain timely insights and make critical business decisions. In his spare time, he enjoys reading, spending time with his family, and road cycling.