New built-in Amazon SageMaker algorithms for tabular data modeling: LightGBM, CatBoost, AutoGluon-Tabular, and TabTransformer

[Amazon SageMaker](https://aws.amazon.com/sagemaker/) provides a suite of [built-in algorithms](https://docs.aws.amazon.com/sagemaker/latest/dg/algos.html), [pre-trained models](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-jumpstart.html#jumpstart-solutions), and pre-built solution templates to help data scientists and machine learning (ML) practitioners get started on training and deploying ML models quickly. You can use these algorithms and models for both supervised and unsupervised learning. They can process various types of input data, including tabular, image, and text.

Starting today, SageMaker provides four new built-in tabular data modeling algorithms: LightGBM, CatBoost, AutoGluon-Tabular, and TabTransformer. You can use these popular, state-of-the-art algorithms for both tabular classification and regression tasks. They’re available through the [built-in algorithms](https://docs.aws.amazon.com/sagemaker/latest/dg/algos.html) on the SageMaker console as well as through the [Amazon SageMaker JumpStart](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-jumpstart.html) UI inside [Amazon SageMaker Studio](https://docs.aws.amazon.com/sagemaker/latest/dg/studio.html).

The following is the list of the four new built-in algorithms, with links to their documentation, example notebooks, and source.

![Table of the four new built-in algorithms with links to documentation, example notebooks, and source](1)

In the following sections, we provide a brief technical description of each algorithm, and examples of how to train a model via the SageMaker SDK or SageMaker JumpStart.


#### **LightGBM**


[LightGBM](https://lightgbm.readthedocs.io/en/latest/) is a popular and efficient open-source implementation of the Gradient Boosting Decision Tree (GBDT) algorithm. GBDT is a supervised learning algorithm that attempts to accurately predict a target variable by combining an ensemble of estimates from a set of simpler and weaker models. LightGBM uses additional techniques to significantly improve the efficiency and scalability of conventional GBDT.


#### **CatBoost**


[CatBoost](https://catboost.ai/) is a popular and high-performance open-source implementation of the GBDT algorithm. Two critical algorithmic advances are introduced in CatBoost: the implementation of ordered boosting, a permutation-driven alternative to the classic algorithm, and an innovative algorithm for processing categorical features. Both techniques were created to fight a prediction shift caused by a special kind of target leakage present in all currently existing implementations of gradient boosting algorithms.


#### **AutoGluon-Tabular**


[AutoGluon-Tabular](https://auto.gluon.ai/stable/index.html) is an open-source AutoML project developed and maintained by Amazon that performs advanced data processing, deep learning, and multi-layer stack ensembling. It automatically recognizes the data type in each column for robust data preprocessing, including special handling of text fields. AutoGluon fits various models ranging from off-the-shelf boosted trees to customized neural network models. These models are ensembled in a novel way: models are stacked in multiple layers and trained in a layer-wise manner that guarantees raw data can be translated into high-quality predictions within a given time constraint. Over-fitting is mitigated throughout this process by splitting the data in various ways with careful tracking of out-of-fold examples. AutoGluon is optimized for performance, and its out-of-the-box usage has achieved several top-3 and top-10 positions in data science competitions.


#### **TabTransformer**


[TabTransformer](https://www.amazon.science/blog/bringing-the-power-of-deep-learning-to-data-in-tables) is a novel deep tabular data modeling architecture for supervised learning. The TabTransformer is built upon self-attention based Transformers. The Transformer layers transform the embeddings of categorical features into robust contextual embeddings to achieve higher prediction accuracy. Furthermore, the contextual embeddings learned from TabTransformer are highly robust against both missing and noisy data features, and provide better interpretability. This model is the product of recent [Amazon Science](https://www.amazon.science/) research (see the [paper](https://arxiv.org/pdf/2012.06678.pdf) and the official [blog post](https://www.amazon.science/blog/bringing-the-power-of-deep-learning-to-data-in-tables)) and has been widely adopted by the ML community, with various third-party implementations ([Keras](https://keras.io/examples/structured_data/tabtransformer/), [AutoGluon](https://github.com/awslabs/autogluon/tree/master/tabular/src/autogluon/tabular/models/tab_transformer)) and coverage on social media and community sites such as [tweets](https://twitter.com/fchollet/status/1484212136846921730), [Towards Data Science](https://towardsdatascience.com/pytorch-widedeep-deep-learning-for-tabular-data-9cd1c48eb40d), Medium, and [Kaggle](https://www.kaggle.com/code/usharengaraju/tensorflow-tabtransformer/notebook).


#### **Benefits of SageMaker built-in algorithms**


When selecting an algorithm for your particular type of problem and data, using a SageMaker built-in algorithm is the easiest option, because doing so comes with the following major benefits:

- The built-in algorithms require no coding to start running experiments. The only inputs you need to provide are the data, hyperparameters, and compute resources. This allows you to run experiments more quickly, with less overhead for tracking results and code changes.
- The built-in algorithms come with parallelization across multiple compute instances and GPU support right out of the box for all applicable algorithms (some algorithms may not be included due to inherent limitations). If you have a lot of data with which to train your model, most built-in algorithms can easily scale to meet the demand. Even if you already have a pre-trained model, it may still be easier to use its counterpart in SageMaker and input the hyperparameters you already know rather than port it over and write a training script yourself.
- You are the owner of the resulting model artifacts. You can take that model and deploy it on SageMaker for several different inference patterns (check out all the [available deployment types](https://docs.aws.amazon.com/sagemaker/latest/dg/inference-cost-optimization.html)) and easy endpoint scaling and management, or you can deploy it wherever else you need it.

Let’s now see how to train one of these built-in algorithms.


#### **Train a built-in algorithm using the SageMaker SDK**


To train a selected model, we need to get that model’s URI, as well as that of the training script and the container image used for training. Thankfully, these three inputs depend solely on the model name, version (for a list of the available models, see [JumpStart Available Model Table](https://sagemaker.readthedocs.io/en/stable/doc_utils/jumpstart.html)), and the type of instance you want to train on. This is demonstrated in the following code snippet:

```python
from sagemaker import image_uris, model_uris, script_uris

train_model_id, train_model_version, train_scope = "lightgbm-classification-model", "*", "training"
training_instance_type = "ml.m5.xlarge"

# Retrieve the docker image
train_image_uri = image_uris.retrieve(
    region=None,
    framework=None,
    model_id=train_model_id,
    model_version=train_model_version,
    image_scope=train_scope,
    instance_type=training_instance_type,
)
# Retrieve the training script
train_source_uri = script_uris.retrieve(
    model_id=train_model_id, model_version=train_model_version, script_scope=train_scope
)
# Retrieve the model artifact; in the tabular case, the model is not pre-trained
train_model_uri = model_uris.retrieve(
    model_id=train_model_id, model_version=train_model_version, model_scope=train_scope
)
```

The `train_model_id` changes to `lightgbm-regression-model` if we’re dealing with a regression problem. The IDs for all the other models introduced in this post are listed in the following table.

![Table of model IDs for the algorithms introduced in this post](2)

We then define where our input is on [Amazon Simple Storage Service](http://aws.amazon.com/s3) ([Amazon S3](https://aws.amazon.com/cn/s3/?trk=cndc-detail)). We’re using a public sample dataset for this example. We also define where we want our output to go, and retrieve the default list of hyperparameters needed to train the selected model. You can change their values to your liking.

```python
import sagemaker
from sagemaker import hyperparameters

sess = sagemaker.Session()
region = sess.boto_session.region_name

# URI of sample training dataset
training_dataset_s3_path = f"s3://jumpstart-cache-prod-{region}/training-datasets/tabular_multiclass/"

# URI for output artifacts
output_bucket = sess.default_bucket()
s3_output_location = f"s3://{output_bucket}/jumpstart-example-tabular-training/output"

# Retrieve the default hyperparameters for training
# (note: this rebinds the name `hyperparameters` from the module to a dict)
hyperparameters = hyperparameters.retrieve_default(
    model_id=train_model_id, model_version=train_model_version
)

# [Optional] Override default hyperparameters with custom values
# The same hyperparameter is named "iterations" for CatBoost
hyperparameters["num_boost_round"] = "500"
```

Finally, we instantiate a SageMaker `Estimator` with all the retrieved inputs and launch the training job with `.fit`, passing it our training dataset URI. The `entry_point` script provided is named `transfer_learning.py` (the same for other tasks and algorithms), and the input data channel passed to `.fit` must be named `training`.

```python
import sagemaker
from sagemaker.estimator import Estimator
from sagemaker.utils import name_from_base

# IAM role used by the training job; assumes this code runs where a SageMaker
# execution role is available (for example, a SageMaker Studio notebook)
aws_role = sagemaker.get_execution_role()

# Unique training job name
training_job_name = name_from_base(f"built-in-example-{train_model_id}")

# Create SageMaker Estimator instance
tc_estimator = Estimator(
    role=aws_role,
    image_uri=train_image_uri,
    source_dir=train_source_uri,
    model_uri=train_model_uri,
    entry_point="transfer_learning.py",
    instance_count=1,
    instance_type=training_instance_type,
    max_run=360000,
    hyperparameters=hyperparameters,
    output_path=s3_output_location,
)

# Launch a SageMaker Training job by passing the S3 path of the training data
tc_estimator.fit(
    {"training": training_dataset_s3_path}, logs=True, job_name=training_job_name
)
```

Note that you can train built-in algorithms with [SageMaker automatic model tuning](https://docs.aws.amazon.com/sagemaker/latest/dg/automatic-model-tuning.html) to select the optimal hyperparameters and further improve model performance.


#### **Train a built-in algorithm using SageMaker JumpStart**


You can also train any of these built-in algorithms with a few clicks via the SageMaker JumpStart UI. JumpStart is a SageMaker feature that allows you to train and deploy built-in algorithms and pre-trained models from various ML frameworks and model hubs through a graphical interface. It also allows you to deploy fully fledged ML solutions that string together ML models and various other AWS services to solve a targeted use case.

For more information, refer to [Run text classification with Amazon SageMaker JumpStart using TensorFlow Hub and Hugging Face models](https://aws.amazon.com/blogs/machine-learning/run-text-classification-with-amazon-sagemaker-jumpstart-using-tensorflow-hub-and-huggingface-models/).


#### **Conclusion**


In this post, we announced the launch of four powerful new built-in algorithms for ML on tabular datasets now available on SageMaker. We provided a technical description of what these algorithms are, as well as an example training job for LightGBM using the SageMaker SDK.

Bring your own dataset and try these new algorithms on SageMaker, and check out the sample notebooks to use built-in algorithms available on [GitHub](https://github.com/aws/amazon-sagemaker-examples/tree/main/introduction_to_amazon_algorithms).


##### **About the Authors**


[![image.png](3)](https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2022/06/28/xinhuang-selfie.jpg)

**Dr. Xin Huang** is an Applied Scientist for [Amazon SageMaker](https://aws.amazon.com/cn/sagemaker/?trk=cndc-detail) JumpStart and [Amazon SageMaker](https://aws.amazon.com/cn/sagemaker/?trk=cndc-detail) built-in algorithms. He focuses on developing scalable machine learning algorithms. His research interests are in the area of natural language processing, explainable deep learning on tabular data, and robust analysis of non-parametric space-time clustering. He has published many papers at the ACL, ICDM, and KDD conferences and in the Royal Statistical Society: Series A journal.

[![image.png](4)](https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2022/05/10/Ashish-Khetan.jpg)

**Dr. Ashish Khetan** is a Senior Applied Scientist with [Amazon SageMaker](https://aws.amazon.com/cn/sagemaker/?trk=cndc-detail) JumpStart and [Amazon SageMaker](https://aws.amazon.com/cn/sagemaker/?trk=cndc-detail) built-in algorithms and helps develop machine learning algorithms. He is an active researcher in machine learning and statistical inference and has published many papers in NeurIPS, ICML, ICLR, JMLR, ACL, and EMNLP conferences.

[![image.png](5)](https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2022/05/10/Jo%C3%A3o-Moura.jpg)

**João Moura** is an AI/ML Specialist Solutions Architect at Amazon Web Services. He is mostly focused on NLP use-cases and helping customers optimize Deep Learning model training and deployment. He is also an active proponent of low-code ML solutions and ML-specialized hardware.
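
As a supplement to the SDK walkthrough earlier in this post, here is a minimal sketch of two pre-flight checks that can be run locally before launching a training job: validating that a dataset location is a well-formed `s3://bucket/key` URI, and setting the number of boosting rounds under the key each algorithm expects (`num_boost_round` for LightGBM, `iterations` for CatBoost, as noted in the walkthrough). The helper names `sample_dataset_uri`, `check_s3_uri`, and `set_boosting_rounds`, and the example region `us-west-2`, are hypothetical and are not part of the SageMaker SDK; the sketch uses only the Python standard library.

```python
from urllib.parse import urlparse


def sample_dataset_uri(region: str) -> str:
    """Build the sample-dataset URI used in the walkthrough for a region."""
    return f"s3://jumpstart-cache-prod-{region}/training-datasets/tabular_multiclass/"


def check_s3_uri(uri: str) -> str:
    """Return the bucket name, or raise ValueError if the URI is malformed.

    A URI such as "s3:///bucket/key" (three slashes) parses with an empty
    bucket, which is an easy mistake to miss by eye.
    """
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not a valid S3 URI (missing scheme or bucket): {uri!r}")
    return parsed.netloc


def set_boosting_rounds(hyperparameters: dict, algorithm: str, rounds: int) -> dict:
    """Return a copy of the hyperparameters with the boosting rounds set.

    Only the two key names mentioned in this post are covered (assumption:
    `algorithm` is the lowercase string "lightgbm" or "catboost").
    """
    key_by_algorithm = {"lightgbm": "num_boost_round", "catboost": "iterations"}
    if algorithm not in key_by_algorithm:
        raise KeyError(f"no known boosting-rounds key for algorithm {algorithm!r}")
    updated = dict(hyperparameters)
    # Hyperparameter values are passed as strings, as in the walkthrough.
    updated[key_by_algorithm[algorithm]] = str(rounds)
    return updated


if __name__ == "__main__":
    uri = sample_dataset_uri("us-west-2")  # example region, an assumption
    print(check_s3_uri(uri))
    print(set_boosting_rounds({}, "catboost", 500))
```

Returning a copy instead of mutating the input keeps the defaults retrieved from the SDK available for comparison across experiments.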