Promote feature discovery and reuse across your organization using Amazon SageMaker Feature Store and its feature-level metadata capability

{"value":"[Amazon SageMaker Feature Store](https://aws.amazon.com/sagemaker/feature-store/) helps data scientists and machine learning (ML) engineers securely store, discover, and share curated data used in training and prediction workflows. Feature Store is a centralized store for features and associated metadata, allowing features to be easily discovered and reused by data science teams working on different projects or ML models.\n\nWith Feature Store, you have always been able to add metadata at the feature group level. Data scientists who want to search for and discover existing features for their models can now also search at the feature level by adding custom metadata to individual features. For example, the information can include a description of the feature, the date it was last modified, its original data source, certain metrics, or the level of sensitivity.\n\nThe following diagram illustrates the architectural relationships between feature groups, features, and associated metadata. Note how data scientists can now specify descriptions and metadata at both the feature group level and the individual feature level.\n\n![image.png](https://dev-media.amazoncloud.cn/b92475646e0841e1b6c429e727c4905a_image.png)\n\nIn this post, we explain how data scientists and ML engineers can use feature-level metadata with the new search and discovery capabilities of Feature Store to promote better feature reuse across their organization. 
This capability can significantly help data scientists in the feature selection process and, as a result, help you identify features that lead to increased model accuracy.\n\n#### **Use case**\n\nFor the purposes of this post, we use two feature groups, ```customer``` and ```loan```.\n\nThe ```customer``` feature group has the following features:\n\n- **age** – Customer’s age (numeric)\n- **job** – Type of job (one-hot encoded, such as ```admin``` or ```services```)\n- **marital** – Marital status (one-hot encoded, such as ```married``` or ```single```)\n- **education** – Level of education (one-hot encoded, such as ```basic 4y``` or ```high school```)\n\nThe ```loan``` feature group has the following features:\n\n- **default** – Has credit in default? (one-hot encoded: ```no``` or ```yes```)\n- **housing** – Has housing loan? (one-hot encoded: ```no``` or ```yes```)\n- **loan** – Has personal loan? (one-hot encoded: ```no``` or ```yes```)\n- **total_amount** – Total amount of loans (numeric)\n\nThe following figure shows example feature groups and feature metadata.\n\n![image.png](https://dev-media.amazoncloud.cn/1d824b1c6d1b40d09276d9468575c5e9_image.png)\n\nThe purpose of adding a description and assigning metadata to each feature is to increase the speed of discovery by enabling new search parameters along which a data scientist or ML engineer can explore features. These can reflect details about a feature such as how it is calculated (for example, whether it’s an average over 6 months or 1 year), its origin, its creator or owner, what the feature means, and more.\n\nIn the following sections, we provide two approaches to search and discover features and configure feature-level metadata: the first using [Amazon SageMaker Studio](https://aws.amazon.com/sagemaker/studio/) directly, and the second programmatically.\n\n#### **Feature discovery in Studio**\n\nYou can easily search and query features using Studio. 
With the new enhanced search and discovery capabilities, you can immediately retrieve results using a simple type-ahead of a few characters.\n\nThe following screenshot demonstrates the following capabilities:\n\n- You can access the **Feature Catalog** tab and observe features across feature groups. The features are presented in a table that includes the feature name, type, description, parameters, date of creation, and associated feature group’s name.\n- You can directly use the type-ahead functionality to immediately return search results.\n- You have the flexibility to use different types of filter options: ```All```, ```Feature name```, ```Description```, or ```Parameters```. Note that ```All``` returns all features where ```Feature name```, ```Description```, or ```Parameters``` matches the search criteria.\n- You can narrow down the search further by specifying a date range using the ```Created from``` and ```Created to``` fields and specifying parameters using the ```Search parameter key``` and ```Search parameter value``` fields.\n\n![image.png](https://dev-media.amazoncloud.cn/c436b39843ba45d2be993ac34e116ca5_image.png)\n\nAfter you have selected a feature, you can choose the feature’s name to bring up its details. When you choose **Edit Metadata**, you can add a description and up to 25 key-value parameters. Within this view, you can create, view, update, and delete the feature’s metadata. The following screenshot illustrates how to edit feature metadata for ```total_amount```.\n\n![下载.gif](https://dev-media.amazoncloud.cn/b2a5540826e4433cbbf15763952e4fb0_%E4%B8%8B%E8%BD%BD.gif)\n\nAs previously stated, adding key-value pairs to a feature gives you more dimensions along which to search for features. For our example, the feature’s origin has been added to every feature’s metadata. 
When you choose the search icon and filter along the key-value pair ```origin```: ```job```, you can see all the features that were one-hot-encoded from this base attribute.\n\n#### **Feature discovery using code**\n\nYou can also access and update feature information through the [AWS Command Line Interface](http://aws.amazon.com/cli) (AWS CLI) and SDK (Boto3) rather than directly through the [AWS Management Console](http://aws.amazon.com/console). This allows you to integrate the feature-level search functionality of Feature Store with your own custom data science platforms. In this section, we interact with the Boto3 API endpoints to update and search feature metadata.\n\nTo begin improving feature search and discovery, you can add metadata using the ```update_feature_metadata``` API. In addition to the ```description``` and ```created_date``` fields, you can add up to 25 parameters (key-value pairs) to a given feature.\n\nThe following code is an example of five possible key-value parameters that have been added to the ```job_admin``` feature. This feature was created, along with ```job_services``` and ```job_none```, by one-hot-encoding ```job```.\n\n```\nimport boto3\n\n# Create a SageMaker client using the default credentials and Region\nsagemaker_client = boto3.client(\"sagemaker\")\n\nsagemaker_client.update_feature_metadata(\n    FeatureGroupName=\"customer\",\n    FeatureName=\"job_admin\",\n    ParameterAdditions=[\n        {\"Key\": \"author\", \"Value\": \"arnaud\"},  # Feature's author\n        {\"Key\": \"team\", \"Value\": \"mlops\"},  # Team owning the feature\n        {\"Key\": \"origin\", \"Value\": \"job\"},  # Raw input parameter\n        {\"Key\": \"sensitivity\", \"Value\": \"5\"},  # 1-5 scale for data sensitivity\n        {\"Key\": \"env\", \"Value\": \"testing\"}  # Environment the feature is used in\n    ]\n)\n```\n\nAfter ```author```, ```team```, ```origin```, ```sensitivity```, and ```env``` have been added to the ```job_admin``` feature, data scientists or ML engineers can retrieve them by calling the ```describe_feature_metadata``` API. 
You can navigate to the ```Parameters``` object in the response for the metadata we previously added to our feature. The ```describe_feature_metadata``` API endpoint allows you to get greater insight into a given feature by getting its associated metadata.\n\n```\nresponse = sagemaker_client.describe_feature_metadata(\n    FeatureGroupName=\"customer\",\n    FeatureName=\"job_admin\",\n)\n\n# Navigate to 'Parameters' in response to get metadata\nmetadata = response['Parameters']\n```\n\nYou can search for features with the SageMaker ```search``` API, using metadata as search parameters. The following code is an example function that takes a ```search_string``` parameter as an input and returns all features where the feature’s name, description, or parameters match the condition:\n\n```\nimport pandas as pd\n\ndef search_features_using_string(search_string):\n    response = sagemaker_client.search(\n        Resource=\"FeatureMetadata\",\n        SearchExpression={\n            'Filters': [\n                {\n                    'Name': 'FeatureName',\n                    'Operator': 'Contains',\n                    'Value': search_string\n                },\n                {\n                    'Name': 'Description',\n                    'Operator': 'Contains',\n                    'Value': search_string\n                },\n                {\n                    'Name': 'AllParameters',\n                    'Operator': 'Contains',\n                    'Value': search_string\n                }\n            ],\n            \"Operator\": \"Or\"\n        },\n    )\n\n    # Displaying results in a pandas DataFrame\n    df = pd.json_normalize(response['Results'], max_level=1)\n    # Strip the 'FeatureMetadata.' prefix from the column names\n    df.columns = df.columns.map(lambda col: col.split(\".\")[1])\n    df = df.drop('FeatureGroupArn', axis=1)\n\n    return df\n\n```\n\nThe following code snippet uses our ```search_features_using_string``` function to retrieve all features for which the feature name, description, or parameters contain the word ```mlops```:\n\n```\nsearch_results = search_features_using_string('mlops')\nsearch_results\n```\n\nThe following screenshot contains the list of matching feature names as well as their corresponding metadata, including timestamps for each feature’s creation and last modification. 
You can use this information to improve discovery and visibility into your organization’s features.\n\n![image.png](https://dev-media.amazoncloud.cn/2ecd290f874f47b593a9887a9b64d8c5_image.png)\n\n#### **Conclusion**\n\nSageMaker Feature Store provides a purpose-built feature management solution to help organizations scale ML development across business units and data science teams. Improving feature reuse and feature consistency are primary benefits of a feature store. In this post, we explained how you can use feature-level metadata to improve search and discovery of features. This included creating metadata around a variety of use cases and using it as additional search parameters.\n\nGive it a try, and let us know what you think in the comments. If you want to learn more about collaborating and sharing features within Feature Store, refer to [Enable feature reuse across accounts and teams using Amazon SageMaker Feature Store](https://aws.amazon.com/blogs/machine-learning/enable-feature-reuse-across-accounts-and-teams-using-amazon-sagemaker-feature-store/).\n\n#### **About the authors**\n\n![image.png](https://dev-media.amazoncloud.cn/076e8ede33c9414886f1aac6faeaf7fe_image.png)\n\n**Arnaud Lauer** is a Senior Partner Solutions Architect in the Public Sector team at AWS. He enables partners and customers to understand how best to use AWS technologies to translate business needs into solutions. He brings more than 16 years of experience in delivering and architecting digital transformation projects across a range of industries, including the public sector, energy, and consumer goods. Artificial intelligence and machine learning are some of his passions. Arnaud holds 12 AWS certifications, including the ML Specialty Certification.\n\n![image.png](https://dev-media.amazoncloud.cn/0824643231204a5d96991c7ff434823e_image.png)\n\n**Nicolas Bernier** is an Associate Solutions Architect, part of the Canadian Public Sector team at AWS. 
He is currently conducting a master’s degree with a research area in Deep Learning and holds five AWS certifications, including the ML Specialty Certification. Nicolas is passionate about helping customers deepen their knowledge of AWS by working with them to translate their business challenges into technical solutions.\n\n![image.png](https://dev-media.amazoncloud.cn/10b9d42462ce4c64a5455789d5b57d12_image.png)\n\n**Mark Roy** is a Principal Machine Learning Architect for AWS, helping customers design and build AI/ML solutions. Mark’s work covers a wide range of ML use cases, with a primary interest in computer vision, deep learning, and scaling ML across the enterprise. He has helped companies in many industries, including insurance, financial services, media and entertainment, healthcare, utilities, and manufacturing. Mark holds six AWS certifications, including the ML Specialty Certification. Prior to joining AWS, Mark was an architect, developer, and technology leader for over 25 years, including 19 years in financial services.\n\n![image.png](https://dev-media.amazoncloud.cn/fc26403b008446fdbab0eef745eb1f4e_image.png)\n\n**Khushboo Srivastava** is a Senior Product Manager for Amazon SageMaker. She enjoys building products that simplify machine learning workflows for customers. In her spare time, she enjoys playing violin, practicing yoga, and traveling."}