Use an AD FS user and Tableau to securely query data in AWS Lake Formation

{"value":"Security-conscious customers often adopt a [Zero Trust security](https://aws.amazon.com/blogs/security/zero-trust-architectures-an-aws-perspective/) architecture. Zero Trust is a security model centered on the idea that access to data shouldn’t be based solely on network location. Instead, users and systems must [prove their identities](https://aws.amazon.com/security/zero-trust/) and trustworthiness, and fine-grained identity-based authorization rules are enforced before access is granted to applications, data, and other systems.\n\nSome customers rely on third-party [identity providers](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_providers.html) (IdPs) like [Active Directory Federation Services (AD FS)](https://docs.microsoft.com/en-us/windows-server/identity/ad-fs/ad-fs-overview) to manage credentials and prove identities and trustworthiness. Users can use their AD FS credentials to authenticate to various related yet independent systems, including the [AWS Management Console](https://aws.amazon.com/console/) (for more information, see [Enabling SAML 2.0 federated users to access the AWS Management Console](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_providers_enable-console-saml.html)).\n\nIn the context of analytics, some customers extend Zero Trust to data stored in data lakes and to the business intelligence (BI) tools used to access that data. A common data lake pattern is to store data in [Amazon Simple Storage Service (Amazon S3)](http://aws.amazon.com/s3) and query the data using [Amazon Athena](http://aws.amazon.com/athena).\n\n[AWS Lake Formation](https://aws.amazon.com/lake-formation/) allows you to define and enforce access policies at the database, table, and column level when using Athena queries to read data stored in Amazon S3. 
[Lake Formation supports Active Directory](https://aws.amazon.com/about-aws/whats-new/2020/10/aws-lake-formation-supports-active-directory-saml-providers-amazon-athena/) and [Security Assertion Markup Language](https://aws.amazon.com/identity/saml/) (SAML) identity providers such as [Okta](https://www.okta.com/) and [Auth0](https://auth0.com/). Furthermore, Lake Formation securely integrates with the AWS BI service [Amazon QuickSight](https://aws.amazon.com/quicksight/). QuickSight allows you to create and publish interactive BI dashboards, and supports authentication via [Active Directory](https://docs.aws.amazon.com/quicksight/latest/user/aws-directory-service.html). However, if you use alternative BI tools like [Tableau](https://www.tableau.com/), you may want to use your Active Directory credentials to access data stored in Lake Formation.\n\nIn this post, we show you how you can use AD FS credentials with Tableau to implement a Zero Trust architecture and securely query data in Amazon S3 and Lake Formation.\n\n#### **Solution overview**\nIn this architecture, user credentials are managed by Active Directory, not by [AWS Identity and Access Management](https://aws.amazon.com/iam/) (IAM). Although Tableau provides a [connector](https://help.tableau.com/current/pro/desktop/en-us/examples_amazonathena.htm) to connect Tableau to Athena, the connector requires an AWS access key ID and an AWS secret access key normally used for [programmatic access](https://docs.aws.amazon.com/general/latest/gr/aws-sec-cred-types.html). 
Creating an IAM user with programmatic access for use by Tableau is a potential solution; however, some customers have made an architectural decision that access to AWS accounts is federated through Active Directory rather than granted to IAM users.\n\nIn this post, we show you how you can use the [Athena ODBC](https://docs.aws.amazon.com/athena/latest/ug/connect-with-odbc.html) driver in conjunction with AD FS credentials to query sample data in a newly created data lake. We simulate the environment by enabling federation to AWS using AD FS 3.0 and SAML 2.0. Then we guide you through setting up a data lake using Lake Formation. Finally, we show you how you can configure an ODBC driver for Tableau to securely query the data in your data lake using your AD FS credentials.\n\n#### **Prerequisites**\nThe following prerequisites are required to complete this walkthrough:\n\n- An [understanding of IAM roles](https://docs.aws.amazon.com/IAM/latest/UserGuide/intro-structure.html) and concepts\n- A [basic understanding](https://aws.amazon.com/blogs/big-data/getting-started-with-aws-lake-formation/) of Lake Formation and Athena\n- A copy of [Tableau](https://www.tableau.com/trial/tableau-software), either the 14-day trial or fully licensed software\n- An [understanding of the concepts of Active Directory](https://aws.amazon.com/blogs/security/introducing-aws-directory-service-for-microsoft-active-directory-standard-edition/), and [how to join a computer to an Active Directory domain](https://docs.microsoft.com/en-us/windows-server/identity/ad-fs/deployment/join-a-computer-to-a-domain)\n- An understanding of [configuring ODBC components](https://support.microsoft.com/en-us/office/administer-odbc-data-sources-b19f856b-5b9b-48c9-8b93-07484bfab5a7) on a Windows machine\n\n#### **Create your environment**\nTo simulate the production environment, we created a standard VPC in [Amazon Virtual Private Cloud](http://aws.amazon.com/vpc) (Amazon VPC) with one private subnet and one 
public subnet. You can do the same using the [VPC wizard](https://docs.aws.amazon.com/vpc/latest/userguide/VPC_Scenario2.html). Our [Amazon Elastic Compute Cloud](http://aws.amazon.com/ec2) (Amazon EC2) instance running the Tableau client is located in a private subnet and accessible via an [EC2 bastion host](https://aws.amazon.com/blogs/security/controlling-network-access-to-ec2-instances-using-a-bastion-server/). For simplicity, connecting out to Amazon S3, [AWS Glue](https://aws.amazon.com/glue), and Athena is done via the NAT gateway and internet gateway set up by the VPC wizard. Optionally, you can replace the [NAT gateway](https://docs.aws.amazon.com/vpc/latest/userguide/vpc-nat-gateway.html) with AWS PrivateLink endpoints (AWS Security Token Service (AWS STS), [Amazon S3](https://docs.aws.amazon.com/AmazonS3/latest/userguide/privatelink-interface-endpoints.html), Athena, and [AWS Glue](https://docs.aws.amazon.com/glue/latest/dg/vpc-endpoint.html) endpoints are required) to make sure traffic remains within the AWS network.\n\nThe following diagram illustrates our environment architecture.\n\n![image.png](https://dev-media.amazoncloud.cn/f53131d468c94b7ca320db7f779b2a77_image.png)\n\nAfter you create your VPC with its private and public subnets, you can continue to build out the other requirements, such as Active Directory and Lake Formation. Let’s begin with Active Directory.\n\n#### **Enable federation to AWS using AD FS 3.0 and SAML 2.0**\nAD FS 3.0, a component of Windows Server, supports SAML 2.0 and is integrated with IAM. This integration allows Active Directory users to federate to AWS using corporate directory credentials, such as a user name and password from Active Directory. 
Before you can complete this section, AD FS must be configured and running.\n\nTo set up AD FS, follow the instructions in [Setting up trust between AD FS and AWS and using Active Directory credentials to connect to Amazon Athena with ODBC driver](https://aws.amazon.com/blogs/big-data/setting-up-trust-between-adfs-and-aws-and-using-active-directory-credentials-to-connect-to-amazon-athena-with-odbc-driver/). The first section of the post explains in detail how to set up AD FS and establish the trust between AD FS and Active Directory. The post ends with setting up an ODBC driver for Athena, which you can skip. The post creates a group named ```ArunADFSTest```. This group relates to a role in your AWS account, which you use later.\n\nWhen you have successfully verified that you can log in using your IdP, you’re ready to configure your Windows environment ODBC driver to connect to Athena.\n\n#### **Set up a data lake using Lake Formation**\nLake Formation is a fully managed service that makes it easy for you to build, secure, and manage data lakes. Lake Formation provides its own permissions model that augments the IAM permissions model. This centrally defined permissions model enables fine-grained access to data stored in data lakes through a simple grant/revoke mechanism. We use this permissions model to grant access to the AD FS role we created earlier.\n\n1. On the Lake Formation console, you’re prompted with a welcome box the first time you access Lake Formation. The box asks you to select the initial administrative user and roles.\n2. Choose **Add myself** and choose **Get Started**.\nWe use the sample database provided by Lake Formation, but you’re welcome to use your own dataset. 
For instructions on loading your own dataset, see [Getting Started with Lake Formation](https://aws.amazon.com/blogs/big-data/getting-started-with-aws-lake-formation/). With Lake Formation configured, we must grant read access to the AD FS role (```ArunADFSTest```) we created in the previous step.\n3. In the navigation pane, choose **Databases**.\n4. Select the database ```sampledb```.\n5. On the **Actions** menu, choose **Grant**.\n\n![image.png](https://dev-media.amazoncloud.cn/a2e9657767c24a598bc7be9f065f50dc_image.png)\n\nWe grant the ```ArunADFSTest``` role access to ```sampledb```.\n\n6. For **Principals**, select **IAM users and roles**.\n7. For **IAM users and roles**, choose the role ```ArunADFSTest```.\n8. Select **Named data catalog resources**.\n9. For **Databases**, choose the database ```sampledb```.\n10. For **Tables**, choose **All tables**.\n\n![image.png](https://dev-media.amazoncloud.cn/6b602fa5a7674726b8099b6b634188a8_image.png)\n\n11. Set the table permissions to **Select** and **Describe**.\n12. For **Data** permissions, select **All data access**.\n13. Choose **Grant**.\nOur AD FS user assumes the role ```ArunADFSTest```, which has been granted access to ```sampledb``` by Lake Formation. However, the ```ArunADFSTest``` role also requires IAM permissions for Lake Formation, Athena, AWS Glue, and Amazon S3. Following the [practice of least privilege](https://docs.aws.amazon.com/IAM/latest/UserGuide/best-practices.html#grant-least-privilege), AWS defines policies for specific Lake Formation personas. Our user fits the [Data Analyst persona](https://docs.aws.amazon.com/lake-formation/latest/dg/permissions-reference.html#persona-user), which requires enough permissions to run queries.\n14. 
Add the ```AmazonAthenaFullAccess``` [managed policy](https://docs.aws.amazon.com/athena/latest/ug/managed-policies.html#amazonathenafullaccess-managed-policy) (for instructions, see [Adding and removing IAM identity permissions](https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies_manage-attach-detach.html)) and the following inline policy to the ```ArunADFSTest``` role:\n\n```\n{\n \"Version\": \"2012-10-17\",\n \"Statement\": [\n {\n \"Effect\": \"Allow\",\n \"Action\": [\n \"lakeformation:GetDataAccess\",\n \"glue:GetTable\",\n \"glue:GetTables\",\n \"glue:SearchTables\",\n \"glue:GetDatabase\",\n \"glue:GetDatabases\",\n \"glue:GetPartitions\",\n \"lakeformation:GetResourceLFTags\",\n \"lakeformation:ListLFTags\",\n \"lakeformation:GetLFTag\",\n \"lakeformation:SearchTablesByLFTags\",\n \"lakeformation:SearchDatabasesByLFTags\"\n ],\n \"Resource\": \"*\"\n }\n ]\n}\n```\nEach time Athena runs a query, it stores the results in an S3 bucket, which is configured as the query result location in Athena.\n15. [Create an S3 bucket](https://docs.aws.amazon.com/AmazonS3/latest/userguide/create-bucket-overview.html), and in this new bucket create a new folder called ```athena_results```.\n16. [Update the settings on the Athena console](https://docs.aws.amazon.com/athena/latest/ug/getting-started.html) to use your newly created folder.\nTableau uses Athena to run the query and read the results from Amazon S3, which means that the ```ArunADFSTest``` role requires access to your newly created S3 folder.\n17. Attach the following inline policy to the ```ArunADFSTest``` role:\n```\n{\n \"Version\": \"2012-10-17\",\n \"Statement\": [\n {\n \"Effect\": \"Allow\",\n \"Action\": [\n \"s3:GetObject\",\n \"s3:PutObject\",\n \"s3:PutObjectAcl\"\n ],\n \"Resource\": \"arn:aws:s3:::[BUCKET_NAME]/athena_results/*\"\n }\n ]\n}\n```\nOur AD FS user can now assume a role that has enough privileges to query the sample database. 
The next step is to configure the ODBC driver on the client.\n\n#### **Configure an Athena ODBC driver**\nAthena is a managed serverless and interactive query service that allows you to analyze your data in Amazon S3 using standard Structured Query Language [(SQL)](https://en.wikipedia.org/wiki/SQL). You can use Athena to directly query data that is located in Amazon S3 or data that is [registered with Lake Formation](https://docs.aws.amazon.com/athena/latest/ug/security-athena-lake-formation.html). Athena provides you with [ODBC and JDBC](https://docs.aws.amazon.com/athena/latest/ug/connect-with-odbc.html) drivers to effortlessly integrate with your data analytics tools (such as Microsoft Power BI, Tableau, or SQL Workbench) to seamlessly gain insights about your data in minutes.\n\nTo connect to our Lake Formation environment, we first need to install and configure the Athena ODBC driver on our Windows environment.\n\n1. Download the Athena [ODBC](https://docs.aws.amazon.com/athena/latest/ug/connect-with-odbc.html) driver relevant to your Windows environment.\n2. 
Install the driver by choosing the driver file you downloaded (in our case, ```Simba+Athena+1.1+64-bit.msi```).\n\n![image.png](https://dev-media.amazoncloud.cn/a430fea834de4ab287be668a0614aaa1_image.png)\n\n3. Choose **Next** on the welcome page.\n\n![image.png](https://dev-media.amazoncloud.cn/0e2dd25379c34699843e7f6a6555482e_image.png)\n\n4. Read the End-User License Agreement, and if you agree to it, select **I Accept the terms in the License Agreement** and choose **Next**.\n\n![image.png](https://dev-media.amazoncloud.cn/7b94ffe2700b47218c5986bbe45ee8a0_image.png)\n\n5. Leave the default installation location for the ODBC driver and choose **Next**.\n\n![image.png](https://dev-media.amazoncloud.cn/d68aad941ab646d28fb21f61097a035d_image.png)\n\n6. Choose **Install** to begin the installation.\n\n![image.png](https://dev-media.amazoncloud.cn/7005d3c6ca7f4b92a35f151b90fc3d01_image.png)\n\n7. If the **User Account Control** pop-up appears, choose **Yes** to allow the driver installation to continue.\n\n![image.png](https://dev-media.amazoncloud.cn/94968b5d1b9a4a379040b55aa87fe231_image.png)\n\n8. When the driver installation is complete, choose **Finish** to close the installer.\n\n![image.png](https://dev-media.amazoncloud.cn/93f655d96d884f88a055790132d6ad94_image.png)\n\n9. Open the Windows ODBC configuration application by choosing **Start** and searching for ODBC.\n10. Open the version corresponding to the Athena ODBC version you installed (in our case, 64-bit).\n\n![image.png](https://dev-media.amazoncloud.cn/9a04191bb35c4a008eaf33698699f192_image.png)\n\n11. On the **User DSN** tab, choose **Add**.\n\n![image.png](https://dev-media.amazoncloud.cn/eb6c74635ee8499780511d8157d6e832_image.png)\n\n12. Choose the Simba Athena ODBC driver and choose **Finish**.\n\n![image.png](https://dev-media.amazoncloud.cn/ffe435fed4284bd0831298c00bc2d86f_image.png)\n\n#### **Configure the ODBC driver to AD FS authentication**\nWe now need to configure the driver.\n\n1. 
Choose the driver on the **Driver configuration** page.\n2. For **Data Source Name**, enter ```sampledb```.\n3. For **Description**, enter ```Lake Formation Sample Database```.\n4. For **AWS Region**, enter ```eu-west-1``` or the Region you used when configuring Lake Formation.\n5. For **Metadata Retrieval Method**, choose **Auto**.\n6. For **S3 Output Location**, enter ```s3://[BUCKET_NAME]/athena_results/```.\n7. For **Encryption Options**, choose **NOT_SET**.\n8. Clear the rest of the options.\n9. Choose **Authentication Options**.\n\n![image.png](https://dev-media.amazoncloud.cn/90632903f96249b2bbfc3d721f0e3265_image.png)\n\n10. For **Authentication Type**, choose **ADFS**.\n11. For **User**, enter ```[DOMAIN]\\[USERNAME]```.\n12. For **Password**, enter your domain user password.\n13. For **Preferred Role**, enter ```arn:aws:iam::[ACCOUNT NUMBER]:role/ArunADFSTest```.\nThe preferred role is the same role configured in the previous section (```ArunADFSTest```).\n14. For **IdP Host**, enter the AD federation URL you configured during AD FS setup.\n15. For **IdP Port**, enter ```443```.\n16. Select **SSL Insecure**.\n17. Choose **OK**.\n\n![image.png](https://dev-media.amazoncloud.cn/6ef113a41a884ae89f17d5d16146004b_image.png)\n\n18. Choose **Test** on the initial configuration page to test the connection.\n19. When you see a success confirmation, choose **OK**.\n\n![image.png](https://dev-media.amazoncloud.cn/ab87cda9e89c4a3eaad87943d343204d_image.png)\n\nWe can now connect to our Lake Formation sample database from our desktop environment using the Athena ODBC driver. The next step is to use Tableau to query our data using the ODBC connection.\n\n#### **Connect to your data using Tableau**\nTo connect to your data, complete the following steps:\n1. Open your Tableau Desktop edition.\n2. Under **To a Server**, choose **More**.\n3. On the list of available Tableau installed connectors, choose **Other Databases (ODBC)**.\n4. Choose the ODBC database you created earlier.\n5. 
Choose **Connect**.\n6. Choose **Sign In**.\n\n![image.png](https://dev-media.amazoncloud.cn/008162e63ba04463a9a5156825852700_image.png)\n\nWhen the Tableau workbook opens, select the database, schema, and table that you want to query.\n7. For **Database**, choose the database as listed in the ODBC setup (for this post, ```AwsDataCatalog```).\n8. For **Schema**, choose your schema (```sampledb```).\n9. For **Table**, search for and choose your table (```elb_logs```).\n10. Drag the table to the work area to start your query and further report development.\n\n![image.png](https://dev-media.amazoncloud.cn/5f90378d15664427a7f1429c4daec5e3_image.png)\n\n#### **Clean up**\nAWS Lake Formation provides database-, table-, column-, and tag-based access controls, and cross-account sharing at no charge. Lake Formation charges a fee for transaction requests and for metadata storage. In addition to providing a consistent view of data and enforcing row-level and cell-level security, the Lake Formation Storage API scans data in Amazon S3 and applies row and cell filters before returning results to applications. There is a fee for this filtering. To make sure that you’re not charged for any of the services that you no longer need, [stop any EC2](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/terminating-instances.html) instances that you created. [Remove any objects](https://docs.aws.amazon.com/AmazonS3/latest/userguide/DeletingObjects.html) in Amazon S3 you no longer require, because you pay for objects stored in S3 buckets.\n\nLastly, [delete any Active Directory](https://docs.aws.amazon.com/directoryservice/latest/admin-guide/simple_ad_delete.html) instances you may have created.\n\n#### **Conclusion**\nLake Formation makes it simple to set up a secure data lake and then use the data lake with your choice of analytics and machine learning services, including Tableau. 
In this post, we showed you how you can connect to your data lake using AD FS credentials in a simple and secure way by using the Athena ODBC driver. Your AD FS user is configured within the ODBC driver, which then assumes a role in AWS. This role is granted access to only the data you require via Lake Formation.\n\nTo learn more about Lake Formation, see the [Lake Formation Developer Guide](https://docs.aws.amazon.com/lake-formation/latest/dg/what-is-lake-formation.html) or follow the [Lake Formation workshop](https://lakeformation.workshop.aws/).\n\n##### **About the Authors**\n\n![image.png](https://dev-media.amazoncloud.cn/b00b3e055e544465a6d79eb4233fef9f_image.png)\n\n**Jason Nicholls** is an Enterprise Solutions Architect at AWS. He’s passionate about building scalable web and mobile applications on AWS. He started coding on a Commodore VIC 20, which led to a career in software development. Jason holds an MSc in Computer Science with specialization in coevolved genetic programming. He is based in Johannesburg, South Africa.\n\n![image.png](https://dev-media.amazoncloud.cn/33495fb77e8643c086626cc15cd6bbac_image.png)\n\n**Francois van Rensburg** is a Partner Management Solutions Architect at AWS. He has spent the last decade helping enterprise organizations successfully migrate to the cloud. He is passionate about networking and all things cloud. He started as a COBOL programmer and has built everything from software to data centers. He is based in Denver, Colorado.","render":"<p>Security-conscious customers often adopt a <a href=\"https://aws.amazon.com/blogs/security/zero-trust-architectures-an-aws-perspective/\" target=\"_blank\">Zero Trust security</a> architecture. 
Zero Trust is a security model centered on the idea that access to data shouldn’t be based solely on network location. Instead, users and systems must <a href=\"https://aws.amazon.com/security/zero-trust/\" target=\"_blank\">prove their identities</a> and trustworthiness, and fine-grained identity-based authorization rules are enforced before access is granted to applications, data, and other systems.</p>\n<p>Some customers rely on third-party <a href=\"https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_providers.html\" target=\"_blank\">identity providers</a> (IdPs) like <a href=\"https://docs.microsoft.com/en-us/windows-server/identity/ad-fs/ad-fs-overview\" target=\"_blank\">Active Directory Federation Services (AD FS)</a> to manage credentials and prove identities and trustworthiness. Users can use their AD FS credentials to authenticate to various related yet independent systems, including the <a href=\"https://aws.amazon.com/console/\" target=\"_blank\">AWS Management Console</a> (for more information, see <a href=\"https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_providers_enable-console-saml.html\" target=\"_blank\">Enabling SAML 2.0 federated users to access the AWS Management Console</a>).</p>\n<p>In the context of analytics, some customers extend Zero Trust to data stored in data lakes and to the business intelligence (BI) tools used to access that data. A common data lake pattern is to store data in <a href=\"http://aws.amazon.com/s3\" target=\"_blank\">Amazon Simple Storage Service (Amazon S3)</a> and query the data using <a href=\"http://aws.amazon.com/athena\" target=\"_blank\">Amazon Athena</a>.</p>\n<p><a href=\"https://aws.amazon.com/lake-formation/\" target=\"_blank\">AWS Lake Formation</a> allows you to define and enforce access policies at the database, table, and column level when using Athena queries to read data stored in Amazon S3. 
<a href=\"https://aws.amazon.com/about-aws/whats-new/2020/10/aws-lake-formation-supports-active-directory-saml-providers-amazon-athena/\" target=\"_blank\">Lake Formation supports Active Directory</a> and <a href=\"https://aws.amazon.com/identity/saml/\" target=\"_blank\">Security Assertion Markup Language</a> (SAML) identity providers such as <a href=\"https://www.okta.com/\" target=\"_blank\">Okta</a> and <a href=\"https://auth0.com/\" target=\"_blank\">Auth0</a>. Furthermore, Lake Formation securely integrates with the AWS BI service <a href=\"https://aws.amazon.com/quicksight/\" target=\"_blank\">Amazon QuickSight</a>. QuickSight allows you to create and publish interactive BI dashboards, and supports authentication via <a href=\"https://docs.aws.amazon.com/quicksight/latest/user/aws-directory-service.html\" target=\"_blank\">Active Directory</a>. However, if you use alternative BI tools like <a href=\"https://www.tableau.com/\" target=\"_blank\">Tableau</a>, you may want to use your Active Directory credentials to access data stored in Lake Formation.</p>\n<p>In this post, we show you how you can use AD FS credentials with Tableau to implement a Zero Trust architecture and securely query data in Amazon S3 and Lake Formation.</p>\n<h4><a id=\"Solution_overview_10\"></a><strong>Solution overview</strong></h4>\n<p>In this architecture, user credentials are managed by Active Directory, not by <a href=\"https://aws.amazon.com/iam/\" target=\"_blank\">AWS Identity and Access Management</a> (IAM). Although Tableau provides a <a href=\"https://help.tableau.com/current/pro/desktop/en-us/examples_amazonathena.htm\" target=\"_blank\">connector</a> to connect Tableau to Athena, the connector requires an AWS access key ID and an AWS secret access key normally used for <a href=\"https://docs.aws.amazon.com/general/latest/gr/aws-sec-cred-types.html\" target=\"_blank\">programmatic access</a>. 
Creating an IAM user with programmatic access for use by Tableau is a potential solution; however, some customers have made an architectural decision that access to AWS accounts is federated through Active Directory rather than granted to IAM users.</p>\n<p>In this post, we show you how you can use the <a href=\"https://docs.aws.amazon.com/athena/latest/ug/connect-with-odbc.html\" target=\"_blank\">Athena ODBC</a> driver in conjunction with AD FS credentials to query sample data in a newly created data lake. We simulate the environment by enabling federation to AWS using AD FS 3.0 and SAML 2.0. Then we guide you through setting up a data lake using Lake Formation. Finally, we show you how you can configure an ODBC driver for Tableau to securely query the data in your data lake using your AD FS credentials.</p>\n<h4><a id=\"Prerequisites_15\"></a><strong>Prerequisites</strong></h4>\n<p>The following prerequisites are required to complete this walkthrough:</p>\n<ul>\n<li>An <a href=\"https://docs.aws.amazon.com/IAM/latest/UserGuide/intro-structure.html\" target=\"_blank\">understanding of IAM roles</a> and concepts</li>\n<li>A <a href=\"https://aws.amazon.com/blogs/big-data/getting-started-with-aws-lake-formation/\" target=\"_blank\">basic understanding</a> of Lake Formation and Athena</li>\n<li>A copy of <a href=\"https://www.tableau.com/trial/tableau-software\" target=\"_blank\">Tableau</a>, either the 14-day trial or fully licensed software</li>\n<li>An <a href=\"https://aws.amazon.com/blogs/security/introducing-aws-directory-service-for-microsoft-active-directory-standard-edition/\" target=\"_blank\">understanding of the concepts of Active Directory</a>, and <a href=\"https://docs.microsoft.com/en-us/windows-server/identity/ad-fs/deployment/join-a-computer-to-a-domain\" target=\"_blank\">how to join a computer to an Active Directory domain</a></li>\n<li>An understanding of <a 
href=\"https://support.microsoft.com/en-us/office/administer-odbc-data-sources-b19f856b-5b9b-48c9-8b93-07484bfab5a7\" target=\"_blank\">configuring ODBC components</a> on a Windows machine</li>\n</ul>\n<h4><a id=\"Create_your_environment_24\"></a><strong>Create your environment</strong></h4>\n<p>To simulate the production environment, we created a standard VPC in <a href=\"http://aws.amazon.com/vpc\" target=\"_blank\">Amazon Virtual Private Cloud</a> (Amazon VPC) with one private subnet and one public subnet. You can do the same using the <a href=\"https://docs.aws.amazon.com/vpc/latest/userguide/VPC_Scenario2.html\" target=\"_blank\">VPC wizard</a>. Our <a href=\"http://aws.amazon.com/ec2\" target=\"_blank\">Amazon Elastic Compute Cloud</a> (Amazon EC2) instance running the Tableau client is located in a private subnet and accessible via an <a href=\"https://aws.amazon.com/blogs/security/controlling-network-access-to-ec2-instances-using-a-bastion-server/\" target=\"_blank\">EC2 bastion host</a>. For simplicity, connecting out to Amazon S3, <a href=\"https://aws.amazon.com/glue\" target=\"_blank\">AWS Glue</a>, and Athena is done via the NAT gateway and internet gateway set up by the VPC wizard. 
Optionally, you can replace the <a href=\"https://docs.aws.amazon.com/vpc/latest/userguide/vpc-nat-gateway.html\" target=\"_blank\">NAT gateway</a> with AWS PrivateLink endpoints (AWS Security Token Service (AWS STS), <a href=\"https://docs.aws.amazon.com/AmazonS3/latest/userguide/privatelink-interface-endpoints.html\" target=\"_blank\">Amazon S3</a>, Athena, and <a href=\"https://docs.aws.amazon.com/glue/latest/dg/vpc-endpoint.html\" target=\"_blank\">AWS Glue</a> endpoints are required) to make sure traffic remains within the AWS network.</p>\n<p>The following diagram illustrates our environment architecture.</p>\n<p><img src=\"https://dev-media.amazoncloud.cn/f53131d468c94b7ca320db7f779b2a77_image.png\" alt=\"image.png\" /></p>\n<p>After you create your VPC with its private and public subnets, you can continue to build out the other requirements, such as Active Directory and Lake Formation. Let’s begin with Active Directory.</p>\n<h4><a id=\"Enable_federation_to_AWS_using_AD_FS_30_and_SAML_20_33\"></a><strong>Enable federation to AWS using AD FS 3.0 and SAML 2.0</strong></h4>\n<p>AD FS 3.0, a component of Windows Server, supports SAML 2.0 and is integrated with IAM. This integration allows Active Directory users to federate to AWS using corporate directory credentials, such as a user name and password from Active Directory. Before you can complete this section, AD FS must be configured and running.</p>\n<p>To set up AD FS, follow the instructions in <a href=\"https://aws.amazon.com/blogs/big-data/setting-up-trust-between-adfs-and-aws-and-using-active-directory-credentials-to-connect-to-amazon-athena-with-odbc-driver/\" target=\"_blank\">Setting up trust between AD FS and AWS and using Active Directory credentials to connect to Amazon Athena with ODBC driver</a>. The first section of the post explains in detail how to set up AD FS and establish the trust between AD FS and Active Directory. 
The post ends with setting up an ODBC driver for Athena, which you can skip. The post creates a group named <code>ArunADFSTest</code>. This group relates to a role in your AWS account, which you use later.</p>\n<p>When you have successfully verified that you can log in using your IdP, you’re ready to configure your Windows environment ODBC driver to connect to Athena.</p>\n<h4><a id=\"Set_up_a_data_lake_using_Lake_Formation_40\"></a><strong>Set up a data lake using Lake Formation</strong></h4>\n<p>Lake Formation is a fully managed service that makes it easy for you to build, secure, and manage data lakes. Lake Formation provides its own permissions model that augments the IAM permissions model. This centrally defined permissions model enables fine-grained access to data stored in data lakes through a simple grant/revoke mechanism. We use this permissions model to grant access to the AD FS role we created earlier.</p>\n<ol>\n<li>On the Lake Formation console, you’re prompted with a welcome box the first time you access Lake Formation. The box asks you to select the initial administrative user and roles.</li>\n<li>Choose <strong>Add myself</strong> and choose <strong>Get Started</strong>.<br />\nWe use the sample database provided by Lake Formation, but you’re welcome to use your own dataset. 
For instructions on loading your own dataset, see <a href=\"https://aws.amazon.com/blogs/big-data/getting-started-with-aws-lake-formation/\" target=\"_blank\">Getting Started with Lake Formation</a>. With Lake Formation configured, we must grant read access to the AD FS role (<code>ArunADFSTest</code>) we created in the previous step.</li>\n<li>In the navigation pane, choose <strong>Databases</strong>.</li>\n<li>Select the database <code>sampledb</code>.</li>\n<li>On the <strong>Actions</strong> menu, choose <strong>Grant</strong>.</li>\n</ol>\n<p><img src=\"https://dev-media.amazoncloud.cn/a2e9657767c24a598bc7be9f065f50dc_image.png\" alt=\"image.png\" /></p>\n<p>We grant the <code>ArunADFSTest</code> role access to <code>sampledb</code>.</p>\n<ol start=\"6\">\n<li>For <strong>Principals</strong>, select <strong>IAM users and roles</strong>.</li>\n<li>For <strong>IAM users and roles</strong>, choose the role <code>ArunADFSTest</code>.</li>\n<li>Select <strong>Named data catalog resources</strong>.</li>\n<li>For <strong>Databases</strong>, choose the database <code>sampledb</code>.</li>\n<li>For <strong>Tables</strong>, choose <strong>All tables</strong>.</li>\n</ol>\n<p><img src=\"https://dev-media.amazoncloud.cn/6b602fa5a7674726b8099b6b634188a8_image.png\" alt=\"image.png\" /></p>\n<ol start=\"11\">\n<li>Set the table permissions to <strong>Select</strong> and <strong>Describe</strong>.</li>\n<li>For <strong>Data</strong> permissions, select <strong>All data access</strong>.</li>\n<li>Choose <strong>Grant</strong>.<br />\nOur AD FS user assumes the role <code>ArunADFSTest</code>, which has been granted access to <code>sampledb</code> by Lake Formation. However, the <code>ArunADFSTest</code> role also requires IAM permissions for Lake Formation, Athena, AWS Glue, and Amazon S3. 
Following the <a href=\"https://docs.aws.amazon.com/IAM/latest/UserGuide/best-practices.html#grant-least-privilege\" target=\"_blank\">practice of least privilege</a>, AWS defines policies for specific Lake Formation personas. Our user fits the <a href=\"https://docs.aws.amazon.com/lake-formation/latest/dg/permissions-reference.html#persona-user\" target=\"_blank\">Data Analyst persona</a>, which requires enough permissions to run queries.</li>\n<li>Add the <code>AmazonAthenaFullAccess</code> <a href=\"https://docs.aws.amazon.com/athena/latest/ug/managed-policies.html#amazonathenafullaccess-managed-policy\" target=\"_blank\">managed policy</a> (for instructions, see <a href=\"https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies_manage-attach-detach.html\" target=\"_blank\">Adding and removing IAM identity permissions</a>) and the following inline policy to the <code>ArunADFSTest</code> role:</li>\n</ol>\n<pre><code class=\"lang-\">{\n &quot;Version&quot;: &quot;2012-10-17&quot;,\n &quot;Statement&quot;: [\n {\n &quot;Effect&quot;: &quot;Allow&quot;,\n &quot;Action&quot;: [\n &quot;lakeformation:GetDataAccess&quot;,\n &quot;glue:GetTable&quot;,\n &quot;glue:GetTables&quot;,\n &quot;glue:SearchTables&quot;,\n &quot;glue:GetDatabase&quot;,\n &quot;glue:GetDatabases&quot;,\n &quot;glue:GetPartitions&quot;,\n &quot;lakeformation:GetResourceLFTags&quot;,\n &quot;lakeformation:ListLFTags&quot;,\n &quot;lakeformation:GetLFTag&quot;,\n &quot;lakeformation:SearchTablesByLFTags&quot;,\n &quot;lakeformation:SearchDatabasesByLFTags&quot;\n ],\n &quot;Resource&quot;: &quot;*&quot;\n }\n ]\n}\n</code></pre>\n<p>Each time Athena runs a query, it stores the results in an S3 bucket, which is configured as the query result location in Athena.<br />\n15. 
<a href=\"https://docs.aws.amazon.com/AmazonS3/latest/userguide/create-bucket-overview.html\" target=\"_blank\">Create an S3 bucket</a>, and in this new bucket create a new folder called <code>athena_results</code>.<br />\n16. <a href=\"https://docs.aws.amazon.com/athena/latest/ug/getting-started.html\" target=\"_blank\">Update the settings on the Athena console</a> to use your newly created folder.<br />\nTableau uses Athena to run the query and read the results from Amazon S3, which means that the <code>ArunADFSTest</code> role requires access to your newly created S3 folder.<br />\n17. Attach the following inline policy to the <code>ArunADFSTest</code> role:</p>\n<pre><code class=\"lang-\">{\n &quot;Version&quot;: &quot;2012-10-17&quot;,\n &quot;Statement&quot;: [\n {\n &quot;Effect&quot;: &quot;Allow&quot;,\n &quot;Action&quot;: [\n &quot;s3:GetObject&quot;,\n &quot;s3:PutObject&quot;,\n &quot;s3:PutObjectAcl&quot;\n ],\n &quot;Resource&quot;: &quot;arn:aws:s3:::[BUCKET_NAME]/athena_results/*&quot;\n }\n ]\n}\n</code></pre>\n<p>Our AD FS user can now assume a role that has enough privileges to query the sample database. The next step is to configure the ODBC driver on the client.</p>\n<h4><a id=\"Configure_an_Athena_ODBC_driver_116\"></a><strong>Configure an Athena ODBC driver</strong></h4>\n<p>Athena is a managed serverless and interactive query service that allows you to analyze your data in Amazon S3 using standard Structured Query Language <a href=\"https://en.wikipedia.org/wiki/SQL\" target=\"_blank\">(SQL)</a>. You can use Athena to directly query data that is located in Amazon S3 or data that is <a href=\"https://docs.aws.amazon.com/athena/latest/ug/security-athena-lake-formation.html\" target=\"_blank\">registered with Lake Formation</a>. 
Athena provides <a href=\"https://docs.aws.amazon.com/athena/latest/ug/connect-with-odbc.html\" target=\"_blank\">ODBC and JDBC</a> drivers that integrate with data analytics tools such as Microsoft Power BI, Tableau, or SQL Workbench.</p>\n<p>To connect to our Lake Formation environment, we first need to install and configure the Athena ODBC driver on our Windows environment.</p>\n<ol>\n<li>Download the Athena <a href=\"https://docs.aws.amazon.com/athena/latest/ug/connect-with-odbc.html\" target=\"_blank\">ODBC</a> driver relevant to your Windows environment.</li>\n<li>Install the driver by choosing the driver file you downloaded (in our case, <code>Simba+Athena+1.1+64-bit.msi</code>).</li>\n</ol>\n<p><img src=\"https://dev-media.amazoncloud.cn/a430fea834de4ab287be668a0614aaa1_image.png\" alt=\"image.png\" /></p>\n<p>3. Choose <strong>Next</strong> on the welcome page.</p>\n<p><img src=\"https://dev-media.amazoncloud.cn/0e2dd25379c34699843e7f6a6555482e_image.png\" alt=\"image.png\" /></p>\n<p>4. Read the End-User License Agreement, and if you agree to it, select <strong>I Accept the terms in the License Agreement</strong> and choose <strong>Next</strong>.</p>\n<p><img src=\"https://dev-media.amazoncloud.cn/7b94ffe2700b47218c5986bbe45ee8a0_image.png\" alt=\"image.png\" /></p>\n<p>5. Leave the default installation location for the ODBC driver and choose <strong>Next</strong>.</p>\n<p><img src=\"https://dev-media.amazoncloud.cn/d68aad941ab646d28fb21f61097a035d_image.png\" alt=\"image.png\" /></p>\n<p>6. Choose <strong>Install</strong> to begin the installation.</p>\n<p><img src=\"https://dev-media.amazoncloud.cn/7005d3c6ca7f4b92a35f151b90fc3d01_image.png\" alt=\"image.png\" /></p>\n<p>7. If the <strong>User Account Control</strong> pop-up appears, choose <strong>Yes</strong> to allow the driver installation to continue.</p>\n<p><img 
src=\"https://dev-media.amazoncloud.cn/94968b5d1b9a4a379040b55aa87fe231_image.png\" alt=\"image.png\" /></p>\n<p>8. When the driver installation is complete, choose <strong>Finish</strong> to close the installer.</p>\n<p><img src=\"https://dev-media.amazoncloud.cn/93f655d96d884f88a055790132d6ad94_image.png\" alt=\"image.png\" /></p>\n<p>9. Open the Windows ODBC configuration application by opening the <strong>Start</strong> menu and searching for ODBC.<br />\n10. Open the version corresponding to the Athena ODBC version you installed (in our case, 64-bit).</p>\n<p><img src=\"https://dev-media.amazoncloud.cn/9a04191bb35c4a008eaf33698699f192_image.png\" alt=\"image.png\" /></p>\n<p>11. On the <strong>User DSN</strong> tab, choose <strong>Add</strong>.</p>\n<p><img src=\"https://dev-media.amazoncloud.cn/eb6c74635ee8499780511d8157d6e832_image.png\" alt=\"image.png\" /></p>\n<p>12. Choose the Simba Athena ODBC driver and choose <strong>Finish</strong>.</p>\n<p><img src=\"https://dev-media.amazoncloud.cn/ffe435fed4284bd0831298c00bc2d86f_image.png\" alt=\"image.png\" /></p>\n<h4><a id=\"Configure_the_ODBC_driver_to_AD_FS_authentication_163\"></a><strong>Configure the ODBC driver for AD FS authentication</strong></h4>\n<p>We now need to configure the driver.</p>\n<ol>\n<li>Choose the driver on the <strong>Driver configuration</strong> page.</li>\n<li>For <strong>Data Source Name</strong>, enter <code>sampledb</code>.</li>\n<li>For <strong>Description</strong>, enter <code>Lake Formation Sample Database</code>.</li>\n<li>For <strong>AWS Region</strong>, enter <code>eu-west-1</code> or the Region you used when configuring Lake Formation.</li>\n<li>For <strong>Metadata Retrieval Method</strong>, choose <strong>Auto</strong>.</li>\n<li>For <strong>S3 Output Location</strong>, enter <code>s3://[BUCKET_NAME]/athena_results/</code>.</li>\n<li>For <strong>Encryption Options</strong>, choose <strong>NOT_SET</strong>.</li>\n<li>Clear the rest of the options.</li>\n<li>Choose <strong>Authentication 
Options</strong>.</li>\n</ol>\n<p><img src=\"https://dev-media.amazoncloud.cn/90632903f96249b2bbfc3d721f0e3265_image.png\" alt=\"image.png\" /></p>\n<p>10. For <strong>Authentication Type</strong>, choose <strong>ADFS</strong>.<br />\n11. For <strong>User</strong>, enter <code>[DOMAIN]\\[USERNAME]</code>.<br />\n12. For <strong>Password</strong>, enter your domain user password.<br />\n13. For <strong>Preferred Role</strong>, enter <code>arn:aws:iam::[ACCOUNT NUMBER]:role/ArunADFSTest</code>.<br />\nThe preferred role is the same role configured in the previous section (<code>ArunADFSTest</code>).<br />\n14. For <strong>IdP Host</strong>, enter the AD federation URL you configured during AD FS setup.<br />\n15. For <strong>IdP Port</strong>, enter <code>443</code>.<br />\n16. Select <strong>SSL Insecure</strong>.<br />\n17. Choose <strong>OK</strong>.</p>\n<p><img src=\"https://dev-media.amazoncloud.cn/6ef113a41a884ae89f17d5d16146004b_image.png\" alt=\"image.png\" /></p>\n<p>18. Choose <strong>Test</strong> on the initial configuration page to test the connection.<br />\n19. When you see a success confirmation, choose <strong>OK</strong>.</p>\n<p><img src=\"https://dev-media.amazoncloud.cn/ab87cda9e89c4a3eaad87943d343204d_image.png\" alt=\"image.png\" /></p>\n<p>We can now connect to our Lake Formation sample database from our desktop environment using the Athena ODBC driver. 
The next step is to use Tableau to query our data using the ODBC connection.</p>\n<h4><a id=\"Connect_to_your_data_using_Tableau_197\"></a><strong>Connect to your data using Tableau</strong></h4>\n<p>To connect to your data, complete the following steps:</p>\n<ol>\n<li>Open your Tableau Desktop edition.</li>\n<li>Under <strong>To a Server</strong>, choose <strong>More</strong>.</li>\n<li>On the list of available Tableau installed connectors, choose <strong>Other Databases (ODBC)</strong>.</li>\n<li>Choose the ODBC database you created earlier.</li>\n<li>Choose <strong>Connect</strong>.</li>\n<li>Choose <strong>Sign In</strong>.</li>\n</ol>\n<p><img src=\"https://dev-media.amazoncloud.cn/008162e63ba04463a9a5156825852700_image.png\" alt=\"image.png\" /></p>\n<p>When the Tableau workbook opens, select the database, schema, and table that you want to query.<br />\n7. For <strong>Database</strong>, choose the database as listed in the ODBC setup (for this post, <code>AwsDataCatalog</code>).<br />\n8. For <strong>Schema</strong>, choose your schema (<code>sampledb</code>).<br />\n9. For <strong>Table</strong>, search for and choose your table (<code>elb_logs</code>).<br />\n10. Drag the table to the work area to start your query and further report development.</p>\n<p><img src=\"https://dev-media.amazoncloud.cn/5f90378d15664427a7f1429c4daec5e3_image.png\" alt=\"image.png\" /></p>\n<h4><a id=\"Clean_up_216\"></a><strong>Clean up</strong></h4>\n<p>AWS Lake Formation provides database-, table-, column-, and tag-based access controls, and cross-account sharing at no charge. Lake Formation charges a fee for transaction requests and for metadata storage. In addition to providing a consistent view of data and enforcing row-level and cell-level security, the Lake Formation Storage API scans data in Amazon S3 and applies row and cell filters before returning results to applications. There is a fee for this filtering. 
To make sure that you’re not charged for any of the services that you no longer need, <a href=\"https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/terminating-instances.html\" target=\"_blank\">terminate any EC2</a> instances that you created. <a href=\"https://docs.aws.amazon.com/AmazonS3/latest/userguide/DeletingObjects.html\" target=\"_blank\">Remove any objects</a> in Amazon S3 you no longer require, because you pay for objects stored in S3 buckets.</p>\n<p>Lastly, <a href=\"https://docs.aws.amazon.com/directoryservice/latest/admin-guide/simple_ad_delete.html\" target=\"_blank\">delete any Active Directory</a> instances you may have created.</p>\n<h4><a id=\"Conclusion_221\"></a><strong>Conclusion</strong></h4>\n<p>Lake Formation makes it simple to set up a secure data lake and then use the data lake with your choice of analytics and machine learning services, including Tableau. In this post, we showed you how you can connect to your data lake using AD FS credentials in a simple and secure way by using the Athena ODBC driver. Your AD FS user is configured within the ODBC driver, which then assumes a role in AWS. This role is granted access to only the data you require via Lake Formation.</p>\n<p>To learn more about Lake Formation, see the <a href=\"https://docs.aws.amazon.com/lake-formation/latest/dg/what-is-lake-formation.html\" target=\"_blank\">Lake Formation Developer Guide</a> or follow the <a href=\"https://lakeformation.workshop.aws/\" target=\"_blank\">Lake Formation workshop</a>.</p>\n<h5><a id=\"About_the_Authors_226\"></a><strong>About the Authors</strong></h5>\n<p><img src=\"https://dev-media.amazoncloud.cn/b00b3e055e544465a6d79eb4233fef9f_image.png\" alt=\"image.png\" /></p>\n<p><strong>Jason Nicholls</strong> is an Enterprise Solutions Architect at AWS. He’s passionate about building scalable web and mobile applications on AWS. He started coding on a Commodore VIC 20, which led to a career in software development. 
Jason holds an MSc in Computer Science with specialization in coevolved genetic programming. He is based in Johannesburg, South Africa.</p>\n<p><img src=\"https://dev-media.amazoncloud.cn/33495fb77e8643c086626cc15cd6bbac_image.png\" alt=\"image.png\" /></p>\n<p><strong>Francois van Rensburg</strong> is a Partner Management Solutions Architect at AWS. He has spent the last decade helping enterprise organizations successfully migrate to the cloud. He is passionate about networking and all things cloud. He started as a COBOL programmer and has built everything from software to data centers. He is based in Denver, Colorado.</p>\n"}