Visualize MongoDB data from Amazon QuickSight using Amazon Athena Federated Query

In this post, you will learn how to use [Amazon Athena Federated Query](https://docs.aws.amazon.com/athena/latest/ug/connect-to-a-data-source.html) to connect a MongoDB database to [Amazon QuickSight](https://aws.amazon.com/quicksight/) in order to build dashboards and visualizations.

[Amazon Athena](http://aws.amazon.com/athena) is a serverless interactive query service, based on [Presto](https://aws.amazon.com/big-data/what-is-presto/), that provides full ANSI SQL support to query a variety of standard data formats, including CSV, JSON, ORC, Avro, and Parquet, that are stored on [Amazon Simple Storage Service](http://aws.amazon.com/s3) (Amazon S3). For data that isn't stored on Amazon S3, you can use Athena Federated Query to query the data in place or build pipelines that extract data from multiple data sources and store it in Amazon S3. With Athena Federated Query, you can run SQL queries across data that is stored in relational, non-relational, object, and custom data sources.

MongoDB is a popular NoSQL database option for websites and API endpoints. You can choose to deploy MongoDB as a self-hosted or fully managed database. Such databases are a popular choice for UI applications that manage user profiles, product catalogs, profile views, clickstream events, events from a connected device, and so on. QuickSight is a serverless business analytics service with built-in machine learning (ML) capabilities that can automatically look for patterns and outliers, and it has the flexibility to embed dashboards in applications for a data-driven experience. You can also use [QuickSight Q](https://aws.amazon.com/quicksight/q/) to allow users to ask questions in natural language and find answers to business questions immediately.

### **Overview of Athena Federated Query**

Athena Federated Query uses data source connectors that run on [AWS Lambda](http://aws.amazon.com/lambda) to run federated queries against other data sources. Prebuilt data source connectors are available for native stores, like [Amazon Timestream](https://aws.amazon.com/timestream/), [Amazon CloudWatch Logs](http://aws.amazon.com/cloudwatch), and [Amazon DynamoDB](http://aws.amazon.com/dynamodb), and for external sources like Vertica and SAP HANA. You can also write a connector by using the [Athena Query Federation SDK](https://docs.aws.amazon.com/athena/latest/ug/connect-data-source-federation-sdk.html). You can customize Athena's [prebuilt connectors](https://github.com/awslabs/aws-athena-query-federation/wiki/Available-Connectors) for your own use, or modify a copy of the source code to create your own [AWS Serverless Application Repository](https://aws.amazon.com/serverless/serverlessrepo/) package.

### **Solution overview**

The following architecture diagram shows the components of the Athena Federated Query MongoDB connector. It contains the following components:

- A virtual private cloud (VPC) configured with public and private subnets across three Availability Zones.
- A MongoDB cluster with customizable [Amazon Elastic Block Store](http://aws.amazon.com/ebs) (Amazon EBS) storage deployed in private subnets, and NAT gateways in a public subnet that give the MongoDB instances outbound internet connectivity.
- Bastion hosts in an Auto Scaling group with Elastic IP addresses to allow inbound SSH access.
- An [AWS Identity and Access Management](http://aws.amazon.com/iam) (IAM) `MongoDBnode` role with [Amazon Elastic Compute Cloud](http://aws.amazon.com/ec2) (Amazon EC2) and Amazon S3 permissions.
- Security groups to enable communication within the VPC.
- Lambda functions deployed in a private subnet that access S3 buckets. Athena invokes the Lambda function, which in turn fetches the data from MongoDB and maps the response back to Athena.
- [AWS Secrets Manager](https://aws.amazon.com/secrets-manager/), accessed through a VPC endpoint.

![image.png](https://dev-media.amazoncloud.cn/a0c7a406ef1b4c909ee51c3462a7e8be_image.png)

### **Prerequisites**

To implement the solution, you need the following:

- An AWS account to access AWS services.
- An IAM user with permission to `CreateRole`, `ListRoles`, `GetPolicy`, and `AttachRolePolicy`.
- An IAM user with an access key and secret key to configure an integrated development environment (IDE).
- A MongoDB database. You can deploy a hosted [MongoDB on Amazon EC2](https://aws.amazon.com/quickstart/architecture/mongodb/) or [MongoDB Atlas in a VPC](https://www.mongodb.com/cloud/atlas/register).
- A QuickSight subscription. If you don't have one configured, [sign up for one](https://docs.aws.amazon.com/quicksight/latest/user/signing-up.html). You can access the QuickSight free trial as part of the [AWS Free Tier](http://aws.amazon.com/free/) option.
- A new secret in Secrets Manager to store your MongoDB user name and password.
- Data loaded into your MongoDB database. For this example, we used an [airline dataset](https://raw.githubusercontent.com/jpatokal/openflights/master/data/airlines.dat). [Load the sample data](https://docs.atlas.mongodb.com/sample-data/) either from the MongoDB command line or, if you use MongoDB Atlas, from the MongoDB Atlas user interface.

### **Configure a Lambda connector**

The first step in the deployment is to set up the connector environment. Athena uses [data source connectors](https://docs.aws.amazon.com/athena/latest/ug/athena-prebuilt-data-connectors.html) that run on Lambda to run federated queries. To connect with MongoDB, use the [Amazon Athena DocumentDB Connector](https://github.com/awslabs/aws-athena-query-federation/tree/master/athena-docdb), which also works with any endpoint that is compatible with MongoDB.

To configure a Lambda connector, complete the following steps:

1. On the Athena console, choose **Data sources** in the navigation pane.
2. From the published list of data sources for Athena, select **Amazon DocumentDB**.
3. Choose **Next**.

![image.png](https://dev-media.amazoncloud.cn/645bfad9426f48bf953dc77ae43249d2_image.png)

4. In the **Data source details** section, give your data source a unique name; for example, `ds_mongo`. This is the connection name that appears under **Data sources** in Athena.

![image.png](https://dev-media.amazoncloud.cn/bdf2d09e57cb4f38aff565c834a658d6_image.png)

5. Choose **Create Lambda function**. This launches the **Create function** page in Lambda. The connector is deployed by using AWS Serverless Application Repository.

![image.png](https://dev-media.amazoncloud.cn/1ad28735f32343aa85b101ce505da2e7_image.png)

6. For **SecretNameOrPrefix**, enter `mongo`.
7. For **SpillBucket**, enter `spl-mongo-athena-test`.
8. For **AthenaCatalogName**, enter `us-west-mongo-cat`.
9. For **DocDBConnectionString** (the MongoDB connection), enter the following:

```
mongodb://${docdb_instance_1_creds}@replace_with_mongodb_private_ip:27017/?authSource=admin&readPreference=secondaryPreferred&retryWrites=false
```

10. For **SecurityGroupIds**, choose the security group that you want to associate with the function. Make sure that the security group of the MongoDB instance allows traffic from the Lambda function.
11. For **SpillPrefix**, enter `athena-spill`.
12. For **SubnetIds**, enter the IDs of the subnets that contain the MongoDB instances.
    In this case, **LambdaMemory** and **LambdaTimeout** have been set to their maximum values, but the right values depend on the queries you run and their memory requirements. **SpillBucket** is an S3 bucket in your account that stores data exceeding the Lambda function response size limits.
13. Keep the rest as defaults.
14. Select the acknowledgement check box and choose **Deploy**. The connector function is launched based on the given parameters.

![image.png](https://dev-media.amazoncloud.cn/8322bb19a5224945975aa900c4bdf842_image.png)

15. Create a [VPC endpoint](https://docs.aws.amazon.com/vpc/latest/privatelink/vpc-endpoints-s3.html) so that the Lambda function in the VPC can access Amazon S3. The endpoint is needed for the spill bucket, which is a staging area for the results of queries that are run on MongoDB through Athena federation.
16. Go back to the Athena console.
17. Under **Connection details**, for **Lambda function**, choose the newly created Lambda function.
18. Choose **Next**.

![image.png](https://dev-media.amazoncloud.cn/f1113ed7ba974a888d2f61bbc14d5e08_image.png)

19. To verify the connection, on the Athena console, choose **Data sources**, then choose `ds_mongo`. The databases associated with the connection should be listed.

![image.png](https://dev-media.amazoncloud.cn/54914967e65743ea93d196735f85657c_image.png)

You should now be able to query the datasets from the Athena query editor by using SQL.

20. In the query editor, for **Data Source**, choose `ds_mongo`.

Athena federates the query through the connector, which invokes the Lambda function. The function runs the query on MongoDB and maps the results back to Athena. The following is a sample query that was run on the airlines dataset.

![image.png](https://dev-media.amazoncloud.cn/cf26f54159104890a3191b70331dfd18_image.png)

### **Create a dataset on QuickSight to read the data from MongoDB**

Before you launch QuickSight for the first time in an AWS account, you must set up an account. For instructions, see [Signing in to Amazon QuickSight](https://docs.aws.amazon.com/quicksight/latest/user/signing-in.html).

After the initial setup, you can create a dataset with Athena as the source. The QuickSight service role needs permission to invoke the Lambda function that connects to MongoDB. The `aws-quicksight-service-role-v0` service role is created automatically with the QuickSight account.

To create a dataset in QuickSight, complete the following steps:

1. On the IAM console, in the navigation pane, choose **Roles**.
2. Search for the role `aws-quicksight-service-role-v0` and [add the permission](https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies_manage-attach-detach.html) `AWSLambda_FullAccess`.

In an organization, there could be different data stores based on data load and consumption patterns. Examples include catalog or manual data that is associated with products in MongoDB or a key-value index store, transactions or sales data in a SQL database, and images or video clips that are associated with the product in an object store.

![image.png](https://dev-media.amazoncloud.cn/706587084bd94685a5981551f60d3497_image.png)

In this case, an `airlines` table from MongoDB is joined with a flat file that contains information on the airports.

3. Use the QuickSight cross-data store feature to join data from different sources on common fields.

![image.png](https://dev-media.amazoncloud.cn/a1eb187b98044c61b8951877b2013229_image.png)

4. Update the data types for geographic fields such as city, country, latitude, and longitude so that you can build maps later.
5. Optionally, create calculated fields while preparing your dataset, which allows you to reuse them in other QuickSight analyses.

![image.png](https://dev-media.amazoncloud.cn/3b39bc79772b4e3fb6525ace34c5a446_image.png)

With a few clicks, you should be able to [create a dashboard](https://www.youtube.com/channel/UCqtI0cKSreCwUUuKOlA1tow/videos) with the published dataset. For instance, you can plot your data on a map, show trends in a line chart, and add autonarratives from the list of Suggested Insights to create the analysis shown in the following screenshot.

![image.png](https://dev-media.amazoncloud.cn/36f9018fe6ae41808cada6bc078022f6_image.png)

### **Clean up**

Make sure to clean up your resources to avoid ongoing costs. Delete the EC2 instances that run MongoDB. If you used MongoDB Atlas, delete the databases and collections. Delete the Athena data source `ds_mongo` and unsubscribe your QuickSight account from the **Manage QuickSight** admin page.

### **Conclusion**

With QuickSight and Athena Federated Query, organizations can access additional data sources beyond those already supported by QuickSight. If you have data in sources other than Amazon S3, you can use Athena [Federated Query](https://docs.aws.amazon.com/athena/latest/ug/connect-to-a-data-source.html) to analyze the data in place or build pipelines that extract and store data in Amazon S3. Athena now also supports cross-account federated queries to enable teams of analysts, data scientists, and data engineers to query data stored in other AWS accounts. Try connecting to proprietary data formats and sources, or build new user-defined functions, with the [Athena Query Federation SDK](https://github.com/awslabs/aws-athena-query-federation/blob/master/athena-federation-sdk/README.md).

#### **About the Authors**

![image.png](https://dev-media.amazoncloud.cn/b805f54e72a24ae8a6089f0c08dd6fe9_image.png)

**Soujanya Konka** is a Solutions Architect and Analytics specialist at AWS, focused on helping customers build their ideas on the cloud. Soujanya's expertise is in the design and implementation of business information systems and data warehousing solutions. Before joining AWS, Soujanya had stints with companies such as HSBC and Cognizant.

![image.png](https://dev-media.amazoncloud.cn/b23e50f79a1344b788a790aba727f929_image.png)

**Nilesh Parekh** is a Partner Solutions Architect with the ISV India segment. Nilesh helps partners review and remediate their workloads running on AWS based on the AWS Well-Architected and Foundational Technical Review best practices. He also assists partners with application modernization and delivering POCs.
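
As a supplement to the connector configuration steps in this post, the parameter values entered on the Lambda deployment page can be collected in a small script, which makes it easier to review the connection string before you paste it into the console. The following is a minimal sketch, not part of the connector itself: the function names `build_docdb_connection_string` and `connector_parameters` and the private IP `10.0.0.12` are hypothetical, and the parameter values are the example values used in this post.

```python
from urllib.parse import urlencode


def build_docdb_connection_string(secret_name, host, port=27017, options=None):
    """Build a MongoDB connection string with a ${secret} placeholder.

    The placeholder is left literal here; this post relies on the connector
    to resolve it from Secrets Manager when the query runs.
    """
    query = urlencode(options or {})
    # "${{" + "{secret_name}" + "}}" renders as ${<secret_name>}
    return f"mongodb://${{{secret_name}}}@{host}:{port}/?{query}"


def connector_parameters(mongodb_private_ip):
    """Collect the example parameter values used in this post (hypothetical helper)."""
    connection_string = build_docdb_connection_string(
        secret_name="docdb_instance_1_creds",
        host=mongodb_private_ip,
        options={
            "authSource": "admin",
            "readPreference": "secondaryPreferred",
            "retryWrites": "false",
        },
    )
    return {
        "SecretNameOrPrefix": "mongo",
        "SpillBucket": "spl-mongo-athena-test",
        "SpillPrefix": "athena-spill",
        "AthenaCatalogName": "us-west-mongo-cat",
        "DocDBConnectionString": connection_string,
    }


if __name__ == "__main__":
    # 10.0.0.12 is a made-up address; replace it with your MongoDB private IP.
    for name, value in connector_parameters("10.0.0.12").items():
        print(f"{name} = {value}")
```

Keeping the credentials as a `${...}` placeholder means the script never handles the MongoDB user name or password directly; only the secret name appears in the connection string.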