# Data warehouse and business intelligence technology consolidation using AWS

{"value":"Organizations have been using data warehouse and business intelligence (DWBI) workloads to support business decision making for many years. These workloads are brought to the Amazon Web Services (AWS) platform to utilize the benefit of AWS cloud. However, these workloads are built using multiple vendor tools and technologies, and the customer faces the burden of administrative overhead.\n\nThis post provides architectural guidance to consolidate multiple DWBI technologies to[ AWS Managed Services](https://aws.amazon.com/managed-services/) to help reduce the administrative overhead, bring operational ease, and business efficiency. Two scenarios are explored:\n\n1. Upstream transactional databases are already on AWS\n2. Upstream transactional databases are present at on-premise datacenter\n\n\n### **Challenges faced by an organization**\n\n\nOrganizations are engaged in managing multiple DWBI technologies due to acquisitions, mergers, and the lift-and-shift of workloads. These workloads use extract, transform, and load (ETL) tools to read relational data from upstream transactional databases, process it, and store it in a data warehouse. Thereafter, these workloads use business intelligence tools to generate valuable insight and present it to users in form of reports and dashboards.\n\nThese DWBI technologies are generally installed and maintained on their own server. Figure 1 demonstrates the increased the administrative overhead for the organization but also creates challenges in maintaining the team’s overall knowledge.\n\n![image.png](https://dev-media.amazoncloud.cn/5ecd6f39996840b8a35089b4dbad9160_image.png)\n\nFigure 1. DWBI workload with multiple tools\n\nTherefore, organizations are looking to consolidate technology usage and continue supporting important business functions.\n\n\n### **Scenario 1**\n\n\nAs we know, three major functions of DWBI workstream are:\n\n- ETL data using a tool\n- Store/manage the data in a data warehouse\n- Generate information from the data using business intelligence\n\nEach of these functions can be performed efficiently using an AWS service. For example, [AWS Glue](https://aws.amazon.com/glue/) can be used for ETL, [Amazon Redshif](https://aws.amazon.com/redshift/)t for data warehouse, and [Amazon QuickSight](https://aws.amazon.com/quicksight/) for business intelligence.\n\nWith the use of mentioned AWS services, organizations will be able to consolidate their DWBI technology usage. Organizations also will be able to quickly adapt to these services, as their engineering team can more easily use their DWBI knowledge with these services. For example, using SQL knowledge in [AWS Glue](https://aws.amazon.com/cn/glue/?trk=cndc-detail) jobs with SprakSQL, in [Amazon Redshift](https://aws.amazon.com/cn/redshift/?trk=cndc-detail) queries, and in [Amazon QuickSight](https://aws.amazon.com/cn/quicksight/?trk=cndc-detail) dashboards.\n\nFigure 2 demonstrates the redesigned the architecture of Figure 1 using AWS services. In this architecture, ETL functions are consolidated in [AWS Glue](https://aws.amazon.com/cn/glue/?trk=cndc-detail). An [AWS Glue crawler](https://docs.aws.amazon.com/glue/latest/dg/add-crawler.html) is used to auto-catalogue the source and target table metadata; then, [AWS Glue](https://aws.amazon.com/cn/glue/?trk=cndc-detail) ETL jobs use these catalogues to read data from source and write to target (data warehouse). 
Similarly, the data warehousing function is consolidated in [Amazon Redshift](https://aws.amazon.com/redshift/), which is used to store and organize the enriched data and to enforce appropriate data access control for both workloads and users.

Lastly, business intelligence functions are consolidated in [Amazon QuickSight](https://aws.amazon.com/quicksight/). It is used to create the dashboards that source data from Amazon Redshift and apply complex business logic to produce the charts and graphs needed for business insights. It is also used to implement the necessary access restrictions on dashboards and data.

### **Scenario 2**

When the source databases are in an on-premises datacenter, the overall solution is similar to Scenario 1, with one additional step: the data must be moved continually from the on-premises databases to an [Amazon Simple Storage Service (Amazon S3)](https://aws.amazon.com/s3/) bucket. The data movement can be efficiently handled by [AWS Database Migration Service (AWS DMS)](https://aws.amazon.com/dms/).

To make the source database accessible to AWS DMS, a connection needs to be established between the AWS Cloud and the on-premises network. Based on performance and throughput needs, the organization can choose either the [AWS Direct Connect](https://aws.amazon.com/directconnect/) service or the [AWS Site-to-Site VPN](https://aws.amazon.com/vpn/site-to-site-vpn/) service to securely move the data. For the purpose of this discussion, we are considering AWS Direct Connect.

In Figure 3, an AWS DMS task is used to perform a full load followed by change data capture (CDC) to continuously move the data to an S3 bucket. In this scenario, AWS Glue is used to catalogue and read the data from the S3 bucket. The remaining portion of the dataflow is the same as in Scenario 1. A sketch of creating such a task follows the figure.

![image.png](https://dev-media.amazoncloud.cn/394ff41f0d0e42aeb5d0ea033adb63d6_image.png)

Figure 3. Consolidated workload with source at datacenter

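As an illustration, the following sketch creates such a full-load-and-CDC replication task with the AWS SDK for Python (boto3). It assumes the DMS replication instance, the source (on-premises database) endpoint, and the target (Amazon S3) endpoint already exist; all ARNs, the schema name, and the task identifier are hypothetical placeholders.

```python
# Sketch: create an AWS DMS task that performs a full load followed by
# change data capture (CDC) into an S3 target. ARNs and names are placeholders.
import json

import boto3

dms = boto3.client("dms")

# Include every table in the (hypothetical) "sales" schema of the source database
table_mappings = {
    "rules": [
        {
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "include-sales-schema",
            "object-locator": {"schema-name": "sales", "table-name": "%"},
            "rule-action": "include",
        }
    ]
}

response = dms.create_replication_task(
    ReplicationTaskIdentifier="dwbi-sales-full-load-and-cdc",
    SourceEndpointArn="arn:aws:dms:us-east-1:111122223333:endpoint:SOURCE",    # placeholder
    TargetEndpointArn="arn:aws:dms:us-east-1:111122223333:endpoint:S3TARGET",  # placeholder
    ReplicationInstanceArn="arn:aws:dms:us-east-1:111122223333:rep:INSTANCE",  # placeholder
    MigrationType="full-load-and-cdc",  # full load, then ongoing replication
    TableMappings=json.dumps(table_mappings),
)

# Start the task once it has been created
dms.start_replication_task(
    ReplicationTaskArn=response["ReplicationTask"]["ReplicationTaskArn"],
    StartReplicationTaskType="start-replication",
)
```

The AWS Glue crawler from Scenario 1 can then be pointed at the S3 prefix written by this task so that the catalog stays current.
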
### **Scaling**

Both of the updated architectures provide the necessary scaling:

- The auto scaling feature can be used to scale [AWS Glue](https://aws.amazon.com/glue/) ETL job resources up or down
- The concurrency scaling feature can be used to support virtually unlimited concurrent users and queries in [Amazon Redshift](https://aws.amazon.com/redshift/)
- [Amazon QuickSight](https://aws.amazon.com/quicksight/) resources (web server, Amazon QuickSight engine, and SPICE) are auto scaled by design

A sketch of enabling the first two features follows this list.

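As one possible approach, the boto3 sketch below turns on AWS Glue auto scaling through the `--enable-auto-scaling` job parameter (available from Glue version 3.0) and raises the Amazon Redshift concurrency scaling limit through the `max_concurrency_scaling_clusters` cluster parameter. The job name, role, script location, parameter group name, and limits are hypothetical; which Redshift WLM queues actually use concurrency scaling is configured separately in the WLM configuration.

```python
# Sketch: enable AWS Glue auto scaling on an ETL job and raise the Amazon
# Redshift concurrency scaling limit. All names and values are placeholders.
import boto3

glue = boto3.client("glue")
redshift = boto3.client("redshift")

# AWS Glue 3.0+ job with auto scaling: Glue adds or removes workers up to
# NumberOfWorkers based on the load of the running job
glue.create_job(
    Name="dwbi-orders-etl",                                # hypothetical
    Role="arn:aws:iam::111122223333:role/GlueJobRole",     # hypothetical
    GlueVersion="3.0",
    WorkerType="G.1X",
    NumberOfWorkers=10,            # upper bound that auto scaling can reach
    Command={
        "Name": "glueetl",
        "ScriptLocation": "s3://example-bucket/scripts/orders_etl.py",
        "PythonVersion": "3",
    },
    DefaultArguments={"--enable-auto-scaling": "true"},
)

# Allow up to four concurrency scaling clusters in the Redshift cluster's
# parameter group; individual queues opt in through the WLM configuration
redshift.modify_cluster_parameter_group(
    ParameterGroupName="dwbi-redshift-params",             # hypothetical
    Parameters=[
        {
            "ParameterName": "max_concurrency_scaling_clusters",
            "ParameterValue": "4",
            "ApplyType": "dynamic",
        }
    ],
)
```
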
### **Security, monitoring, and auditing**

The updated architectures also provide the necessary security through access control, data encryption at rest and in transit, monitoring, and auditing.

- [AWS Key Management Service](https://aws.amazon.com/kms/) can be used to generate the keys necessary for data encryption at rest.
- [AWS CloudTrail](https://aws.amazon.com/cloudtrail/) can be used to track user activity and API usage for auditing and troubleshooting.
- [Amazon CloudWatch](https://aws.amazon.com/cloudwatch/) can be used to monitor the [Amazon Redshift](https://aws.amazon.com/redshift/) service and the logs generated by [AWS Glue](https://aws.amazon.com/glue/) jobs.
- [Amazon Simple Notification Service (Amazon SNS)](https://aws.amazon.com/sns/) can be used to send notifications from the AWS Cloud, for example, AWS Glue job execution status or Amazon QuickSight SPICE data failure notifications (see the sketch after this list).
- [AWS Identity and Access Management](https://aws.amazon.com/iam/) is used for user and group access in an organization's AWS account.

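One common way to deliver the Glue job notifications mentioned above, shown here as a minimal boto3 sketch, is an Amazon EventBridge rule that forwards AWS Glue job state-change events to an SNS topic. The topic name, rule name, and monitored states are assumptions, and the topic's access policy must additionally allow EventBridge to publish to it.

```python
# Sketch: notify an SNS topic when an AWS Glue job fails, times out, or stops.
# Topic name, rule name, and the monitored states are hypothetical choices.
import json

import boto3

sns = boto3.client("sns")
events = boto3.client("events")

topic_arn = sns.create_topic(Name="dwbi-glue-job-alerts")["TopicArn"]

# EventBridge rule matching AWS Glue job state-change events
events.put_rule(
    Name="glue-job-state-alerts",
    State="ENABLED",
    EventPattern=json.dumps(
        {
            "source": ["aws.glue"],
            "detail-type": ["Glue Job State Change"],
            "detail": {"state": ["FAILED", "TIMEOUT", "STOPPED"]},
        }
    ),
)

# Send matching events to the SNS topic (the topic policy must also grant
# events.amazonaws.com permission to publish to it)
events.put_targets(
    Rule="glue-job-state-alerts",
    Targets=[{"Id": "glue-alerts-sns", "Arn": topic_arn}],
)
```
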
Additionally, both Amazon Redshift and Amazon QuickSight provide their own authentication and access controls, so a user can be either a local user or a federated one. With these authentication options, an organization can control access both to the data in Amazon Redshift and to the dashboards in Amazon QuickSight.

### **Conclusion**

In this blog post, we discussed how AWS Glue, Amazon Redshift, and Amazon QuickSight can be used to consolidate DWBI technologies. We also discussed how this architecture helps an organization build a scalable, secure workload with auto scaling, access control, log monitoring, and activity auditing.

##### **Ready to get started?**

- Learn how to [author a job in AWS Glue](https://docs.aws.amazon.com/glue/latest/dg/author-job.html)
- [Authorize connections from Amazon QuickSight to Amazon Redshift clusters](https://docs.aws.amazon.com/quicksight/latest/user/enabling-access-redshift.html)
- Discover a typical [Amazon Redshift data processing flow](https://docs.aws.amazon.com/redshift/latest/gsg/concepts-diagrams.html#diagrams)
- Get hands-on with the [Amazon Redshift Analytics Workshop](https://redshift-analytics.workshop.aws/intro.html)

![image.png](https://dev-media.amazoncloud.cn/d6cb77f165ad4d68abefd1468f231d7e_image.png)

### **Bappaditya Datta**

Bappaditya Datta is a Solution Architect with the AWS North America Enterprise SA team, focusing on data & analytics. He helps AWS customers across different industries design and build secure, scalable, and highly available solutions that address their business needs and bring innovation. Prior to AWS, Bappaditya worked as a Technical Architect helping pharmaceutical customers adopt the AWS cloud for their data & analytics needs. Bappaditya is based in Pennsylvania and lives in a suburb near Philadelphia with his wife, daughter, and a recently adopted goldendoodle.