Migrate a large data warehouse from Greenplum to Amazon Redshift using Amazon Web Services SCT – Part 1

{"value":"A [data warehouse](https://en.wikipedia.org/wiki/Data_warehouse) collects and consolidates data from various sources within your organization. It’s used as a centralized data repository for analytics and business intelligence.\n\nWhen working with on-premises legacy data warehouses, scaling the size of your data warehouse or improving performance can mean purchasing new hardware or adding more powerful hardware. This is often expensive and time-consuming. Running your own on-premises data warehouse also requires hiring database managers, administrators to deal with outages, upgrades, and data access requests. As companies become more data-driven, reliable access to centralized data is increasingly important. As a result, there is a strong demand for data warehouses that are fast, accessible, and able to scale elastically with business needs. Cloud data warehouses like [Amazon Redshift](https://aws.amazon.com/redshift/) address these needs while eliminating the cost and risk of purchasing new hardware.\n\nThis multi-part series explains how to migrate an on-premises Greenplum data warehouse to [Amazon Redshift](https://aws.amazon.com/cn/redshift/?trk=cndc-detail) using [Amazon Web Services Schema Conversion Tool](https://aws.amazon.com/dms/schema-conversion-tool/) (Amazon Web Services SCT). In this first post, we describe how to plan, run, and validate the large-scale data warehouse migration. It covers the solution overview, migration assessment, and guidance on technical and business validation. In the second post, we share best practices for choosing the optimal [Amazon Redshift](https://aws.amazon.com/cn/redshift/?trk=cndc-detail) cluster, data architecture, converting [stored procedures](https://docs.aws.amazon.com/redshift/latest/dg/stored-procedure-create.html), compatible functions and queries widely used for SQL conversions, and recommendations for optimizing the length of data types for table columns.\n\n\n#### **Solution overview**\n\n\n[Amazon Redshift](https://aws.amazon.com/redshift/) is an industry-leading cloud data warehouse. [Amazon Redshift](https://aws.amazon.com/cn/redshift/?trk=cndc-detail) uses Structured Query Language ([SQL](https://en.wikipedia.org/wiki/SQL)) to analyze structured and semi-structured data across data warehouses, operational databases, and data lakes using Amazon Web Services-designed hardware and machine learning to deliver the best price-performance at any scale.\n\nAmazon Web Services SCT makes heterogeneous database migrations predictable by automatically converting the source database schema and most of the database code objects, SQL scripts, views, stored procedures, and functions to a format compatible with the target database. Amazon Web Services SCT helps you modernize your applications simultaneously during database migration. 
When schema conversion is complete, Amazon Web Services SCT can help migrate data from various data warehouses to [Amazon Redshift](https://aws.amazon.com/cn/redshift/?trk=cndc-detail) using data extraction [agents](https://docs.aws.amazon.com/SchemaConversionTool/latest/userguide/agents.html).

The following diagram illustrates our architecture for migrating data from Greenplum to [Amazon Redshift](https://aws.amazon.com/cn/redshift/?trk=cndc-detail) using Amazon Web Services SCT data extraction agents.

![image.png](https://dev-media.amazoncloud.cn/e948c862f1774faaa75f0c4f7f46110d_image.png)

#### **Perform a migration assessment**

The initial data migration is the first milestone of the project. The main requirements for this phase are to minimize the impact on the data source and transfer the data as fast as possible. Amazon Web Services offers several options for this, depending on the size of the database, network performance ([Amazon Web Services Direct Connect](https://aws.amazon.com/directconnect/) or [Amazon Web Services Snowball](https://aws.amazon.com/snowball/)), and whether the migration is heterogeneous (Amazon Web Services Database Migration Service (Amazon Web Services DMS) or Amazon Web Services SCT).

Amazon Web Services provides a portfolio of [cloud data migration services](https://aws.amazon.com/cloud-data-migration/) so you can find the right solution for any data migration project. The level of connectivity is a significant factor in data migration, and Amazon Web Services has offerings that address hybrid cloud storage, online data transfer, and offline data transfer needs.

Additionally, the [Amazon Web Services Snow Family](https://aws.amazon.com/snow/) makes it simple to get your data into and out of Amazon Web Services via offline methods. If you have petabytes to exabytes of data, you can use [Amazon Web Services Snowmobile](https://aws.amazon.com/snowmobile/) or [Amazon Web Services Snowball](https://aws.amazon.com/snowball/), depending on the size of the data. To decide which transfer method is better for your use case, refer to [Performance for Amazon Web Services Snowball](https://docs.aws.amazon.com/snowball/latest/ug/performance.html).

#### **Perform schema conversion with Amazon Web Services SCT**

To convert your schema using Amazon Web Services SCT, you must start a new Amazon Web Services SCT project and connect your databases. Complete the following steps:

1. Install Amazon Web Services SCT.
2. Open and initiate a new project.
3. For **Source database engine**, choose **Greenplum**.
4. For **Target database engine**, choose **Amazon Redshift**.
5. Choose **OK**.

![image.png](https://dev-media.amazoncloud.cn/4e66f8f8615c4d4395880e90fa12f369_image.png)

6. Open your project and choose **Connect to Greenplum**.

![image.png](https://dev-media.amazoncloud.cn/a3a40d02585e481e884386b96731448d_image.png)

7. Enter the Greenplum database information.
8. Choose **Test connection**.

![image.png](https://dev-media.amazoncloud.cn/6e1c3d17b74444c6bbba93fa4757ebfd_image.png)

9. Choose **OK** after a successful connection test.
10. Choose **OK** to complete the connection.

![image.png](https://dev-media.amazoncloud.cn/88c0c91f261f40cd8f4bae429ad45090_image.png)

11. Repeat similar steps to establish a connection to your [Amazon Redshift](https://aws.amazon.com/cn/redshift/?trk=cndc-detail) cluster.

![image.png](https://dev-media.amazoncloud.cn/8380b29c487849809646eebaec1b973f_image.png)

By default, Amazon Web Services SCT uses Amazon Web Services Glue as the extract, transform, and load (ETL) solution for the migration. Before you continue, you must disable this setting.

12. On the **Settings** menu, choose **Project settings**.

![image.png](https://dev-media.amazoncloud.cn/92de7f36e34646418deb42d7312c2bbc_image.png)

13. Deselect **Use Amazon Web Services Glue**.
14. Choose **OK**.

![image.png](https://dev-media.amazoncloud.cn/707e68732be34ea1a83f876f2a4b4e32_image.png)

15. In the left pane, choose your schema (right-click) and choose **Convert schema**.

![image.png](https://dev-media.amazoncloud.cn/e74c6b3bca944be298d15f787dab90d9_image.png)

16. When asked to replace objects, choose **Yes**.

![image.png](https://dev-media.amazoncloud.cn/5593d6e7d23d4642af76f202bd82dbbd_image.png)

17. When asked to load statistics, choose **Continue**.

![image.png](https://dev-media.amazoncloud.cn/673da57cba34426780ec88db3534e96b_image.png)

By the end of this step, all Greenplum objects should be converted to Amazon Redshift syntax. Some objects may be shown in red, meaning that Amazon Web Services SCT couldn't fully migrate them. You can view an assessment summary of the migration for more information.

18. On the **View** menu, choose **Assessment report view**.

![image.png](https://dev-media.amazoncloud.cn/3605c224efbc41feaa0bd3c23c0f9d59_image.png)

In the bottom pane, you can see the Greenplum DDL and the Amazon Redshift DDL of the selected objects side by side for comparison.

![image.png](https://dev-media.amazoncloud.cn/6ae03c20ca3f40dabe0c86321553f51e_image.png)
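To give a sense of what this side-by-side comparison shows, the following is a minimal, hypothetical sketch of a Greenplum table and one way it could be expressed in Amazon Redshift. The schema, table, and column names are made up for illustration, and the distribution and sort keys are design choices you confirm during conversion, not values Amazon Web Services SCT picks for you automatically.

```
-- Greenplum source table (hypothetical)
CREATE TABLE sales.orders (
    order_id     bigint,
    customer_id  bigint,
    order_date   date,
    order_amount numeric(18,2),
    order_note   text
)
DISTRIBUTED BY (customer_id);

-- A possible Amazon Redshift equivalent (hypothetical)
CREATE TABLE sales.orders (
    order_id     bigint,
    customer_id  bigint,
    order_date   date,
    order_amount numeric(18,2),
    order_note   varchar(65535)
)
DISTSTYLE KEY
DISTKEY (customer_id)
SORTKEY (order_date);
```

The Greenplum `DISTRIBUTED BY` clause maps conceptually to the Amazon Redshift `DISTKEY`, and unbounded types such as `text` become bounded `varchar` columns; the second post in this series covers how to right-size those column lengths.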
19. Choose the schema with a red icon, which indicates that it needs manual conversion. You're presented with specific actions regarding the tables, constraints, or views that can't be migrated to [Amazon Redshift](https://aws.amazon.com/cn/redshift/?trk=cndc-detail). You must investigate these issues and fix the errors manually with the required changes. For example, Amazon Web Services SCT automatically converts binary data in BLOB format to the character varying data type, but this may be highlighted as an issue. Additionally, some vendor-supplied procedures and functions can't be converted, so Amazon Web Services SCT reports an error for them.

![image.png](https://dev-media.amazoncloud.cn/d30927144d4944d0b306aa53a6c92e59_image.png)

20. Connect using the [Amazon Redshift query editor v2](https://aws.amazon.com/blogs/aws/amazon-redshift-query-editor-v2-web-query-authoring/) or another third-party tool or utility of your choice and check all the tables with the following code:

```
select t.table_schema, t.table_name, i.tbl_rows
from svv_tables t
left join svv_table_info i
  on t.table_schema = i.schema and t.table_name = i.table
where t.table_type = 'BASE TABLE' and t.table_schema = 'SCHEMA NAME'
order by 1, 2;
```
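To compare these counts against the source, you can run a similar check on the Greenplum side. The following is a minimal sketch that reads the optimizer statistics in `pg_class`, so the numbers are estimates unless the tables were analyzed recently; replace `'SCHEMA NAME'` with your schema, and fall back to `SELECT COUNT(*)` per table when you need exact figures.

```
select n.nspname as table_schema,
       c.relname as table_name,
       c.reltuples::bigint as row_estimate  -- estimate maintained by ANALYZE
from pg_class c
join pg_namespace n on n.oid = c.relnamespace
where c.relkind = 'r'
  and n.nspname = 'SCHEMA NAME'
order by 1, 2;
```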
#### **Migrate the data**

To start your data migration using Amazon Web Services SCT data extraction agents, complete the following steps:

1. Configure the Amazon Web Services SCT extractor properties file with the corresponding Greenplum properties:

```
port=8192
vendor=GREENPLUM
driver.jars="C:\\DMS Workshop\\JDBC\\postgresql-42.2.9.jar"
redshift.driver.jars=C:\\DMS Workshop\\JDBC\\RedshiftJDBC42-no-awssdk-1.2.43.1067.jar
working.folder=C:/Users/developer
ssl.option=ON
ssl.require.client.authentication=ON
#extractor.start.fetch.size=20000
#extractor.out.file.size=10485760
ssl.truststore.path=C:/DMS Workshop/sctagent/truststore
ssl.keystore.path=C:/DMS Workshop/sctagent/keystore
ssl.keystore.pwd=ZcUnMLbWYe1c0Dp/m9jIe8IMGNtyUOv+AnKVsV9eZI+SuJcWI9Fd99Lskd7FFA==
ssl.truststore.pwd=w23mFvm+SO46eR4dJ0Ly7n+XcvgUhhqrJNMPUyWdhFFKBxRU64/V2uioOul/aBcs8tMs=
```

Now you configure the Amazon Web Services SCT extractor to perform a one-time data move. You can use multiple extractors when dealing with a large volume of data.

2. To register the extractor, on the **View** menu, choose **Data migration view**.

![image.png](https://dev-media.amazoncloud.cn/94572284820c4fb5961b19305eed0526_image.png)

3. Choose **Register**.

![image.png](https://dev-media.amazoncloud.cn/9cc22ecc289044668074be0bf175a90e_image.png)

4. Enter the information for your new agent.
5. Test the connection and choose **Register**.

![image.png](https://dev-media.amazoncloud.cn/6312a833c5a246a2882d041e1a447504_image.png)

Now you create a task for the extractor to extract data into the tables created on [Amazon Redshift](https://aws.amazon.com/cn/redshift/?trk=cndc-detail).

6. Under your schema in the left pane, choose **Tables** (right-click) and choose **Create Local task**.

![image.png](https://dev-media.amazoncloud.cn/bf8b0fa21f944faab9fd24f77b80442d_image.png)

7. For **Task name**, enter a name.

![image.png](https://dev-media.amazoncloud.cn/8fdbfd99ae2a41d5af534f2303938675_image.png)

8. Test the connection and choose **OK**.

![image.png](https://dev-media.amazoncloud.cn/6caa818fde30456ba35285818ed4a131_image.png)

9. Choose **Create**.
10. Run your task and monitor its progress.

![image.png](https://dev-media.amazoncloud.cn/cd5f0dc4b9714e3e95e086351705ab0f_image.png)

You can choose each task to get a detailed breakdown of its activity. Make sure to examine errors during the extract, upload, and copy process.

![image.png](https://dev-media.amazoncloud.cn/020a843022044938836daed3b1a3333a_image.png)

You can monitor the status of the tasks, the percentage completed, and the tables that were loaded successfully. You must also verify the count of records loaded into the [Amazon Redshift](https://aws.amazon.com/cn/redshift/?trk=cndc-detail) database.

![image.png](https://dev-media.amazoncloud.cn/97651b6060f84ffda388394347a8c220_image.png)

#### **Technical validation**

After the initial extracted data is loaded to [Amazon Redshift](https://aws.amazon.com/cn/redshift/?trk=cndc-detail), you must perform data validation tests in parallel. The goal at this stage is to validate production workloads, comparing Greenplum and [Amazon Redshift](https://aws.amazon.com/cn/redshift/?trk=cndc-detail) outputs from the same inputs.

Typical activities covered during this phase include the following (a query sketch covering several of these checks follows this list):

- Count the number of objects and rows in each table.
- Compare the same random subset of data in both Greenplum and [Amazon Redshift](https://aws.amazon.com/cn/redshift/?trk=cndc-detail) for all migrated tables, validating that the data is exactly the same row by row.
- Check for incorrect column encodings.
- Identify skewed table data.
- Annotate queries not benefiting from sort keys.
- Identify inappropriate join cardinality.
- Identify tables with large VARCHAR columns.
- Confirm that processes don't crash when connected to the target environment.
- Validate daily batch jobs (job duration, number of rows processed). To find the right techniques to perform most of these activities, refer to [Top 10 Performance Tuning Techniques for Amazon Redshift](https://aws.amazon.com/blogs/big-data/top-10-performance-tuning-techniques-for-amazon-redshift/).
- Set up [Amazon Redshift](https://aws.amazon.com/cn/redshift/?trk=cndc-detail) automated alerts with [Amazon Redshift Advisor](https://docs.aws.amazon.com/redshift/latest/dg/advisor.html).
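Several of the checks above (column encodings, skewed data, and wide VARCHAR columns) can be started from a single query against the Amazon Redshift SVV_TABLE_INFO system view. The following is a minimal sketch; replace `'SCHEMA NAME'` with your schema.

```
select "schema", "table", tbl_rows, encoded, diststyle,
       skew_rows,    -- ratio of rows on the fullest slice to the emptiest slice
       max_varchar,  -- size of the widest VARCHAR column in the table
       unsorted,     -- percentage of rows that are unsorted
       stats_off     -- how stale the table statistics are
from svv_table_info
where "schema" = 'SCHEMA NAME'
order by skew_rows desc;
```

Tables with a high skew_rows value, encoded = 'N', or unnecessarily wide VARCHAR columns are good candidates for revisiting distribution keys, column encodings, and column lengths.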
#### **Business validation**

After you successfully migrate the data and validate the data movement, the last remaining task is to involve the data warehouse users in the validation process. These users from different business units across the company access the data warehouse using a variety of tools and methods: JDBC/ODBC clients, Python scripts, custom applications, and more. It's central to the migration to make sure that every end-user has verified and adapted their processes to work seamlessly with [Amazon Redshift](https://aws.amazon.com/cn/redshift/?trk=cndc-detail) before the final cutover.

This phase can consist of several tasks:

- Adapt business users' tools, applications, and scripts to connect to [Amazon Redshift](https://aws.amazon.com/cn/redshift/?trk=cndc-detail) endpoints.
- Modify users' data load and dump procedures, replacing data movement to and from shared storage via ODBC/JDBC with COPY and UNLOAD operations from and to [Amazon Simple Storage Service](https://aws.amazon.com/cn/s3/?trk=cndc-detail) ([Amazon S3](https://aws.amazon.com/cn/s3/?trk=cndc-detail)); see the sketch after this list.
- Modify any incompatible queries, taking into account any [implementation nuances between Amazon Redshift and PostgreSQL](https://docs.aws.amazon.com/redshift/latest/dg/c_redshift-and-postgres-sql.html).
- Run business processes against both Greenplum and [Amazon Redshift](https://aws.amazon.com/cn/redshift/?trk=cndc-detail), and compare results and runtimes. Make sure to report any issue or unexpected result to the team in charge of the migration, so the case can be analyzed in detail.
- Tune query performance, taking into account table distribution and sort keys, and make extensive use of the [EXPLAIN](https://docs.aws.amazon.com/redshift/latest/dg/r_EXPLAIN.html) command to understand how [Amazon Redshift](https://aws.amazon.com/cn/redshift/?trk=cndc-detail) plans and runs queries. For advanced table design concepts, refer to [Amazon Redshift Engineering's Advanced Table Design Playbook: Preamble, Prerequisites, and Prioritization](https://aws.amazon.com/blogs/big-data/amazon-redshift-engineerings-advanced-table-design-playbook-preamble-prerequisites-and-prioritization/).
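For the load and unload changes in the second item above, the following is a minimal sketch of what the replacement commands could look like. The schema, table, S3 bucket, prefix, and IAM role ARN are placeholders to substitute with your own values.

```
-- Load a table from files staged in Amazon S3
copy sales.orders
from 's3://my-migration-bucket/orders/'
iam_role 'arn:aws:iam::111122223333:role/MyRedshiftRole'
format as parquet;

-- Export query results back to Amazon S3
unload ('select * from sales.orders where order_date >= ''2022-01-01''')
to 's3://my-migration-bucket/exports/orders_'
iam_role 'arn:aws:iam::111122223333:role/MyRedshiftRole'
format as parquet;
```

Because COPY and UNLOAD run in parallel across the cluster, they are generally much faster than moving the same data through a single ODBC/JDBC connection.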
This business validation phase is key so that all end-users are aligned and ready for the final cutover. Following [Amazon Redshift](https://aws.amazon.com/cn/redshift/?trk=cndc-detail) [best practices](https://docs.aws.amazon.com/redshift/latest/dg/best-practices.html) enables end-users to take full advantage of the capabilities of their new data warehouse. After you have performed all the migration validation tasks and connected and tested every ETL job, business process, external system, and user tool against [Amazon Redshift](https://aws.amazon.com/cn/redshift/?trk=cndc-detail), you can disconnect every process from the old data warehouse, which you can then safely power off and decommission.

#### **Conclusion**

In this post, we provided detailed steps to migrate from Greenplum to [Amazon Redshift](https://aws.amazon.com/cn/redshift/?trk=cndc-detail) using Amazon Web Services SCT. Although this post describes modernizing and moving to a cloud data warehouse, you should treat this transformation as a step toward a full-fledged modern data architecture. The Amazon Web Services Cloud enables you to be more data-driven by supporting multiple use cases. For a modern data architecture, use purpose-built data stores like [Amazon S3](https://aws.amazon.com/cn/s3/?trk=cndc-detail), [Amazon Redshift](https://aws.amazon.com/cn/redshift/?trk=cndc-detail), [Amazon Timestream](https://aws.amazon.com/cn/timestream/?trk=cndc-detail), and other data stores based on your use case.

Check out the second post in this series, where we cover prescriptive guidance around data types, functions, and stored procedures.

##### **About the Authors**

![image.png](https://dev-media.amazoncloud.cn/0ae6b4c522104659b01fbe5a5726f076_image.png)

**Suresh Patnam** is a Principal Solutions Architect at Amazon Web Services. He is passionate about helping businesses of all sizes transform into fast-moving digital organizations focusing on big data, data lakes, and AI/ML. Suresh holds an MBA from Duke University's Fuqua School of Business and an MS in CIS from Missouri State University. In his spare time, Suresh enjoys playing tennis and spending time with his family.

![image.png](https://dev-media.amazoncloud.cn/7bb45d07126f4847ad3c00738f592d8a_image.png)

**Arunabha Datta** is a Sr. Data Architect at Amazon Web Services. He collaborates with customers and partners to architect and implement modern data architecture using Amazon Web Services Analytics services. In his spare time, Arunabha enjoys photography and spending time with his family.