Archive and Purge Data for Amazon RDS for PostgreSQL and Amazon Aurora with PostgreSQL Compatibility using pg_partman and Amazon S3

{"value":"The structured data industry has been experiencing exponential data growth year by year. It’s important for the relational database users to get smart with its data storage footprint by understanding data relevance while keeping its business needs in mind.\n\nData archive is the critical part of data management, as it enables us to keep business relevant hot data in high-cost, high performance storage, as well as move less significant cold data into cheaper storage. The archival practice helps reduce Total Cost of Ownership (TCO) and improve database performance as smaller datasets result in faster access time.\n\nTraditionally, database archival practice was to move the infrequently accessed data to lower-cost storage like offsite tape/disk to keep the data for a long period and for compliance purposes.\n\nOver the last decades, with the rise of analytic and machine learning (ML), the need to access the archived data has become a necessity for enterprise data mining and decision support. These needs are much more easily met by [Amazon Simple Storage Service](https://aws.amazon.com/pm/serv-s3/?trk=ps_a134p000004f2aOAAQ&trkCampaign=acq_paid_search_brand&sc_channel=PS&sc_campaign=acquisition_US&sc_publisher=Google&sc_category=Storage&sc_country=US&sc_geo=NAMER&sc_outcome=acq&sc_detail=amazon%20s3&sc_content=S3_e&sc_matchtype=e&sc_segment=488982706722&sc_medium=ACQ-P%7CPS-GO%7CBrand%7CDesktop%7CSU%7CStorage%7CS3%7CUS%7CEN%7CText&s_kwcid=AL!4422!3!488982706722!e!!g!!amazon%20s3&ef_id=EAIaIQobChMI9LuWzLTi9QIVWMmUCR2fBAGlEAAYASAAEgLtQvD_BwE:G:s&s_kwcid=AL!4422!3!488982706722!e!!g!!amazon%20s3) ([Amazon S3](https://aws.amazon.com/cn/s3/?trk=cndc-detail)) as the optimized archive storage solution when compared to the offsite tape and disk. [Amazon S3](https://aws.amazon.com/pm/serv-s3/?trk=ps_a134p000004f2aOAAQ&trkCampaign=acq_paid_search_brand&sc_channel=PS&sc_campaign=acquisition_US&sc_publisher=Google&sc_category=Storage&sc_country=US&sc_geo=NAMER&sc_outcome=acq&sc_detail=amazon%20s3&sc_content=S3_e&sc_matchtype=e&sc_segment=488982706722&sc_medium=ACQ-P%7CPS-GO%7CBrand%7CDesktop%7CSU%7CStorage%7CS3%7CUS%7CEN%7CText&s_kwcid=AL!4422!3!488982706722!e!!g!!amazon%20s3&ef_id=EAIaIQobChMI9LuWzLTi9QIVWMmUCR2fBAGlEAAYASAAEgLtQvD_BwE:G:s&s_kwcid=AL!4422!3!488982706722!e!!g!!amazon%20s3) is scalable and secure, offering high performance, high availability, and durability that can help you archive and run analytics at a lower cost.\n\nIn some cases, data in a relational database is relevant for a period. A common way to handle such data is to partition and archive the tables based on time. The PostgreSQL extension pg_partman is a popular data partitioning automation extension available in [Amazon Relational Database Service (Amazon RDS) for PostgreSQL](https://aws.amazon.com/rds/postgresql/) or [Amazon Aurora PostgreSQL-Compatible Edition](https://aws.amazon.com/rds/aurora/postgresql-features/).\n\nIn this post, we show how you can efficiently use PostgreSQL’s native range partition to partition current (hot) data with pg_partman and archive historical (cold) data in [Amazon S3](https://aws.amazon.com/cn/s3/?trk=cndc-detail). 

#### **Solution overview**

The solution combines PostgreSQL's native range partitioning feature with pg_partman and the Amazon S3 export/import functions available in Amazon RDS.

PostgreSQL lets you divide a table into partitions based on date/time ranges of key columns. This offers significant performance and management benefits for archiving and purging historical data: instead of bulk inserts and deletes, you simply copy a partition out for archive, then drop the partition when you no longer need it.

pg_partman is a PostgreSQL extension that builds on PostgreSQL's native partitioning to create and manage time-based and serial-based partition sets. It automates child partition creation and works with your retention policy to detach or drop obsolete partitions for you.

When a partition becomes obsolete, you call the **aws_s3.query_export_to_s3** function to upload its data to a designated Amazon S3 folder. Once the upload is confirmed, the partition is dropped from the database. The archived data stays in the S3 bucket for its lifetime, moving through different Amazon S3 storage tiers according to the [S3 lifecycle management policy](https://docs.aws.amazon.com/AmazonS3/latest/userguide/lifecycle-transition-general-considerations.html). You can [query](https://docs.aws.amazon.com/AmazonS3/latest/userguide/s3-glacier-select-sql-reference.html) data in Amazon S3 directly, so restoring the archived data to the database becomes optional.

Figure 1 shows data movement between the database and the Amazon S3 archive:

![image.png](https://dev-media.amazoncloud.cn/b5c10f2abf3547abb143a8d673b6ae2c_image.png)

Figure 1 – Data movement between the database and Amazon S3

Figure 2 shows the destination S3 bucket's structure for a monthly partitioned table:

![image.png](https://dev-media.amazoncloud.cn/926087196e184c1ca6fe521598e6d213_image.png)

Figure 2 – Diagram representing the bucket's folder organization

The solution involves the following steps:

- Set up S3 bucket policies
- Set up your database
- Partition the candidate table
- Upload the obsolete table partition to Amazon S3 and drop the archived partition from the database
- (Optional) Restore the archived data from Amazon S3 to the database

#### **Prerequisites**

To implement the solution, you must have the following components ready:

- A data retention policy that defines what data should be archived, where, and for how long it should be kept.
- An [Amazon Web Services Key Management Service (KMS)](https://docs.aws.amazon.com/kms/latest/developerguide/concepts.html#kms_keys) customer managed key.
- A private S3 bucket as the archive destination:
	- Encrypted with your KMS customer managed key
	- With a [lifecycle policy](https://docs.aws.amazon.com/AmazonS3/latest/userguide/object-lifecycle-mgmt.html) configured
- An RDS for PostgreSQL instance (version 12.5 or later) or an Aurora PostgreSQL cluster (version 12.6 or later) with the pg_partman extension [installed](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/PostgreSQL_Partitions.html).
- A privileged IAM user to manage IAM policies, run [Amazon Web Services Command Line Interface (CLI)](https://aws.amazon.com/cli/) commands, and access the [Amazon Web Services Management Console](https://aws.amazon.com/console/).
- A scheduler such as pg_cron to run partition maintenance jobs.
- A database user who runs the archive procedures. If you prefer, you may create a separate database user to perform archive operations on behalf of all the schema/table owners. In that case, grant that user the connect, usage, and select privileges on the target database, schemas, tables, and their partitions, as shown below:

```
GRANT CONNECT ON DATABASE <dbname> TO <archive_user>;
GRANT USAGE ON SCHEMA <schemaname> TO <archive_user>;
GRANT SELECT ON <tablename> TO <archive_user>;
GRANT SELECT ON <tablepartitionname> TO <archive_user>;
```
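
Granting SELECT partition by partition can become tedious as pg_partman keeps creating new child tables. A minimal alternative sketch is to grant at the schema level instead; the names below are illustrative, and note that ALTER DEFAULT PRIVILEGES only covers tables created later by the role that runs the statement, so adjust it to match how your partitions are actually created.

```
-- Illustrative sketch; archive_user and dms_sample are example names.
-- Cover the partitions that already exist:
GRANT SELECT ON ALL TABLES IN SCHEMA dms_sample TO archive_user;

-- Cover partitions created in the future by the role running this statement:
ALTER DEFAULT PRIVILEGES IN SCHEMA dms_sample
    GRANT SELECT ON TABLES TO archive_user;
```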

#### **Set up S3 bucket policies**

First, let's set up access to the S3 bucket. Use the Amazon Web Services CLI to run the following commands:

1. Create a policy with the necessary privileges to access the S3 bucket and its KMS key. Take note of the policy ARN.

```
aws iam create-policy --policy-name pg-s3-policy --policy-document file://s3-exp-policy
```

In the preceding code, s3-exp-policy is a file in the current directory with the following content:

```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:AbortMultipartUpload",
        "s3:DeleteObject",
        "s3:ListMultipartUploadParts",
        "s3:PutObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::<bucketname>/<databasename>*",
        "arn:aws:s3:::<bucketname>"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "kms:Decrypt",
        "kms:DescribeKey",
        "kms:GenerateDataKey",
        "kms:Encrypt",
        "kms:ReEncrypt*"
      ],
      "Resource": "<yourKMSkeyarn>"
    }
  ]
}
```

2. Create two [Amazon Web Services Identity and Access Management (IAM)](https://aws.amazon.com/iam/) roles: one for export and one for import.

```
aws iam create-role --role-name pg-s3-export-role --assume-role-policy-document file://s3-exp-assumerole-policy

aws iam create-role --role-name pg-s3-import-role --assume-role-policy-document file://s3-exp-assumerole-policy
```

In the preceding code, s3-exp-assumerole-policy is saved in the current directory with the following content:

```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "rds.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
```

3. Attach the policy created in Step 1 to the roles created in Step 2.

```
aws iam attach-role-policy --policy-arn <yourpolicyarn> --role-name pg-s3-export-role
aws iam attach-role-policy --policy-arn <yourpolicyarn> --role-name pg-s3-import-role
```

4. Add the roles to [your Amazon Aurora](https://aws.amazon.com/rds/aurora/) cluster or RDS instance through the Amazon Web Services Management Console.

Select the Aurora cluster or RDS instance, open the Connectivity & security tab, fill in the information, and choose Add role. Enter pg-s3-export-role and pg-s3-import-role one at a time, as shown in Figure 3 and Figure 4.

![image.png](https://dev-media.amazoncloud.cn/65705d1664d84b84b2f83a867555adcb_image.png)

Figure 3 – Add S3 export role to Aurora cluster

![image.png](https://dev-media.amazoncloud.cn/0fb576892b2449bfacb078db460392e8_image.png)

Figure 4 – Add S3 import role to Aurora cluster
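
If you prefer to stay in the Amazon Web Services CLI rather than the console, you can associate the roles with equivalent commands; a sketch follows, with placeholder identifiers and role ARNs that you would replace with your own. The s3Export and s3Import feature names tie each role to the export and import functionality.

```
# Aurora PostgreSQL cluster (replace <yourclusterid> and the role ARNs with your own values)
aws rds add-role-to-db-cluster --db-cluster-identifier <yourclusterid> \
    --role-arn <pg-s3-export-role-arn> --feature-name s3Export
aws rds add-role-to-db-cluster --db-cluster-identifier <yourclusterid> \
    --role-arn <pg-s3-import-role-arn> --feature-name s3Import

# For an RDS for PostgreSQL instance, use add-role-to-db-instance instead
aws rds add-role-to-db-instance --db-instance-identifier <yourinstanceid> \
    --role-arn <pg-s3-export-role-arn> --feature-name s3Export
```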

#### **Set up your database**

To set up your database, [install](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/PostgreSQL_Partitions.html) the required PostgreSQL schema and extensions. From a PostgreSQL client such as psql, complete the following steps as rds_superuser:

1. Create the partman schema and the pg_partman extension. The partman schema holds the tables and functions that help you configure and manage the table partitions.

```
CREATE SCHEMA partman;

CREATE EXTENSION pg_partman WITH SCHEMA partman;
```

2. Create the aws_s3 extension. Its functions move data between the database and Amazon S3. Grant permissions to the database user who performs the archive.

```
CREATE EXTENSION aws_s3 CASCADE;
GRANT USAGE ON SCHEMA aws_s3 TO <archive_user>;
GRANT EXECUTE ON ALL FUNCTIONS IN SCHEMA aws_s3 TO <archive_user>;
```
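
As a quick sanity check before continuing, you can confirm the extensions are installed by querying the pg_extension catalog (aws_commons is created automatically by the CASCADE clause above):

```
SELECT extname, extversion
FROM pg_extension
WHERE extname IN ('pg_partman', 'aws_s3', 'aws_commons');
```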

#### **Partition the candidate table**

Examine the candidate table to determine the partition key based on your purge criteria, and learn how the data is distributed so you can choose a proper partition interval and starting point.

As an example, we created a PostgreSQL table called ticket_purchase_hist from the aws-database-migration-samples project available at [this link](https://github.com/aws-samples/aws-database-migration-samples/tree/master/PostgreSQL/sampledb/v1). The README.md file has the details on how to create the dms_sample schema and how to populate it with data. For performance reasons, we replaced the double precision data type with int.

The following is the DDL for ticket_purchase_hist:

```
CREATE TABLE dms_sample.ticket_purchase_hist (
	sporting_event_ticket_id int NOT NULL,
	purchased_by_id int NOT NULL,
	transaction_date_time timestamp(0) NOT NULL,
	transferred_from_id int NULL,
	purchase_price numeric(8, 2) NOT NULL,
	CONSTRAINT ticket_purchase_hist_pk PRIMARY KEY (sporting_event_ticket_id, purchased_by_id, transaction_date_time)
);
```

By design, with each "sell" or "transfer" of a ticket, a record is inserted into this table with the current timestamp. There are a total of 5,742,278 records in the table. Figure 5 shows the data distribution by month:

![image.png](https://dev-media.amazoncloud.cn/969073c2e250420d8ddd28d8c2032cd1_image.png)

Figure 5 – Table data distribution by month

Assume the retention policy is 12 months. The optimal partition strategy is to partition this table on the transaction_date_time column with a monthly interval.

**Note**: The procedures in this section require application downtime.

Using the psql tool, follow these steps to partition the table:

1. Create a new monthly range-partitioned table from the original table. Choose transaction_date_time as the partition key.

```
CREATE TABLE dms_sample.ticket_purchase_hist_temp (
sporting_event_ticket_id int NOT NULL,
purchased_by_id int NOT NULL,
transaction_date_time timestamp(0) NOT NULL,
transferred_from_id int NULL,
purchase_price numeric(8, 2) NOT NULL)
PARTITION BY RANGE(transaction_date_time);
```

2. Rename the tables so that the new table becomes the working table:

```
ALTER TABLE dms_sample.ticket_purchase_hist RENAME TO ticket_purchase_hist_old;
ALTER TABLE dms_sample.ticket_purchase_hist_temp RENAME TO ticket_purchase_hist;
```

3. As rds_superuser, register the parent table with pg_partman and create the partitions. Because the earliest transaction date in this table is January 10, 2021, we set p_start_partition => '2021-01-01'. Your earliest transaction date may be different, so query your table and set p_start_partition accordingly.

```
SELECT partman.create_parent( p_parent_table => 'dms_sample.ticket_purchase_hist', p_control => 'transaction_date_time', p_type => 'native', p_interval => 'monthly', p_premake => 7, p_start_partition => '2021-01-01');
```

4. Drop the default partition to avoid possible partition creation issues later. The default partition stores records that can't be mapped to any other partition. If a record lands in the default partition, creating a partition that covers that record's value range will fail.

```
DROP TABLE dms_sample.ticket_purchase_hist_default;
```

5. Migrate the data into the partitioned table.

```
INSERT INTO dms_sample.ticket_purchase_hist SELECT * FROM dms_sample.ticket_purchase_hist_old;
```

6. Verify that the data was successfully inserted into each partition. For example:

```
SELECT count(*) FROM dms_sample.ticket_purchase_hist_p2021_01;
Output: 64946
```

7. Add keys and indexes to the partitioned table if they are part of the original table.

```
ALTER TABLE dms_sample.ticket_purchase_hist ADD CONSTRAINT ticket_purchase_hist_pk1 PRIMARY KEY (sporting_event_ticket_id, purchased_by_id, transaction_date_time);
```

8. As rds_superuser, set the retention policy with pg_partman. This example uses 12 months; your environment may have different requirements.

```
UPDATE partman.part_config SET infinite_time_partitions = true, retention = '12 months', retention_keep_table=true WHERE parent_table = 'dms_sample.ticket_purchase_hist';
```

9. As rds_superuser, from the psql command line, run the following procedure periodically (monthly in this example).

```
CALL partman.run_maintenance_proc();
```

The procedure creates future partitions ahead of time and detaches obsolete partitions according to the retention policy we set in Step 8. We recommend that you run it regularly from a job scheduler. Refer to the RDS documentation to learn more.
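
With the pg_cron extension mentioned in the prerequisites, scheduling this maintenance call is a one-line job definition. The sketch below assumes pg_cron is already enabled on your instance (added to shared_preload_libraries and running in the database named by cron.database_name); the job name and schedule are illustrative.

```
-- Run pg_partman maintenance at 03:00 UTC on the first day of every month (illustrative schedule)
SELECT cron.schedule('partman-maintenance', '0 3 1 * *', $$CALL partman.run_maintenance_proc()$$);
```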

#### **Upload the obsolete table partition to Amazon S3 and drop the archived partition from the database**

pg_partman maintains each partition as a child table of its parent. As time goes by, a partition becomes obsolete when evaluated against the retention policy, and partman.run_maintenance_proc() automatically detects and detaches such partitions from the parent table. Your task is to identify the detached partitions and ship them to the designated S3 bucket for archiving.

1. From the psql command line, run the following query to identify the partitions that are detached from the parent:

```
SELECT relname, n.nspname
FROM pg_class
JOIN pg_namespace n ON n.oid = relnamespace
WHERE relkind = 'r' AND relispartition = 'f'
AND relname LIKE 'ticket_purchase_hist_p%' AND n.nspname = 'dms_sample';

output:
relname 	|nspname |
ticket_purchase_hist_p2021_01|dms_sample|
```

2. Export the detached table data to the designated folder in the S3 bucket.
In this step, you run a psql command that archives the data to the S3 bucket using two PostgreSQL functions:
- aws_commons.create_s3_uri defines the target S3 bucket information. In the example, you enter the bucket name 'dbarchive-test', the path 'testdb/dms_sample/ticket_purchase_hist/202101/data', and the Amazon Web Services Region 'us-east-1'.
- aws_s3.query_export_to_s3 extracts and exports data based on a query. In the following example, the query exports the data for the entire partition:

```
SELECT *
FROM aws_s3.query_export_to_s3(
'SELECT * FROM dms_sample.ticket_purchase_hist_p2021_01',
aws_commons.create_s3_uri(
'dbarchive-test',
'testdb/dms_sample/ticket_purchase_hist/202101/data','us-east-1'));
```
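
By default the export uses PostgreSQL's text COPY format, which is what the restore example later in this post expects. If you would rather archive in CSV, the function also accepts an options string of COPY arguments; the following is a hedged sketch of that variant, and if you use it, pass matching options such as '(format csv)' to aws_s3.table_import_from_s3 when restoring.

```
-- Sketch: same export as above, written as CSV instead of the default text format
SELECT *
FROM aws_s3.query_export_to_s3(
'SELECT * FROM dms_sample.ticket_purchase_hist_p2021_01',
aws_commons.create_s3_uri(
'dbarchive-test',
'testdb/dms_sample/ticket_purchase_hist/202101/data','us-east-1'),
options := 'format csv');
```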

3. Validate by comparing the record count in Amazon S3 with the record count in the database table.
From the Amazon Web Services CLI, run the following Amazon S3 Select query to count the records of the archived table in the S3 bucket. In the example, the count is written to an output file called recordcount.csv.

```
aws s3api select-object-content \
    --bucket dbarchive-test \
    --key testdb/dms_sample/ticket_purchase_hist/202101/data \
    --expression "select count(*) from s3object" \
    --expression-type 'SQL' \
    --input-serialization '{"CSV": {}, "CompressionType": "NONE"}' \
    --output-serialization '{"CSV": {}}' "recordcount.csv"
```

You should see 64946 in the recordcount.csv file, which matches the table record count.

4. Upload the archived table DDL to Amazon S3.
Because the table structure may change over time, it's a good idea to add the archived table's DDL to the Amazon S3 folder as well. In the following example, you use the PostgreSQL **pg_dump** command to save the create table statement to a file named ticket_purchase_hist_p2021_01.sql, then upload it to the S3 bucket. From a machine with pg_dump installed, run the command to extract and save the create table DDL:

```
pg_dump -h <dbhost> -d <dbname> -U <dbuser> -s -t dms_sample.ticket_purchase_hist_p2021_01 > ticket_purchase_hist_p2021_01.sql
```

With the Amazon Web Services CLI, upload ticket_purchase_hist_p2021_01.sql to the DDL folder of the S3 bucket:

```
aws s3 cp ticket_purchase_hist_p2021_01.sql s3://dbarchive-test/testdb/dms_sample/ticket_purchase_hist/202101/ddl/ticket_purchase_hist_p2021_01.sql
```

5. Drop the archived table from the database using psql. Note that you need to be the table owner to drop the table.

```
DROP TABLE dms_sample.ticket_purchase_hist_p2021_01 CASCADE;
```

#### **(Optional) Restore archived data from Amazon S3 to the database**

You can access the data in Amazon S3 directly using tools like [Amazon S3 Select](https://docs.aws.amazon.com/AmazonS3/latest/userguide/s3-glacier-select-sql-reference-select.html) ([Glacier Select](https://docs.aws.amazon.com/amazonglacier/latest/dev/glacier-select.html)) or [Amazon Athena](https://aws.amazon.com/athena/?whats-new-cards.sort-by=item.additionalFields.postDateTime&whats-new-cards.sort-order=desc), so only special requirements call for restoring the archived data to a database. You can achieve this in two steps:

1. Restore the table DDL from Amazon S3.
From the Amazon Web Services CLI, download the DDL from the S3 bucket. For example:

```
aws s3 cp s3://dbarchive-test/testdb/dms_sample/ticket_purchase_hist/202101/ddl/ticket_purchase_hist_p2021_01.sql ticket_purchase_hist_p2021_01.sql
```

From psql, create the table using the downloaded file. For example:

```
psql "dbname=<dbname> host=<dbhost> user=<dbuser> port=5432" -f ticket_purchase_hist_p2021_01.sql
```

2. Restore the archived data by calling the aws_s3.table_import_from_s3 and aws_commons.create_s3_uri functions.
- aws_commons.create_s3_uri defines the source S3 bucket information. In the example, you enter the bucket name 'dbarchive-test', the path 'testdb/dms_sample/ticket_purchase_hist/202101/data', and the Amazon Web Services Region 'us-east-1'.
- aws_s3.table_import_from_s3 imports data from the S3 bucket into the target table. In the example, you enter 'dms_sample.ticket_purchase_hist_p2021_01' as the target table name and 'text' as the format.

Example code for psql:

```
SELECT *
FROM aws_s3.table_import_from_s3(
'dms_sample.ticket_purchase_hist_p2021_01',
'', 
'(format text)',
aws_commons.create_s3_uri(
'dbarchive-test',
'testdb/dms_sample/ticket_purchase_hist/202101/data','us-east-1'))
```

Output:

```
64946 rows imported into relation "dms_sample.ticket_purchase_hist_p2021_01" from file testdb/dms_sample/ticket_purchase_hist/202101/data of 3003108 bytes
```
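
At this point the restored table is a standalone table, which is often sufficient for ad hoc analysis. If you also need the restored rows to be visible through the parent table again, one option (not part of the procedure above, shown only as a sketch) is to attach it back as a partition; the boundary values below are the ones for the January 2021 monthly partition. Keep in mind that the 12-month retention policy would detach it again at the next pg_partman maintenance run.

```
-- Sketch: re-attach the restored table as the January 2021 partition of the parent
ALTER TABLE dms_sample.ticket_purchase_hist
    ATTACH PARTITION dms_sample.ticket_purchase_hist_p2021_01
    FOR VALUES FROM ('2021-01-01') TO ('2021-02-01');
```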

#### **Summary**

In this post, we showed you an efficient PostgreSQL database archive solution that combines table partitioning with pg_partman and the Amazon S3 export/import functions. You can adapt the same solution to multiple tables in your own schema. Although we ran the commands manually, you can automate the process by scripting the steps together and using [Amazon Web Services Lambda](https://aws.amazon.com/lambda/) to run them on a schedule.

Share your experience in the comments section. We highly value your feedback!

#### **About the Authors**

![image.png](https://dev-media.amazoncloud.cn/26f54246846842dca3ff4e4b9e10a17c_image.png)

**Li Liu** is a Senior Database Specialty Architect with the Professional Services team at Amazon Web Services. She helps customers migrate traditional on-premises databases to the Amazon Web Services Cloud. She specializes in database design, architecture, and performance tuning.

![image.png](https://dev-media.amazoncloud.cn/a59dbb1b4ad34680b9767177c79a06b1_image.png)

**Shunan Xiang** is a Senior Database Consultant with the Professional Services team at Amazon Web Services. He works as a database migration specialist, helping customers design and implement scalable, secure, performant, and robust database solutions in the cloud.

![image.png](https://dev-media.amazoncloud.cn/865011b7ed7b4180998e7218f2342281_image.png)

**Samujjwal Roy** is a Principal DB Consultant with the Professional Services team at Amazon Web Services. He has been with Amazon for 17+ years and has led migration projects for internal and external customers moving their on-premises database environments to Amazon Web Services Cloud database solutions.