Recovering Accidentally Transitioned S3 Files from the Glacier Tier

Are you stuck with S3 files that were accidentally moved to the Glacier storage tier and need a simple, efficient way to restore them? If so, this guide is for you.

[Amazon S3](https://aws.amazon.com/cn/s3/?trk=cndc-detail) is a scalable object storage service, while Amazon Glacier is a secure, durable, and low-cost storage service for data archiving and long-term backup. Files can end up in Glacier storage intentionally, to save costs, or unintentionally, through a misconfiguration or mistake. Either way, restoring them one by one is tedious. This guide automates the process with Python and AWS's boto3 library.

Below is the Python script that gets the job done:

```
import os
import re
import time
import traceback

import boto3
from botocore.exceptions import ClientError
from loguru import logger

ACK = "Your Access Key"
ACS = "Your Secret Key"
bucket_name = "Your Bucket Name"
s3_remote_dir = "Path to Your S3 Directory"

s3 = boto3.client('s3', aws_access_key_id=ACK, aws_secret_access_key=ACS)

# Restored files are downloaded here temporarily before being re-uploaded
local_save_path = './temp/'
os.makedirs(local_save_path, exist_ok=True)


def _get_all_s3_objects(**base_kwargs):
    """Yield every object under s3_remote_dir, following pagination."""
    try:
        continuation_token = None
        while True:
            list_kwargs = dict(MaxKeys=1000, **base_kwargs)
            if continuation_token:
                list_kwargs['ContinuationToken'] = continuation_token
            response = s3.list_objects_v2(**list_kwargs)
            yield from response.get('Contents', [])
            if not response.get('IsTruncated'):  # At the end of the list?
                break
            continuation_token = response.get('NextContinuationToken')
    except Exception:
        logger.error(traceback.format_exc())


def head_object(bucket_name, object_name):
    """Return the object's metadata, or None if the request fails."""
    try:
        # Reuse the module-level client so the configured credentials apply
        return s3.head_object(Bucket=bucket_name, Key=object_name)
    except ClientError as e:
        logger.error(e)
        logger.error(f"head_object failed: {bucket_name} {object_name}")
        return None


# Restore an object from the Glacier tier
def restore_object(bucket_name, object_name, days, retrieval_type='Expedited'):
    request = {'Days': days, 'GlacierJobParameters': {'Tier': retrieval_type}}
    try:
        s3.restore_object(Bucket=bucket_name, Key=object_name, RestoreRequest=request)
    except ClientError as e:
        logger.error(e)
        logger.error(
            "NoSuchBucket, NoSuchKey, or InvalidObjectState "
            f"(the object's storage class is not GLACIER): {bucket_name} {object_name}")
        return False
    return True


while True:
    doing_count, done_count, need_count = 0, 0, 0
    for obj in _get_all_s3_objects(Bucket=bucket_name, Prefix=s3_remote_dir):
        key = obj['Key']
        file_name = key.split("/")[-1]
        # In Hive, files under the Glacier tier end up in sub-directories named
        # 'HIVE_UNION_SUBDIR_*'. Strip that segment to get the right location.
        to_key = re.sub(r"/HIVE_UNION_SUBDIR_\d+/", "/", key)
        print(key, to_key, file_name)
        metadata = head_object(bucket_name, key)
        need_count += 1
        if not metadata:
            continue
        # head_object omits StorageClass for STANDARD objects. Anything that is
        # not archived (for example, a file re-uploaded on an earlier pass)
        # counts as done; otherwise the loop could never finish.
        if metadata.get('StorageClass', 'STANDARD') not in ('GLACIER', 'DEEP_ARCHIVE'):
            done_count += 1
            continue
        restore_status = metadata.get('Restore')
        if not restore_status:
            print(f"{need_count} Needs to be restored... {key}")
            restore_object(bucket_name, key, 10)
        elif 'ongoing-request="false"' not in restore_status:
            print(f"{need_count} Restore in progress... {key}")
            doing_count += 1
        else:
            print(restore_status[restore_status.find('expiry-date='):])
            print(f"{need_count} Restore succeeded... {key}")
            if 'HIVE_UNION_SUBDIR' in file_name:
                # The file name itself contains the marker: nothing to copy
                print(f"No need to download... {key}")
            else:
                s3.download_file(bucket_name, key, local_save_path + file_name)
                s3.upload_file(local_save_path + file_name, bucket_name, to_key)
            done_count += 1
    print(doing_count, done_count, need_count)
    if done_count == need_count:
        break
    time.sleep(15)
```

# Understanding the Code

The solution uses the `boto3` library to interact with Amazon S3, including objects stored in the Glacier tier.

The `_get_all_s3_objects` function lists all the objects under the specified bucket and prefix, following continuation tokens so that listings longer than 1,000 keys are not cut off.

The `head_object` function retrieves an object's metadata without returning the object itself. This is useful for checking whether a file exists and what its restore status is.

The `restore_object` function is the heart of the script. It initiates a job to restore an object from Glacier. The `Days` parameter specifies the lifetime of the temporary copy of the object in the S3 bucket, while `GlacierJobParameters` sets the retrieval tier (`Expedited`, `Standard`, or `Bulk`), which determines how quickly the restore completes.

The script then loops through the objects in the specified S3 location. For each object, it calls `head_object` and inspects the `Restore` field:

- If the object is no longer in an archive storage class, it counts as done.
- If no restore has been requested yet, the script starts one.
- If a restore is in progress, the script leaves it alone.
- If the restore is complete, the script downloads the temporary copy and re-uploads it to the desired location, with any `HIVE_UNION_SUBDIR_*` path segment removed.

The loop repeats every 15 seconds until every object is counted as done.

# In Conclusion

The script automates the process of restoring files from the Glacier storage tier back to S3. It can be a lifesaver if you have hundreds or even thousands of files to restore. Remember to replace the placeholders in the script with your actual AWS credentials, bucket name, and S3 path.

Happy restoring!
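
As a supplement to the status check described under Understanding the Code, the following is a minimal sketch of how the `Restore` field returned by `head_object` could be classified without string searches scattered through the main loop. The helper name `parse_restore_header` and the state labels are hypothetical and not part of the original script; the sketch assumes the field has the form `ongoing-request="false", expiry-date="..."` once a restore has finished.

```python
import re
from typing import Optional


def parse_restore_header(restore: Optional[str]) -> dict:
    """Classify the 'Restore' value from an S3 head_object response.

    Hypothetical helper: returns a dict with a 'state' of 'not_requested',
    'in_progress', or 'restored', plus the expiry date string if present.
    """
    if not restore:
        # No Restore field at all: no restore has been requested
        return {"state": "not_requested", "expiry": None}
    if 'ongoing-request="true"' in restore:
        # A restore job exists but has not finished yet
        return {"state": "in_progress", "expiry": None}
    # Restore finished: pull out when the temporary copy expires
    match = re.search(r'expiry-date="([^"]+)"', restore)
    return {"state": "restored", "expiry": match.group(1) if match else None}


if __name__ == "__main__":
    sample = 'ongoing-request="false", expiry-date="Fri, 21 Dec 2012 00:00:00 GMT"'
    print(parse_restore_header(sample))
```

In the script above, the result of `head_object(bucket_name, key).get('Restore')` could be passed to this helper, and the main loop could then branch on the returned `state`.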