Are you stuck with S3 files that were accidentally moved to the Glacier storage tier? Do you need an efficient yet simple way to restore them? If yes, then this guide is for you.
[Amazon S3](https://aws.amazon.com/cn/s3/?trk=cndc-detail) is a scalable object storage service, while Amazon Glacier is a secure, durable, and low-cost storage service for data archiving and long-term backup. Sometimes files end up in Glacier storage either intentionally, for cost savings, or unintentionally due to errors.
Regardless, the restoration process can be daunting. This guide will help you simplify this process with Python and AWS’s boto3 library.
Below is the Python script that will get the job done:
```
import os
import re
import time
import traceback

import boto3
from botocore.exceptions import ClientError
from loguru import logger

ACK = "Your Access Key"
ACS = "Your Secret Key"
bucket_name = "Your Bucket Name"
s3_remote_dir = "Path to Your S3 Directory"

s3 = boto3.client('s3', aws_access_key_id=ACK, aws_secret_access_key=ACS)

# Local path used to temporarily hold downloaded files
local_save_path = './temp/'
os.makedirs(local_save_path, exist_ok=True)


def _get_all_s3_objects(**base_kwargs):
    """Yield all objects under s3_remote_dir, following pagination."""
    try:
        continuation_token = None
        while True:
            list_kwargs = dict(MaxKeys=1000, **base_kwargs)
            if continuation_token:
                list_kwargs['ContinuationToken'] = continuation_token
            response = s3.list_objects_v2(**list_kwargs)
            yield from response.get('Contents', [])
            if not response.get('IsTruncated'):  # reached the end of the listing
                break
            continuation_token = response.get('NextContinuationToken')
    except Exception:
        logger.error(traceback.format_exc())


def head_object(bucket_name, object_name):
    """Fetch object metadata (including the Restore header) without downloading it."""
    try:
        return s3.head_object(Bucket=bucket_name, Key=object_name)
    except ClientError as e:
        logger.error(e)
        logger.error(
            f"NoSuchBucket, NoSuchKey, or InvalidObjectState: the object's storage "
            f"class may not be GLACIER. {bucket_name} {object_name}")
        return None


# Restore objects from the Glacier storage tier
def restore_object(bucket_name, object_name, days, retrieval_type='Expedited'):
    """Start a restore job that keeps a temporary copy available for `days` days."""
    request = {'Days': days,
               'GlacierJobParameters': {'Tier': retrieval_type}}
    try:
        s3.restore_object(Bucket=bucket_name, Key=object_name, RestoreRequest=request)
    except ClientError as e:
        logger.error(e)
        logger.error(
            f"NoSuchBucket, NoSuchKey, or InvalidObjectState: the object's storage "
            f"class may not be GLACIER. {bucket_name} {object_name}")
        return False
    return True


while True:
    doing_count, done_count, need_count = 0, 0, 0
    s3_objects = _get_all_s3_objects(Bucket=bucket_name, Prefix=s3_remote_dir)
    for obj in s3_objects:
        key = obj.get('Key', None)
        file_name = key.split("/")[-1]
        # In Hive, files in the Glacier tier may sit in sub-directories named
        # 'HIVE_UNION_SUBDIR_*'; rewrite the key so the file goes back to the right place.
        to_key = re.sub(r"/HIVE_UNION_SUBDIR_\d+/", "/", key)
        print(key, to_key, file_name)
        head = head_object(bucket_name, key)
        need_count += 1
        if head:
            if head.get('Restore'):
                print('Restore {}'.format(head['Restore']))
                index = head['Restore'].find('ongoing-request="false"')
                if -1 == index:
                    print(f"{need_count} still being restored... {key}")
                    doing_count += 1
                else:
                    print(head['Restore'][head['Restore'].find('expiry-date='):])
                    print(f"{need_count} restore succeeded... {key}")
                    if -1 != file_name.find('HIVE_UNION_SUBDIR'):
                        print(f"no need to download... {key}")
                    else:
                        # Re-uploading the restored copy writes it back as a regular object.
                        s3.download_file(bucket_name, key, local_save_path + file_name)
                        s3.upload_file(local_save_path + file_name, bucket_name, to_key)
                    done_count += 1
            else:
                print(f'{need_count} needs to be restored... {key}')
                restore_object(bucket_name, key, 10)
    print(doing_count, done_count, need_count)
    if done_count == need_count:
        break
    time.sleep(15)
```
# Understanding the Code
The solution uses the `boto3` library to interact with the AWS S3 and Glacier services.
The `_get_all_s3_objects` function lists every object under the specified bucket and prefix, following `list_objects_v2` pagination via continuation tokens.
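If you prefer, boto3 also ships a built-in paginator that handles the continuation-token bookkeeping for you. Here is a minimal sketch, assuming the same `s3` client, `bucket_name`, and `s3_remote_dir` placeholders defined in the main script:
```
# Alternative sketch: boto3's paginator manages ContinuationToken internally.
def list_objects_with_paginator(bucket, prefix):
    paginator = s3.get_paginator('list_objects_v2')
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        yield from page.get('Contents', [])

for obj in list_objects_with_paginator(bucket_name, s3_remote_dir):
    print(obj['Key'], obj.get('StorageClass'))
```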
The `head_object` function retrieves an object's metadata without downloading the object itself. This is useful for checking that a file exists and reading its `Restore` status.
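For reference, the `Restore` field reports `ongoing-request="true"` while a restore job is running and switches to `ongoing-request="false"` (plus an expiry date) once the temporary copy is ready. The sample strings below are illustrative, not output from a real bucket:
```
# Illustrative Restore header values (not real output from head_object).
in_progress = 'ongoing-request="true"'
finished = 'ongoing-request="false", expiry-date="Fri, 21 Dec 2029 00:00:00 GMT"'

def restore_finished(restore_header):
    # Mirrors the script's check: the restore is done once ongoing-request="false".
    return restore_header is not None and 'ongoing-request="false"' in restore_header

print(restore_finished(in_progress))  # False
print(restore_finished(finished))     # True
```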
The `restore_object` function is the heart of the script. It initiates a job that restores a file from Glacier back into S3. The `Days` parameter specifies how long the temporary restored copy remains available in the S3 bucket, while `GlacierJobParameters` sets the retrieval tier (speed) of the restoration.
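The script defaults to the `Expedited` tier; S3 also supports `Standard` and `Bulk` retrievals, which are slower but cheaper (and `Expedited` is not available for objects in Glacier Deep Archive). As a quick illustration, using a made-up key, you could request a Standard-tier restore like this:
```
# Example: request a Standard-tier restore that keeps the temporary copy for 7 days.
# "path/to/archived/file.parquet" is a placeholder key, not one from the script.
restore_object(bucket_name, "path/to/archived/file.parquet", days=7, retrieval_type='Standard')
```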
The script then loops over the objects under the specified prefix. For each object, it uses `head_object` to check whether a restore is already in progress or has completed. If a restore is in progress, it leaves the object alone. If the restore has completed, it downloads the file and re-uploads it to its final key, which also writes it back as a regular (non-Glacier) object. If no restore has been requested yet, it starts one.
The outer loop repeats, sleeping briefly between passes, until every file has been restored.
# In Conclusion
The script provides a way to automate the process of restoring files from the Glacier storage tier back to S3. It can be a lifesaver if you’ve got hundreds or even thousands of files to restore.
Remember to replace the placeholders in the script with your actual AWS credentials, bucket names, and file paths. Happy restoring!
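As a side note, you don't have to hardcode keys at all: boto3 can resolve credentials from its default chain (environment variables, `~/.aws/credentials`, or an IAM role). A minimal sketch:
```
import boto3

# If AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY are set in the environment, or a
# profile / IAM role is configured, boto3 picks up credentials automatically.
s3 = boto3.client('s3')
```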