NEW: Script to report s3 deep archive restore status

xpk
2025-11-11 19:09:22 +08:00
parent 7020d5dc38
commit e474e375a9
@@ -0,0 +1,136 @@
#!/usr/bin/env python3
"""
S3 Restore Status Checker

This script checks the restore status of S3 objects that have been archived to
the Glacier or Deep Archive storage classes. It reads a CSV file containing
bucket and object key pairs, then queries each object's restore status using
the S3 head_object API.

The script is useful for monitoring the progress of S3 batch restore operations
or checking the status of individual object restorations from Glacier storage.

Requirements:
    - boto3: AWS SDK for Python
    - AWS credentials configured (via ~/.aws/credentials, environment variables, or IAM role)
    - IAM permissions to read object metadata (s3:GetObject, s3:ListBucket)

Usage:
    # Normal mode (debug output with filenames):
    python3 s3-restore-status.py

    # Optimized mode (status only, no filenames):
    python3 -O s3-restore-status.py

Input File Format:
    The script reads /tmp/objectlist.csv by default. Each line should contain:
        <bucket>,<object_key>

    All objects are assumed to be in the same bucket: the last bucket name
    in the file is used for every key.

    Example:
        my-bucket,path/to/file1.log
        my-bucket,path/to/file2.log
        my-bucket,archive/data.json

Output:
    The script prints the restore status for each object:
    - "Not-being-restored" if the object has no active restore operation
    - Restore status string (e.g., 'ongoing-request="true"') if restoration is in progress
    - Restore status with expiry date (e.g., 'ongoing-request="false", expiry-date="..."')
      if restoration is complete

    In normal mode (when Python is run without the -O optimization flag), output
    includes the filename: <filename>: <restore_status>
    In optimized mode (python3 -O), only the restore status is printed.

Example Output (optimized mode):
    Not-being-restored
    ongoing-request="true"
    ongoing-request="false", expiry-date="Tue, 25 Nov 2025 00:00:00 GMT"
"""
import boto3
import sys


def read_objectlist(path: str = "/tmp/objectlist.csv"):
    """Read the object list CSV file and extract the bucket and object keys.

    Parses a CSV file containing bucket and object key pairs. Each line should
    have the format: "<bucket>,<object_key>". The function collects all object
    keys and returns the last bucket name encountered (assuming all objects are
    in the same bucket).

    Args:
        path (str): Path to the CSV file containing bucket and object key pairs.
            Defaults to "/tmp/objectlist.csv".

    Returns:
        tuple: A tuple containing:
            - list: List of object keys (strings)
            - str: The bucket name (last bucket encountered in the file)

    Raises:
        SystemExit: If the file is not found or contains no valid entries,
            the script exits with status code 1.

    Note:
        - Lines starting with '#' are treated as comments and ignored
        - Empty lines are ignored
        - Malformed lines (containing no comma) are silently skipped
        - The function assumes all objects are in the same bucket (returns last bucket)
    """
    keys = []
    bucket = None  # Stays None if the file has no valid entries
    try:
        with open(path, "r", encoding="utf-8") as f:
            for raw_line in f:
                line = raw_line.strip()
                # Skip empty lines and comments
                if not line or line.startswith("#"):
                    continue
                # Split only on the first comma so that keys may contain commas
                parts = line.split(",", 1)
                if len(parts) != 2:
                    continue
                bucket = parts[0].strip()
                keys.append(parts[1].strip())
    except FileNotFoundError:
        print(f"Error: object list file not found at {path}", file=sys.stderr)
        sys.exit(1)
    if bucket is None:
        print(f"Error: no valid entries found in {path}", file=sys.stderr)
        sys.exit(1)
    return keys, bucket
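

# Illustrative sketch, not part of the original commit: read_objectlist()
# assumes a single bucket and returns only the last bucket name it saw. If the
# CSV ever mixes buckets, grouping keys per bucket would avoid querying a key
# against the wrong bucket. The name group_keys_by_bucket is hypothetical and
# nothing else in this script calls it.
def group_keys_by_bucket(lines):
    """Group "<bucket>,<object_key>" lines into a {bucket: [keys]} dict."""
    grouped = {}
    for raw_line in lines:
        line = raw_line.strip()
        # Skip blank lines and '#' comments
        if not line or line.startswith("#"):
            continue
        # Split only on the first comma so that keys may contain commas
        bucket, sep, key = line.partition(",")
        bucket, key = bucket.strip(), key.strip()
        # Skip lines with no comma, an empty bucket, or an empty key
        if not sep or not bucket or not key:
            continue
        grouped.setdefault(bucket, []).append(key)
    return grouped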


def main():
    """Check the restore status of S3 objects.

    Reads the object list from the CSV file, connects to AWS S3 in the
    ap-east-1 region, and queries the restore status of each object. Prints
    the restore status of each object to stdout.

    The restore status is one of:
        - "Not-being-restored": Object is not currently being restored
          (or restore completed and expired)
        - Restore status string: Contains restore progress and expiry information

    In debug mode (Python run without the -O optimization flag), output includes
    the filename along with the status for easier identification.

    Region: ap-east-1 (Asia Pacific - Hong Kong)
    """
    session = boto3.Session(region_name="ap-east-1")
    s3_client = session.client("s3")
    keys, bucket = read_objectlist(path="/tmp/objectlist.csv")
    for key in keys:
        response = s3_client.head_object(Bucket=bucket, Key=key)
        # The response has no 'Restore' field when no restore is active
        restore_status = response.get("Restore") or "Not-being-restored"
        if __debug__:
            # Split the object path on "/" and keep only the last element (filename)
            filename = key.split("/")[-1]
            print(f"{filename}: {restore_status}")
        else:
            print(restore_status)
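

# Illustrative sketch, not part of the original commit: main() prints the raw
# Restore value, e.g. 'ongoing-request="false", expiry-date="..."'. A caller
# that wants structured data could split that string as below. The helper name
# parse_restore_header is hypothetical, and the parsing assumes the
# key="value" layout shown in the module docstring.
import re
from email.utils import parsedate_to_datetime


def parse_restore_header(restore):
    """Return (ongoing, expiry) parsed from a Restore header value.

    ongoing is True or False, or None when no header value is given.
    expiry is a datetime, or None when the value has no expiry-date.
    """
    if not restore:
        return None, None
    # Collect every key="value" pair found in the header string
    fields = dict(re.findall(r'([\w-]+)="([^"]*)"', restore))
    ongoing = fields.get("ongoing-request") == "true"
    expiry = fields.get("expiry-date")
    return ongoing, parsedate_to_datetime(expiry) if expiry else None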


if __name__ == "__main__":
    main()