Convert a JSON buffer into .gz

Question

I have the following code, which saves a pandas DataFrame to JSON in memory and then uploads it to AWS S3:

json_buffer = StringIO()
df.to_json(json_buffer, orient='records', date_format='iso')
json_file_name = file_to_load.split(".")[0] + ".json"
s3_conn.put_object(Body=json_buffer.getvalue(), Bucket=s3_bucket, Key=f"{target_path}{json_file_name}")

What I would like to do is compress the JSON file to gzip (.gz) format before putting it into the S3 bucket. Do you have any ideas on how to achieve this?

Thanks!

Answer 1

You can do this with the gzip module:

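import gzip
import json
with gzip.open(json_file_name, 'wt', encoding='UTF-8') as zipfile:  # text mode, written gzip-compressed
    json.dump(data, zipfile)  # data: the JSON-serializable object to save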

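This saves the JSON to a local file in gzip format, so make this call before passing the file to the S3 bucket. Since the output is gzip-compressed, it is worth giving json_file_name a .gz suffix.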
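Since the question keeps everything in memory, here is a minimal sketch of the same idea without touching the disk. It assumes you first get the JSON as a string (df.to_json returns a str when no path or buffer is passed); gzip_json_text and the sample records are hypothetical names used only for illustration, and the upload call is left commented out because it needs boto3 and credentials.

```python
import gzip
import json


def gzip_json_text(json_text: str) -> bytes:
    """Compress a JSON string into gzip-format bytes, entirely in memory."""
    return gzip.compress(json_text.encode("utf-8"))


# Stand-in for df.to_json(orient="records", date_format="iso"),
# which returns a str when no path or buffer is given.
json_text = json.dumps([{"id": 1, "name": "a"}, {"id": 2, "name": "b"}])
gz_bytes = gzip_json_text(json_text)

# Round-trip check: decompressing yields the original JSON text.
assert gzip.decompress(gz_bytes).decode("utf-8") == json_text

# Upload, reusing the names from the question (not executed here):
# s3_conn.put_object(
#     Body=gz_bytes,
#     Bucket=s3_bucket,
#     Key=f"{target_path}{json_file_name}.gz",
# )
```

put_object accepts bytes for Body, so the compressed payload can be passed directly in place of json_buffer.getvalue().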

Answer 2

The answer above is simple, but if you are running a Jupyter notebook on Linux, you can also just do this:

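import json

with open("temp.json", 'w') as f:
    f.write(json.dumps(json_data))

# "!" runs a shell command from a Jupyter cell; gzip replaces temp.json with temp.json.gz
! gzip temp.json

# upload is your own helper, defined as: def upload(bucket_name, item_name, file_path)
upload("bucket_name", "name_of_the_json_file.json.gz", "temp.json.gz")

# clean up the local compressed file once it has been uploaded
! rm temp.json.gz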
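If you are not in a notebook, or not on Linux, the same write, compress, upload, delete flow can be sketched in plain Python. This is only an illustration: write_gzipped_json and the sample json_data are hypothetical, and upload stands for the same user-defined helper as above, so that call is left commented out.

```python
import gzip
import json
import os
import tempfile


def write_gzipped_json(data, directory: str, name: str = "temp.json.gz") -> str:
    """Write data as gzip-compressed JSON inside directory and return the file path."""
    path = os.path.join(directory, name)
    with gzip.open(path, "wt", encoding="utf-8") as f:
        json.dump(data, f)
    return path


json_data = {"example": [1, 2, 3]}  # placeholder payload

with tempfile.TemporaryDirectory() as tmp_dir:
    gz_path = write_gzipped_json(json_data, tmp_dir)

    # Read the file back to confirm it is valid gzip-compressed JSON.
    with gzip.open(gz_path, "rt", encoding="utf-8") as f:
        round_tripped = json.load(f)

    # upload("bucket_name", "name_of_the_json_file.json.gz", gz_path)

# Leaving the with-block deletes the temporary directory and the .gz file,
# which takes the place of the "rm" step.
```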

Tags: python, json, gzip, amazon-s3, pandas