Stream a large file from URL straight into a gzip file
I want to stream a large file straight into a gzip file, rather than downloading it all into memory and then compressing it. This is how far I've gotten (it doesn't work). I know how to download a file in Python and save it, and I know how to compress a file, but the streaming part doesn't work.
Note: the linked csv is not large; it is just an example URL.
import requests
import zlib

url = "http://samplecsvs.s3.amazonaws.com/Sacramentorealestatetransactions.csv"

with requests.get(url, stream=True) as r:
    compressor = zlib.compressobj()
    with open(save_file_path, 'wb') as f:
        f.write(compressor.compress(r.raw))
OK, I figured it out. The attempt above fails because compressor.compress() expects a bytes-like object, while r.raw is a file-like object; the fix is to read the response in chunks and feed each chunk to the compressor:
import shutil

# verify=False skips TLS certificate checks
with requests.get(url, stream=True, verify=False) as r:
    if save_file_path.endswith('gz'):
        # wbits = MAX_WBITS | 16 tells zlib to write a gzip header and trailer
        compressor = zlib.compressobj(9, zlib.DEFLATED, zlib.MAX_WBITS | 16)
        with open(save_file_path, 'wb') as f:
            for chunk in r.iter_content(chunk_size=1024 * 1024):
                f.write(compressor.compress(chunk))
            # flush() writes out whatever is still buffered in the compressor
            f.write(compressor.flush())
    else:
        with open(save_file_path, 'wb') as f:
            shutil.copyfileobj(r.raw, f)
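For comparison, the same streaming download can be written with the standard-library gzip module, which hides the compressobj bookkeeping behind a file object that compresses on write. A minimal sketch, assuming a plain (not content-encoded) HTTP response; the save_file_path value here is a placeholder for illustration:

import gzip
import shutil
import requests

url = "http://samplecsvs.s3.amazonaws.com/Sacramentorealestatetransactions.csv"
save_file_path = "Sacramentorealestatetransactions.csv.gz"  # placeholder path

with requests.get(url, stream=True) as r:
    r.raise_for_status()
    # gzip.open returns a writable file object that compresses as it goes,
    # so copyfileobj streams the response to disk chunk by chunk without
    # buffering the whole body in memory.
    with gzip.open(save_file_path, 'wb') as gz:
        shutil.copyfileobj(r.raw, gz)

gzip.open defaults to compression level 9, matching the explicit level in the answer above, and either way the response body never has to fit in memory.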