Download a gzipped file, verify its MD5 checksum, and save the extracted data if it matches
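I'm currently trying to download two files with Python: one is a gzipped file, and the other is its checksum.
I want to verify that the contents of the gzipped file match the MD5 checksum, and then save the contents to a target directory.
I found out how to download files here, and I learned how to calculate the checksum here. I load the URLs from a JSON config file, and I learned how to parse JSON file values here.
I put it all together in the following script, but I'm stuck on how to store the verified contents of the gzipped file.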
import json
import gzip
import urllib
import hashlib

# Function for creating an md5 checksum of a file
def md5Gzip(fname):
    hash_md5 = hashlib.md5()
    with gzip.open(fname, 'rb') as f:
        # Make an iterable of the file and divide into 4096 byte chunks
        # The iteration ends when we hit an empty byte string (b"")
        for chunk in iter(lambda: f.read(4096), b""):
            # Update the MD5 hash with the chunk
            hash_md5.update(chunk)
    return hash_md5.hexdigest()

# Open the configuration file in the current directory
with open('./config.json') as configFile:
    data = json.load(configFile)

# Open the downloaded checksum file
with open(urllib.urlretrieve(data['checksumUrl'])[0]) as checksumFile:
    md5Checksum = checksumFile.read()

# Open the downloaded db file and get its md5 checksum via gzip.open
fileMd5 = md5Gzip(urllib.urlretrieve(data['fileUrl'])[0])

if (fileMd5 == md5Checksum):
    print 'Downloaded Correct File'
    # save correct file
else:
    print 'Downloaded Incorrect File'
    # do some error handling
In your md5Gzip function, return a tuple instead of just the hash.
def md5Gzip(fname):
    hash_md5 = hashlib.md5()
    file_content = None
    with gzip.open(fname, 'rb') as f:
        # Make an iterable of the file and divide into 4096 byte chunks
        # The iteration ends when we hit an empty byte string (b"")
        for chunk in iter(lambda: f.read(4096), b""):
            # Update the MD5 hash with the chunk
            hash_md5.update(chunk)
        # get file content
        f.seek(0)
        file_content = f.read()
    return hash_md5.hexdigest(), file_content
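As a side note, rewinding with `f.seek(0)` makes gzip decompress the stream a second time. If that matters, a single-pass variant could collect the chunks while hashing them. The following is only a sketch: the name `md5_gzip_single_pass` and the `chunk_size` parameter are made up for illustration, and it still holds the whole decompressed content in memory.

```python
import gzip
import hashlib


def md5_gzip_single_pass(fname, chunk_size=4096):
    """Return (hex MD5 of the decompressed data, decompressed bytes)."""
    hash_md5 = hashlib.md5()
    chunks = []
    with gzip.open(fname, 'rb') as f:
        # Read decompressed data in fixed-size chunks until EOF (b"")
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hash_md5.update(chunk)  # feed the hash
            chunks.append(chunk)    # keep the data so it can be saved later
    return hash_md5.hexdigest(), b"".join(chunks)
```

It returns the same kind of tuple as the version above, so it could be called the same way.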
Then, in your code:
fileMd5, file_content = md5Gzip(urllib.urlretrieve(data['fileUrl'])[0])
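With `fileMd5` and `file_content` in hand, the "save correct file" step could look roughly like the sketch below. It is written for Python 3 (the script in the question is Python 2), and `save_if_verified`, `checksum_text` and `dest_path` are hypothetical names. It assumes the published checksum covers the decompressed data, as the question's `md5Gzip` does, and that the digest is the first whitespace-separated token of the checksum file. A raw `checksumFile.read()` may carry a trailing newline or a filename, in which case a direct `==` comparison would fail.

```python
def save_if_verified(file_md5, file_content, checksum_text, dest_path):
    """Write file_content to dest_path only if the MD5 digests match.

    checksum_text is the raw text read from the downloaded checksum file.
    Returns True if the file was written, False otherwise.
    """
    # Assumption: the digest is the first whitespace-separated token, which
    # covers both a bare digest and a "digest  filename" layout.
    tokens = checksum_text.split()
    expected = tokens[0].lower() if tokens else ""
    if file_md5.lower() != expected:
        return False
    # Binary mode, because file_content holds the decompressed bytes
    with open(dest_path, 'wb') as out:
        out.write(file_content)
    return True
```

You would call it as `save_if_verified(fileMd5, file_content, md5Checksum, dest_path)` and put the error handling in the branch where it returns `False`.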