'utf-8' codec can't decode byte 0x80
I am trying to download the BVLC trained model, but I get this error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 110: invalid start byte
I think it is caused by the following function (complete code):
# Closure-d function for checking SHA1.
def model_checks_out(filename=model_filename, sha1=frontmatter['sha1']):
    with open(filename, 'r') as f:
        return hashlib.sha1(f.read()).hexdigest() == sha1
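Any idea how I can fix this?
You didn't specify that the file should be opened in binary mode, so f.read() tries to read it as a UTF-8 encoded text file, and that fails. But since we are hashing bytes, not strings, the encoding does not matter at all, nor does it matter whether the file is text in the first place: open it in binary mode and read the raw bytes.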
>>> with open("test.h5.bz2","r") as f: print(hashlib.sha1(f.read()).hexdigest())
Traceback (most recent call last):
  File "<ipython-input-3-fdba09d5390b>", line 1, in <module>
    with open("test.h5.bz2","r") as f: print(hashlib.sha1(f.read()).hexdigest())
  File "/home/dsm/sys/pys/Python-3.5.1-bin/lib/python3.5/codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb8 in position 10: invalid start byte
but
>>> with open("test.h5.bz2","rb") as f: print(hashlib.sha1(f.read()).hexdigest())
21bd89480061c80f347e34594e71c6943ca11325
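The same contrast can be reproduced without the original model file. The following is a minimal sketch, assuming a made-up payload written to a temporary file; the bytes, the variable names, and the explicit encoding="utf-8" argument are illustrative and not part of the original script.

```python
import hashlib
import os
import tempfile

# Hypothetical payload: 0x80 can never start a UTF-8 sequence, so decoding must fail.
payload = b"BZh9\x80\xb8 not text"

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
    path = tmp.name

try:
    # Text mode: the file object decodes the bytes before hashlib ever sees them.
    try:
        with open(path, "r", encoding="utf-8") as f:
            f.read()
        text_mode_failed = False
    except UnicodeDecodeError:
        text_mode_failed = True

    # Binary mode: no decoding step, the raw bytes go straight into the hash.
    with open(path, "rb") as f:
        digest = hashlib.sha1(f.read()).hexdigest()
finally:
    os.remove(path)

print(text_mode_failed)
print(digest == hashlib.sha1(payload).hexdigest())
```

The encoding is passed explicitly here only so the failure does not depend on the locale of the machine running the sketch.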
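You are opening a file that is not UTF-8 encoded, while the default encoding on your system is set to UTF-8.
Since you are computing a SHA1 hash, you should read the data as binary instead. The hashlib functions require you to pass in bytes: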
with open(filename, 'rb') as f:
    return hashlib.sha1(f.read()).hexdigest() == sha1
Note the b added to the file mode.
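For large model files it may be preferable not to load everything into memory at once. Here is a hedged sketch of a standalone checker that reads in binary mode and hashes chunk by chunk; the name sha1_matches and the chunk_size parameter are hypothetical and do not come from the original script.

```python
import hashlib


def sha1_matches(filename, expected_sha1, chunk_size=1 << 20):
    """Return True if the SHA1 hex digest of the file equals expected_sha1."""
    digest = hashlib.sha1()
    # 'rb' yields raw bytes, so no text decoding is attempted.
    with open(filename, "rb") as f:
        # f.read() returns b"" at end of file, which stops the iteration.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    # hexdigest() is lowercase, so normalize the expected value before comparing.
    return digest.hexdigest() == expected_sha1.lower()
```

Calling sha1_matches(model_filename, frontmatter['sha1']) would then play the role of the one-line check above, assuming those two names are defined as in the question.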
From the open() documentation:
mode is an optional string that specifies the mode in which the file is opened. It defaults to 'r' which means open for reading in text mode. [...] In text mode, if encoding is not specified the encoding used is platform dependent: locale.getpreferredencoding(False) is called to get the current locale encoding. (For reading and writing raw bytes use binary mode and leave encoding unspecified.)
and from the hashlib module documentation:
You can now feed this object with bytes-like objects (normally bytes) using the update() method.
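A small sketch of what that quote implies in practice, using an arbitrary example value: update() can be called repeatedly with bytes, and a str is rejected until it is encoded.

```python
import hashlib

data = b"hello world"  # arbitrary example bytes

# Feeding the hash in pieces is equivalent to hashing the concatenation.
h = hashlib.sha1()
h.update(data[:5])
h.update(data[5:])
same = h.hexdigest() == hashlib.sha1(data).hexdigest()

# A str is not a bytes-like object, so hashlib refuses it with a TypeError.
try:
    hashlib.sha1("hello world")
    str_rejected = False
except TypeError:
    str_rejected = True

print(same, str_rejected)
```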
I don't know why, since neither the documentation nor the source code gives any hint, but using the b char (binary, I guess) works perfectly (tf-version: 1.1.0):
image_data = tf.gfile.FastGFile(filename, 'rb').read()