Reading MNIST by byte with Python

Question

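I'm playing around with the MNIST dataset and ran into the following issue, which I don't quite understand. According to the documentation, the data format is as follows: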

[offset] [type]          [value]          [description] 
0000     32 bit integer  0x00000801(2049) magic number (MSB first) 
0004     32 bit integer  60000            number of items 
0008     unsigned byte   ??               label 
0009     unsigned byte   ??               label 
........ 
xxxx     unsigned byte   ??               label
The labels values are 0 to 9.

So I would expect bytes 4-7, which hold the number of items (60,000), to be:

struct.pack('i', 60000)
>> '`\xea\x00\x00'

However, when I read the file byte by byte, they appear to be in the reverse order:

with gzip.open(path_to_file, 'rb') as f:
    print struct.unpack('cccc', f.read(4))
    for i in range(4):
        print struct.unpack('c', f.read(1))
>> ('\x00', '\x00', '\x08', '\x01')
>> ('\x00', '\x00', '\xea', '`')

Obviously I can reverse them to get the expected order, but I'm confused about why the bytes seem to be reversed.

Answer 1

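This is an artifact of byte ordering within a word. The header fields are stored as integers, so you should read them as integers rather than byte by byte. Note that the format description says "MSB first" for the first field: the file is big-endian, with the most significant byte at the lowest (earliest) address. struct.pack('i', ...) uses your machine's native byte order, which on x86 is little-endian, with the least significant byte at the lowest address. That is why the bytes you expected look reversed compared with what is in the file. Give the byte order explicitly when unpacking, for example with the '>' prefix in the struct format string.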
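As a minimal sketch of the point above (Python 3, not the asker's exact code): the helper name read_idx1_labels and the tiny in-memory file are hypothetical stand-ins, used so the example runs without downloading MNIST. It unpacks both header fields with an explicit big-endian format and compares the two byte orders for 60000.

```python
import gzip
import io
import struct


def read_idx1_labels(fileobj):
    """Parse a label stream: two big-endian uint32 header fields, then one byte per label."""
    # '>' forces big-endian ("MSB first"); 'I' is an unsigned 32-bit integer.
    magic, count = struct.unpack('>II', fileobj.read(8))
    if magic != 2049:
        raise ValueError('unexpected magic number: %d' % magic)
    # Iterating over a bytes object yields ints in Python 3.
    labels = list(fileobj.read(count))
    if len(labels) != count:
        raise ValueError('file ended before all labels were read')
    return count, labels


# Synthetic stand-in for the real label file: header plus three labels.
raw = struct.pack('>II', 2049, 3) + bytes([5, 0, 4])
compressed = gzip.compress(raw)

with gzip.open(io.BytesIO(compressed), 'rb') as f:
    count, labels = read_idx1_labels(f)

print(count, labels)

# The same value packed in both byte orders (60000 == 0xEA60, and 0x60 is '`').
big = struct.pack('>I', 60000)      # byte order used in the file
little = struct.pack('<I', 60000)   # native order on x86
print(big, little)
```

With the real file, the same helper would be called on the object returned by gzip.open(path_to_file, 'rb').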


python

mnist