Splitting binary file content in two parts using single byte separator in python

Question

I have a file consisting of three parts:

Xml header (unicode);
ASCII character 29 (group separator);
A numeric stream to the end of the file

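I want to get one xml string from the first part, and the numeric stream (to be parsed with struct.unpack or array.fromfile).

Should I create an empty string and append to it, reading the file byte by byte until I find the separator, as shown here?

Or is there a way to read everything and use something like xmlstring = open('file.dat', 'rb').read().split(chr(29))[0] (which, by the way, doesn't work)?

EDIT: this is what I see using a hex editor: the separator is there (selected byte)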

Answer 1

Make sure you are reading the file in before trying to split it. In your code, you don't have a .read()

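# ASCII 29 (group separator) as a single byte; b'\x1d' works on Python 2.6+ and 3
SEP = b'\x1d'

with open('file.dat', 'rb') as f:
    data = f.read()

# Note: hex(29) is the four-character text '0x1d', not the byte, so it is not searched for
if SEP in data:
    xmlstring = data.split(SEP, 1)[0]  # everything before the first separator
else:
    xmlstring = None  # \x1d not found!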

Make sure the ASCII 29 character (\x1d) is actually present in your file.

Answer 2

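Your attempt to search for the value chr(29) did not succeed because in that expression 29 is a value in decimal notation. The value you got from your hex editor, however, is displayed in hex, so it is 0x29 (or 41 in decimal).

You can simply do the conversion in Python - 0xnn is just another notation for entering an integer literal: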

>>> 0x29
41

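You can then use str.partition (bytes.partition on Python 3) to split the data into its respective parts:

SEP = b'\x29'  # the byte 0x29 (decimal 41)

with open('file.dat', 'rb') as infile:
    data = infile.read()

# partition splits at the first occurrence of SEP only
xml, sep, binary_data = data.partition(SEP)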

Demo:

import random

SEP = b'\x29'  # same byte as chr(0x29) on Python 2


# Write a test file: xml header, separator, 1024 random bytes
with open('file.dat', 'wb') as outfile:
    outfile.write(b"<doc></doc>")
    outfile.write(SEP)
    data = bytearray(random.randint(0, 255) for i in range(1024))
    outfile.write(data)


with open('file.dat', 'rb') as infile:
    data = infile.read()

# Split at the first separator byte only
xml, sep, binary_data = data.partition(SEP)

print(xml.decode('ascii'))
print(len(binary_data))

Output:

<doc></doc>
1024
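Once the split works, the two halves still have to be decoded, which is what the question is ultimately after. The following is only a sketch under assumptions the thread does not confirm: the header is taken to be UTF-8 text, the numeric stream is taken to be little-endian 32-bit floats, and `split_payload` is a made-up helper name. The separator is a parameter, so either b'\x1d' or b'\x29' can be passed in.

```python
import struct


def split_payload(raw, sep=b'\x1d'):
    """Split raw bytes at the first separator byte and decode both halves.

    Assumptions (not from the original posts): UTF-8 header,
    little-endian 32-bit floats after the separator.
    """
    header, found, tail = raw.partition(sep)
    if not found:
        raise ValueError('separator byte not found')
    xml_text = header.decode('utf-8')
    count = len(tail) // 4  # use whole 4-byte floats only
    numbers = struct.unpack('<%df' % count, tail[:count * 4])
    return xml_text, numbers


# Build a small in-memory sample instead of reading file.dat
sample = b'<doc></doc>' + b'\x1d' + struct.pack('<3f', 1.0, 2.5, -4.0)
xml_text, numbers = split_payload(sample)
print(xml_text)
print(numbers)
```

Because partition stops at the first occurrence, separator-valued bytes inside the numeric stream do no harm. If the header were UTF-16 instead, a byte equal to the separator could appear inside the header itself, and this approach would need rethinking.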

Answer 3

mmap the file, search for 29, create a buffer or memoryview from the first part to feed to the parser, and pass the rest through struct.
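A rough sketch of that idea, under several assumptions: the separator is taken to be the byte b'\x1d' (decimal 29), the numeric stream is taken to be four little-endian unsigned 16-bit integers, and the sample file name and contents are invented so the example runs on its own.

```python
import mmap
import os
import struct
import tempfile

# Write a small sample file so the sketch is self-contained (hypothetical data)
path = os.path.join(tempfile.mkdtemp(), 'sample.dat')
with open(path, 'wb') as out:
    out.write(b'<doc></doc>' + b'\x1d' + struct.pack('<4H', 1, 2, 3, 4))

with open(path, 'rb') as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    try:
        pos = mm.find(b'\x1d')  # offset of the first separator byte, -1 if absent
        if pos == -1:
            raise ValueError('separator byte not found')
        xml_bytes = mm[:pos]  # slicing copies only the header bytes
        # unpack_from reads straight from the mapping, starting after the separator
        values = struct.unpack_from('<4H', mm, pos + 1)
    finally:
        mm.close()

print(xml_bytes)
print(values)
```

Slicing the mapping copies the header, which is usually fine for a short XML prefix. A memoryview over the mapping would avoid the copy, but it has to be released before the mapping is closed.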
