Decoding an encoded text file - Python

Question

Say I wrote a method that encodes a text file into gibberish that looks like this:

úÎúÞ<81>i<82>ran<81><83>there<81><84>with<85>carol<86>we<81><87>did

I have zero idea how to turn it back into a normal text file that reads

i ran there with carol we did

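The characters at the start are just a magic number. I only want to check the magic number, then ignore the numbers and put the words back into a file.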

Answer 1

Use re to extract the words between > and <:

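import re

s = "úÎúÞ<81>i<82>ran<81><83>there<81><84>with<85>carol<86>we<81><87>did"

# Group 1 captures text between > and <; group 2 captures text after a > with no < following it.
r = re.compile(">(.*?)<|>(.*)")
print(r.findall(s))  # a list of (group 1, group 2) tuples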

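The second alternative, >(.*), catches the last word, which is not followed by <. Adjacent markers such as <81><83> produce empty matches, so drop those when joining the words into a sentence:

print(" ".join(w for w in map("".join, r.findall(s)) if w))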
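As a possible simplification, here is a minimal sketch with a single capture group. It assumes the words themselves never contain < or >, which holds for the sample string but is not confirmed for other input.

```python
import re

s = "úÎúÞ<81>i<82>ran<81><83>there<81><84>with<85>carol<86>we<81><87>did"

# Match a ">" followed by one or more characters that are neither "<" nor ">".
# Requiring at least one character skips the empty gap in "<81><83>".
words = re.findall(r">([^<>]+)", s)
print(" ".join(words))
```

With one group, findall returns plain strings instead of tuples, so no flattening or filtering is needed.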

Answer 2

re.split with the right pattern is enough:

import re
s='úÎúÞ<81>i<82>ran<81><83>there<81><84>with<85>carol<86>we<81><87>did'
L = re.split(r'<[\d<>]+>',s)
print(L)
print(' '.join(L[1:]))

Output:

['úÎúÞ', 'i', 'ran', 'there', 'with', 'carol', 'we', 'did']
i ran there with carol we did
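The question also asks to check the magic number before dropping the numeric markers. A rough sketch of that step follows. It assumes the magic number is the four characters that open the sample string and that the data has already been read into a str; the name decode and the constant MAGIC are hypothetical, not part of either answer.

```python
import re

MAGIC = "úÎúÞ"  # assumption: the magic number as it appears in the sample string


def decode(encoded: str, magic: str = MAGIC) -> str:
    """Check the magic prefix, then return the words joined by single spaces."""
    if not encoded.startswith(magic):
        raise ValueError("missing magic number")
    body = encoded[len(magic):]
    # Split on runs of <digits> markers and drop the empty pieces.
    words = [w for w in re.split(r"(?:<\d+>)+", body) if w]
    return " ".join(words)


s = "úÎúÞ<81>i<82>ran<81><83>there<81><84>with<85>carol<86>we<81><87>did"
text = decode(s)
print(text)
```

To put the words back into a file, the returned string can be written out with an ordinary open(..., "w", encoding="utf-8") call; the output file name is up to you.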
