Skipping lines when turning a text file into a dictionary

Question

I have a text file that looks like this:

word1   4
wöörd2   8
word3   12
word4   5
another word   1
many words one after another 1
word5   9

If it weren't for the lines with several words, the code below would work fine:

f = open("C:\path\words.txt", 'r', encoding="utf-8")
dict = {}
for line in f:
    k, v = line.strip().split()
    dict[k.strip()] = v.strip()

f.close()

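But obviously, in the case above I get ValueError: too many values to unpack (expected 2). I assume there are three options:

1. Delete those lines from the text file, which is hard to do by hand in a huge file.
2. Skip the line whenever this problem occurs.
3. Modify the code so that the value is always the last number.

I find option 3 too daunting for a large file that is diverse in both characters and words (especially since I don't care much about the problematic lines). But for option 2, how can I tell whether a line splits into more than 2 elements?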

Answer 1

If you are only asking about option 2, you can do this:

f = open(r"C:\path\words.txt", 'r', encoding="utf-8")
dict = {}
for line in f:
    # Split once and reuse the result
    parts = line.split()
    # Keep only lines with exactly one key and one value
    if len(parts) == 2:
        k, v = parts
        dict[k] = v

f.close()

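Also, if you are curious about option 3 and you know that the last item is always a number, you can index the list to get the last element, like this: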

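f = open(r"C:\path\words.txt", 'r', encoding="utf-8")
dict = {}
for line in f:
    parts = line.split()
    # Need at least one word plus the trailing number
    if len(parts) >= 2:
        # parts[-1] is the last element; the remaining words form the key
        dict[" ".join(parts[:-1])] = parts[-1]

f.close()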
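
As a variation on option 3, here is a minimal sketch that uses str.rsplit(maxsplit=1) to cut each line once, from the right. It assumes the value is always the last whitespace-separated token on the line. The function name parse_last_token and the sample lines are made up for illustration and are not from the question.

```python
def parse_last_token(lines):
    """Map everything before the last whitespace-separated token to that token."""
    result = {}
    for line in lines:
        # Split once from the right: ["<all the words>", "<last token>"]
        parts = line.rsplit(maxsplit=1)
        if len(parts) != 2:
            # Blank line or a single token: nothing to pair up, so skip it
            continue
        key, value = parts
        # rsplit drops the whitespace around the split point and at the end
        # of the line; strip() removes any leading whitespace on the key
        result[key.strip()] = value
    return result


# Hypothetical sample lines standing in for the real file
sample = [
    "word1   4\n",
    "another word   1\n",
    "many words one after another 1\n",
    "\n",
]
parsed = parse_last_token(sample)
```

Because the line is only split once, the original spacing between the words of a multi-word key is preserved rather than normalised.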

Answer 2

In a sense you should change your code: line.strip().split() does not give you a key and a value, it gives you a list.

f = open(r"C:\path\words.txt", 'r', encoding="utf-8")
dict = {}
for line in f:
    splitted_line = line.strip().split()
    # Exactly two elements: a shorter list would raise IndexError below
    if len(splitted_line) == 2:
        dict[splitted_line[0]] = splitted_line[1]
f.close()

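Now, I should mention that if you do want to include the lines that contain more than one word plus a number, you can do so by joining the words with a special character such as _.

Use: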

f = open("test.txt", 'r', encoding="utf-8")
dict = {}
for line in f:
    splitted_line = line.strip().split()
    if len(splitted_line) == 2:
        dict[splitted_line[0]] = splitted_line[1]
    elif len(splitted_line) > 2:
        # Join every word before the trailing number with underscores
        dict['_'.join(splitted_line[:-1])] = splitted_line[-1]
f.close()

Answer 3

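It depends on what you want to do, but it looks like the dictionary you are building always uses the full sentence on a line as the key and the number at the end of the line as the value. If the number is always the last element on the line, you can do this: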

f = open(r"C:\path\words.txt", 'r', encoding="utf-8")
results = {}
for line in f:
    parts = line.split()
    # skip blank lines and lines that have no separate number
    if len(parts) < 2:
        continue
    # select everything except for the last element, the sentence
    k = " ".join(parts[:-1])
    # select just the last element, the number
    v = parts[-1]
    results[k] = v

f.close()

Edit: it is better not to use the name dict, because it is a built-in type in Python.
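
To illustrate why the name matters, here is a small sketch of what happens once dict is rebound. The function name shadowing_demo is hypothetical, and the shadowing is kept inside a function so that it does not leak into the rest of a script.

```python
def shadowing_demo():
    """Show that rebinding `dict` hides the built-in type in that scope."""
    dict = {}  # the local name now refers to an ordinary dictionary instance
    try:
        # This would normally build {"a": 1}, but `dict` is no longer the type
        dict(a=1)
    except TypeError as exc:
        return str(exc)
    return None


message = shadowing_demo()

# Outside the function the built-in is untouched
still_works = dict(a=1)
```

Any later code in the same scope that calls dict(...) fails in this way, which is why a name such as result or results is the safer choice.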

Answer 4

No need to check. Just catch the exception:

with open(r"C:\path\words.txt", encoding="utf-8") as f:
    result = {}
    for line in f:
        try:
            k, v = line.split()
        except ValueError:
            pass
        else:
            result[k] = v

The code now also works for empty lines and for lines that contain no whitespace between words.
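
A quick way to check that claim without touching a real file is to feed the same loop an in-memory file. This is only a sketch: the helper name load_pairs and the sample text are assumptions for illustration, with io.StringIO standing in for the opened file.

```python
import io


def load_pairs(f):
    """Build a dict from lines that split into exactly two fields."""
    result = {}
    for line in f:
        try:
            k, v = line.split()
        except ValueError:
            # Raised for 0, 1, or 3+ fields alike; skip the line
            continue
        result[k] = v
    return result


# Hypothetical input: a blank line, a line with no whitespace,
# and a line with too many fields are mixed in with valid lines
sample = io.StringIO("word1   4\n\nnospace\nanother word   1\nword5   9\n")
pairs = load_pairs(sample)
```

Only the two well-formed lines end up in pairs; the blank line, the single-token line, and the three-token line are all skipped by the same except clause.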

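Note that I also made a few other changes:

Use with open(...) as f to guarantee that f is closed when the block finishes, no matter what happens.
Don't use the name dict; that is the built-in type you are now shadowing. I used result instead.
There is no need for line.strip(), v.strip() or k.strip() when calling str.split() without arguments; it already removes leading and trailing whitespace from each split result:

>>> " str.strip() \t strips \f all whitespace \n".split()
['str.strip()', 'strips', 'all', 'whitespace']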

You can make this more concise by using the fact that dict.update() accepts an iterable of (key, value) pairs:

with open(r"C:\path\words.txt", encoding="utf-8") as f:
    result = {}
    for line in f:
        try:
            # A one-item list holding the [key, value] pair from this line
            result.update([line.split()])
        except ValueError:
            pass
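
The concise version works because dict.update() itself raises ValueError when an element of the sequence does not have exactly two items. A minimal sketch of that behaviour, using made-up field lists in place of line.split() results:

```python
result = {}

# A well-formed [key, value] pair is added as usual
result.update([["word1", "4"]])

# Field lists of the wrong length: three items, one item, and none
rejected = []
for parts in (["another", "word", "1"], ["nospace"], []):
    try:
        result.update([parts])
    except ValueError:
        # update() refuses any element whose length is not 2
        rejected.append(parts)
```

After this runs, result holds only the first pair and rejected holds the three malformed lists, matching what the try/except loop does line by line.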


python

text

dictionary