Build dict mapping from multiple text files

Question

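I have multiple *.txt files containing IDs and values, and I want to build a single dictionary from them. However, some IDs are repeated across files, and for those IDs I want to concatenate the values. Here is an example with two files (I actually have a bunch of files, so I think I need glob.glob). Note that all the values within a given file have the same length, so for a missing ID I can add '-' repeated len(value) times.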

File 1

ID01
Hi 
ID02 
my 
ID03 
ni

File 2

ID02 
name
ID04 
meet 
ID05 
your

Desired output (note that when an ID is not repeated, I want to add 'Na' or '-', padded to the same len(value) as the values in the file where it is missing):

ID01 
Hi----
ID02 
myname
ID03 
ni----
ID04 
--meet
ID05 
--your

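I just want to store the output in a dictionary. Also, I suppose that if I print each file name as I open it, I can tell in which order the files were opened, right?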

This is what I have (so far I cannot concatenate my values):

output={}   
list = []   
for file in glob.glob('*.txt'):        
    FI = open(file,'r') 
    for line in FI.readlines():
        if (line[0]=='I'):      #I am interested in storing only the ones that start with I, for a future analysis. I know this can be done separating key and value with '\t'. Also, I am sure the next lines (values) does not start with 'I'
            ID = line.rstrip()
            output[ID] = ''
            if ID not in list:
                list.append(ID)     
        else:
            output[ID] = output[ID] + line.rstrip()

    if seqs_name in list:
        seqs[seqs_name] += seqs[seqs_name]

    print (file)
    FI.close()


print ('This is your final list: ')
print (list) #so far, I am getting the right final list, with no repetitive ID 
print (output) #PROBLEM: the repetitive ID, is being concatenated twice the 'value' in the last file read.

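Also, how do I add '-' when an ID is not repeated? Thank you very much for your help.

Summary: I cannot concatenate the values when a key is repeated in another file. If a key is not repeated, I want to add '-', so that I can later print the file names and know in which file a given ID has no value.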

Answer 1

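A few problems with your existing code:

line[0] == 'ID': line[0] returns a single character, so this comparison is always false. Use str.startswith(xxx) instead to check whether a string starts with xxx.
You are not retrieving the text after the ID correctly. The simplest way is to call next(f).
You don't need the second list. Also, don't name a variable list, because it shadows the built-in.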
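The first two points can be checked in isolation with a small sketch. It uses io.StringIO as a hypothetical in-memory stand-in for one of the *.txt files, so the names f and pairs are illustrative only:

```python
import io

# Hypothetical in-memory stand-in for one of the *.txt files.
f = io.StringIO("ID01\nHi\nID02\nmy\n")

pairs = []
for line in f:
    # line[0] is one character, so it can never equal the two-character 'ID'.
    assert (line[0] == 'ID') is False
    if line.startswith('ID'):
        # next(f, '') advances the same iterator and returns the line after
        # the ID; the '' default avoids StopIteration at end of file.
        value = next(f, '')
        pairs.append((line.strip(), value.strip()))

print(pairs)
```

Because next(f, '') consumes the value line, the for loop itself only ever sees ID lines.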

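import collections
import glob

# defaultdict(str) starts every new ID at '', so += works the first time too
output = collections.defaultdict(str)
for file in glob.glob('*.txt'):
    with open(file, 'r') as f:
        for line in f:
            if line.startswith('ID'):
                try:
                    # the value is the line right after the ID line
                    text = next(f)
                    output[line.strip()] += text.strip() + ' '
                except StopIteration:
                    # the file ended right after an ID line: nothing to add
                    pass

print(output)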

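About the try-except: next(f) raises StopIteration if a file ends right after an ID line, and it never hurts to catch an odd exception like that.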
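The question also asks how to pad with '-' when an ID is missing from a file. One possible sketch, not part of the original answer, is to read each file into its own dictionary first and then join the pieces per ID. The helper names read_pairs and merge_padded are hypothetical, and the code assumes the layout from the question: an ID line followed by one value line, with all values in a file having the same length.

```python
def read_pairs(path):
    """Return {ID: value} for one file of alternating ID/value lines."""
    pairs = {}
    with open(path) as f:
        for line in f:
            if line.startswith('ID'):
                # The value is the next line; '' if the file ends after an ID.
                pairs[line.strip()] = next(f, '').strip()
    return pairs


def merge_padded(paths, fill='-'):
    """Concatenate values per ID across files, padding where an ID is absent."""
    per_file = [read_pairs(p) for p in paths]
    # Value width per file (assumed constant within a file; 0 if it has no IDs).
    widths = [max((len(v) for v in d.values()), default=0) for d in per_file]
    all_ids = sorted(set().union(*per_file))
    return {
        id_: ''.join(d.get(id_, fill * w) for d, w in zip(per_file, widths))
        for id_ in all_ids
    }
```

Calling merge_padded(sorted(glob.glob('*.txt'))) after import glob would process the files in sorted name order; glob.glob() on its own does not guarantee an order, so printing the sorted list is a simple way to see which position in each value belongs to which file. For the two sample files in the question, read in that order, this yields 'Hi----' for ID01, 'myname' for ID02, and '--meet' for ID04.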

