Use data from different files with the same names but different extensions to get the line numbers

Question

I am using the following code:

from collections import defaultdict
import sys
import os

for doc in os.listdir('path1'):
    doc1 = "path1" + doc
    doc2 = "path2" + doc
    doc3 = "path3" + doc

    with open(doc1, "r") as words:
        sent = words.read().split()
        print sent

    linenos = {}
    with open(doc2, "r") as f1:
        for i, line in enumerate(f1):
            for word in sent:
                if word in line:
                    if word in linenos:
                        linenos[word].append(i + 1)
                    else:
                        linenos[word] = [i + 1]

    matched2 = []
    for word in sent:
        if word in linenos:
            matched2.append('%s %r' % (word, linenos[word][0]))
        else:
            matched2.append('%s <does not exist>' % word)

    with open(doc3, "w") as f1:
        f1.write(', '.join(matched2))

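So my path1 contains the files file1.title, file2.title, and so on, up to file240.title.

Similarly, I have path2, which contains the files file1.txt, file2.txt, and so on, up to file240.txt.

For example:

file1.title will contain data like this: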

military  troop deployment number need

file1.txt will contain:

foreign 1242
military 23020
firing  03848
troop 2939
number 0032
dog 1234
cat 12030
need w1212

Output:

path3/file1.txt

military 2, troop 4, deployment <does not exist>, number 5, need 8

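Basically, the code finds the line numbers in file1.txt of the words that are read from file1.title. It works for a single file, that is, when I feed it one file at a time. But I need to do this for a whole folder of documents.

That is, it should read the words from file1.title and get their line numbers from file1.txt, then read the words from file2.title and get their line numbers from file2.txt, and so on.

The problem is that with this code I cannot read files that have the same name but different extensions. How should I modify it to get the proper output?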

Answer 1

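I think you just need to put the full file name in the open(docx, 'w') calls. For example, replace doc1 with 'file1.title' and doc2 with 'file1.txt'. I don't know whether this is what you are already doing, but the extension matters when you open a file.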
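As a minimal sketch of why the extension matters here: the loop in the question reuses the name returned by os.listdir for all three paths, so the second path still ends in .title (and, with the placeholder strings as written, has no separator between folder and file). The value of `doc` below is an assumed example name, not something taken from a real directory listing.

```python
import os

doc = "file1.title"       # assumed example of a name from os.listdir('path1')
doc2 = "path2" + doc      # how the question's loop builds the second path

print(doc2)                        # path2file1.title
print(os.path.splitext(doc2)[1])   # .title
```

Since path2 only holds .txt files, opening this name cannot find the intended file.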

Answer 2

I guess you are asking how to replace the extension in the file name string, like this:

doc2 = "path2" + doc[:-6] + ".txt"

This removes the 6 characters ".title" from the end of doc and appends the extension ".txt".
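The slice works as long as every name ends in exactly ".title". A slightly more general sketch could use os.path.splitext to split off whatever extension is present and os.path.join to add the path separator. The helper name `sibling_path` and the folder name passed to it are hypothetical, chosen only for illustration.

```python
import os

def sibling_path(filename, target_dir, new_ext):
    """Return target_dir/<base name of filename> with new_ext as extension."""
    base, _old_ext = os.path.splitext(os.path.basename(filename))
    return os.path.join(target_dir, base + new_ext)

doc = "file1.title"
doc2 = sibling_path(doc, "path2", ".txt")
print(doc2)
```

The printed separator depends on the operating system, because os.path.join uses the platform's own.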

Answer 3

Do you want to do something like this?

import os

# Collect the base names of files whose extension is .txt or .title
names = set(os.path.splitext(fname)[0] for fname in os.listdir('.')
            if os.path.splitext(fname)[1] in ('.txt', '.title'))

for name in names:
    with open(name + '.txt') as f:
        f1 = f.read()
    with open(name + '.title') as f:
        f2 = f.read()
    # Do whatever with the file contents
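Putting the pieces together, the whole task from the question might be sketched as follows. This is one possible arrangement, not the asker's final code: the function names are hypothetical, the three folders are passed in as arguments, the output folder is assumed to exist already, and the word test is the same substring check (`word in line`) that the question uses.

```python
import os

def first_line_numbers(words, lines):
    """Map each word to the 1-based number of the first line containing it."""
    found = {}
    for lineno, line in enumerate(lines, 1):
        for word in words:
            # Substring test, as in the question; setdefault keeps the first hit.
            if word in line:
                found.setdefault(word, lineno)
    return found

def format_matches(words, found):
    """Render 'word N' or 'word <does not exist>' in the order of words."""
    return ', '.join(
        '%s %d' % (w, found[w]) if w in found else '%s <does not exist>' % w
        for w in words)

def process_folder(title_dir, text_dir, out_dir):
    """For each X.title in title_dir, look up its words in text_dir/X.txt."""
    for fname in sorted(os.listdir(title_dir)):
        base, ext = os.path.splitext(fname)
        text_path = os.path.join(text_dir, base + '.txt')
        if ext != '.title' or not os.path.isfile(text_path):
            continue  # skip unrelated files and titles with no matching .txt
        with open(os.path.join(title_dir, fname)) as f:
            words = f.read().split()
        with open(text_path) as f:
            found = first_line_numbers(words, f)
        with open(os.path.join(out_dir, base + '.txt'), 'w') as f:
            f.write(format_matches(words, found))
```

With the folders from the question this would be called as process_folder('path1', 'path2', 'path3'). Because the check is a substring test, a short word can also match inside a longer one; comparing against line.split() instead would restrict it to whole words.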

Tags: python, extract, line, python-2.7