Read lines in one file and find all strings starting with 4-letter strings listed in another txt file
I have 2 txt files (a and b).
file_a.txt contains a long list of 4-letter combinations (one combination per line):
aaaa
bcsg
aacd
gdee
aadw
hwer
etc.
file_b.txt contains a list of letter combinations of various lengths (some of them contain spaces):
aaaibjkes
aaleoslk
abaaaalkjel
bcsgiweyoieotpwe
csseiolskj
gaelsi asdas
aaaloiersaaageehikjaaa
hwesdaaadf wiibhuehu
bcspwiopiejowih
gdeaes
aaailoiuwegoiglkjaaake
etc.
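I am looking for a Python script that lets me do the following:
- Read file_a.txt line by line
- Take each 4-letter combination (e.g. aaai)
- Read file_b.txt and find all letter combinations, of any length, that start with that 4-letter combination (e.g. aaaibjkes, aaailoiersaaageehikjaaa, aaailoiuwegoiglkjaaake, etc.)
- Write the results of each search to a separate txt file named after the 4-letter combination.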
File aaai.txt:
aaaibjkes
aaailoiersaaageehikjaaa
aaailoiuwegoiglkjaaake
etc.
File bcsi.txt:
bcspwiopiejowih
bcsiweyoieotpwe
etc.
Sorry, I'm a newbie. Could someone please point me in the right direction? So far all I have is:
#I presume I will have to use regex at some point
import re
file1 = open('file_a.txt', 'r').readlines()
file2 = open('file_b.txt', 'r').readlines()
#Should I look into findall()?
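On the regex question: a regex is not strictly needed here, but if you want one, re.match() fits better than findall(), because re.match() only succeeds when the pattern matches at the very start of the string, while findall() also reports matches in the middle of a line. A minimal sketch, assuming the lines have already been read into a list; the helper name `lines_starting_with` is hypothetical and the sample lines are taken from the question:

```python
import re

def lines_starting_with(prefix, lines):
    """Return the lines that begin with prefix (hypothetical helper)."""
    # re.escape guards against prefixes containing regex metacharacters;
    # re.match only succeeds when the pattern matches at position 0.
    pattern = re.compile(re.escape(prefix))
    return [line for line in lines if pattern.match(line)]

file_b_lines = ["aaaibjkes", "abaaaalkjel", "bcsgiweyoieotpwe", "gdeaes"]
print(lines_starting_with("bcsg", file_b_lines))  # ['bcsgiweyoieotpwe']
print(lines_starting_with("aaaa", file_b_lines))  # []

# findall() would wrongly report "abaaaalkjel", where "aaaa" sits mid-line
print(re.findall("aaaa", "abaaaalkjel"))  # ['aaaa']
```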
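You can open both files and go through them line by line with for loops.
Use two nested for loops: the outer loop reads file_a.txt, which only needs to be read once. The inner loop reads through file_b.txt for every combination and checks whether each line starts with it.
For that check you can use .find() to search for the string. find() returns the index of the first occurrence, so a match at the beginning of the line returns 0.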
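A quick check of the find() rule described above: find() returns the index of the first occurrence, or -1 when the string is absent, so comparing against 0 means "at the very start". str.startswith() expresses the same test directly and gives the same answer. The pairs below are taken from the question's test data:

```python
# (prefix from file_a.txt, line from file_b.txt) pairs from the question
pairs = [("bcsg", "bcsgiweyoieotpwe"), ("aaaa", "abaaaalkjel"), ("hwer", "hwesdaaadf wiibhuehu")]

for prefix, line in pairs:
    index = line.find(prefix)  # index of the first occurrence, or -1 if absent
    # index == 0 and startswith() always agree
    print(prefix, index, index == 0, line.startswith(prefix))
# bcsg 0 True True
# aaaa 2 False False
# hwer -1 False False
```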
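file_a = open("file_a.txt", "r")
file_b = open("file_b.txt", "r")
for a_line in file_a:
    # This result value will be written into your new file
    result = ""
    # This is what we will search with (strip() also removes a stray "\r")
    search_val = a_line.strip()
    if not search_val:
        continue  # skip blank lines, "" would match every line of file_b
    print("---- Using " + search_val + " from file_a to search. ----")
    for b_line in file_b:
        print("Searching file_b using " + b_line.strip("\n"))
        # find() returns 0 only when search_val is at the very start of the line
        if b_line.strip("\n").find(search_val) == 0:
            # Normalise the line ending so a last line without "\n" is not glued to the next match
            result += b_line.rstrip("\n") + "\n"
    print("---- Search ended ----")
    # Set the read pointer to the start of the file again
    file_b.seek(0, 0)
    if result:
        # Write the contents of "result" into a file named after "search_val"
        # ("w" instead of "a" so re-running the script does not duplicate lines)
        with open(search_val + ".txt", "w") as f:
            f.write(result)
file_a.close()
file_b.close()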
Test case:
I used the test case from your question:
file_a.txt
aaaa
bcsg
aacd
gdee
aadw
hwer
file_b.txt
aaaibjkes
aaleoslk
abaaaalkjel
bcsgiweyoieotpwe
csseiolskj
gaelsi asdas
aaaloiersaaageehikjaaa
hwesdaaadf wiibhuehu
bcspwiopiejowih
gdeaes
aaailoiuwegoiglkjaaake
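The program generates a single output file, bcsg.txt, which contains bcsgiweyoieotpwe, as expected: it is the only line in file_b.txt that starts with one of the combinations in file_a.txt.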
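Why the file_b.seek(0, 0) call in the code above matters: a file object is an iterator, so once the inner loop has read to the end of file_b, looping over it again yields nothing until the position is reset. A small demonstration, using io.StringIO as a stand-in for an open text file (an assumption made so the snippet runs without files on disk):

```python
import io

# io.StringIO behaves like a file opened in text mode
file_b = io.StringIO("aaaibjkes\nbcsgiweyoieotpwe\ngdeaes\n")

first_pass = [line.rstrip("\n") for line in file_b]
second_pass = [line.rstrip("\n") for line in file_b]  # already at end of file
file_b.seek(0)  # rewind to the start
third_pass = [line.rstrip("\n") for line in file_b]

print(len(first_pass), len(second_pass), len(third_pass))  # 3 0 3
```

Reading file_b.txt into a list once with readlines() avoids the need to rewind at all.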
Hope this helps.
file1 = open('file_a.txt', 'r')
file2 = open('file_b.txt', 'r')
# get every item in your second file into a list
mylist = file2.readlines()
file2.close()
# read each line in the first file
# (calling readline() in both the while condition and the body would skip every other line)
for line in file1:
    searchStr = line.strip()  # drop the trailing newline
    if not searchStr:
        continue
    # find the lines in your second file that start with this string
    exists = [s for s in mylist if s.startswith(searchStr)]
    if exists:
        # if such lines exist in your second file then create a file for them
        fileNew = open(searchStr + '.txt', 'w')
        for match in exists:
            fileNew.write(match.rstrip('\n') + '\n')
        fileNew.close()
file1.close()
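Both approaches so far loop over file_b once per combination in file_a. Since every combination in file_a.txt is exactly 4 letters long (as the question states), an alternative not taken from either answer is to group file_b by its first four characters in a single pass and look each combination up in a set. A sketch under that assumption; the function name `group_by_prefix` is hypothetical and the sample data comes from the question:

```python
from collections import defaultdict

def group_by_prefix(keys, lines, width=4):
    """Map each key to the lines that start with it (hypothetical helper).

    Assumes every key is exactly `width` characters long.
    """
    wanted = set(keys)
    groups = defaultdict(list)
    for line in lines:
        head = line[:width]  # the first `width` characters of the line
        if head in wanted:
            groups[head].append(line)
    return dict(groups)

keys = ["aaaa", "bcsg", "aacd", "gdee", "aadw", "hwer"]
lines = ["aaaibjkes", "abaaaalkjel", "bcsgiweyoieotpwe", "gaelsi asdas", "gdeaes"]
print(group_by_prefix(keys, lines))  # {'bcsg': ['bcsgiweyoieotpwe']}
```

Each entry of the returned dictionary can then be written to `key + ".txt"` exactly as in the answers. This reads each file once instead of once per combination, at the cost of only working when all combinations share the same length.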
Try this:
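# Read both files and drop the trailing newlines (blank lines in file_a are skipped)
with open("file_a.txt", "r") as f1:
    file1 = [word.strip() for word in f1 if word.strip()]
with open("file_b.txt", "r") as f2:
    file2 = [word.rstrip("\n") for word in f2]
data = []
data_dict = {}
for short_word in file1:
    # collect [short_word, word] pairs for every word that starts with short_word
    data += [[short_word, w] for w in file2 if w.startswith(short_word)]
# group the matching words by their 4-letter combination
for single_data in data:
    if single_data[0] in data_dict:
        data_dict[single_data[0]].append(single_data[1])
    else:
        data_dict[single_data[0]] = [single_data[1]]
# dict.iteritems() exists only in Python 2; items() works in Python 3
for key, val in data_dict.items():
    with open(key + ".txt", "w") as out:
        out.write("\n".join(val) + "\n")
    print(key + ".txt created")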
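The if/else block that builds data_dict can be shortened with collections.defaultdict, which creates an empty list the first time a key is used. A sketch of that grouping step on its own, with sample pairs in the same [short_word, word] shape as data (the pairs are illustrative, built from strings in the question):

```python
from collections import defaultdict

# [short_word, word] pairs, in the same shape as `data` in the answer above
data = [["aaai", "aaaibjkes"], ["bcsg", "bcsgiweyoieotpwe"], ["aaai", "aaailoiuwegoiglkjaaake"]]

data_dict = defaultdict(list)  # missing keys start out as []
for short_word, word in data:
    data_dict[short_word].append(word)

print(dict(data_dict))
# {'aaai': ['aaaibjkes', 'aaailoiuwegoiglkjaaake'], 'bcsg': ['bcsgiweyoieotpwe']}
```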