Python chained interval comparison
I am trying to do a chained comparison across two files and print/write the results if they fall within a specified interval.
This is what I have so far.
test1 file:
A0AUZ9,7,17 #just this one line
test2 file:
A0AUZ8, DOC_PP1_RVXF_1, 8, 16, PF00149, O24930
A0AUZ9, LIG_BRCT_BRCA1_2, 127, 134, PF00533, O25336
A0AUZ9, LIG_BRCT_BRCA1_1, 127, 132, PF00533, O25336
A0AUZ9, DOC_PP1_RVXF_1, 8, 16, PF00149, O25685
A0AUZ9, DOC_PP1_RVXF_1, 8, 16, PF00149, O25155
And the script itself:
results = []
with open('test1', 'r') as disorder:
    for lines in disorder:
        cells = lines.strip().split(',')
        with open('test2', 'r') as helpy:
            for lines in helpy:
                blocks = lines.strip().split(',')
                if blocks[0] != cells[0]:
                    continue
                elif cells[1] <= blocks[2] and blocks[3] <= cells[2]:
                    results.append(blocks)
with open('test3','wt') as outfile:
    for i in results:
        outfile.write("%s\n" % i)
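My preferred output is to have only these lines in test3, namely those that:
have a matching ID in the first column
have both values in columns 3 and 4 between the values given in the test1 file
I get no output, and I'm not sure what went wrong.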
One reason it doesn't work as expected is that you are comparing strings rather than numbers.
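As a small illustration of that point, the sketch below uses values taken from the sample files. It assumes the fields come from `split(',')` as in the question, so the values from test2 keep the space that follows each comma; the variable names are made up for the example.

```python
# Fields as produced by split(',') in the question's script.
cell_start = "7"      # from test1: "A0AUZ9,7,17"
block_start = " 8"    # from test2: note the leading space after the comma

# Strings compare character by character, and ' ' sorts before '7',
# so the string comparison does not reflect the numeric order.
as_strings = cell_start <= block_start

# int() ignores surrounding whitespace, so the numeric comparison works.
as_numbers = int(cell_start) <= int(block_start)

# Even without stray spaces, string order is not numeric order.
misleading = "127" <= "17"

print(as_strings, as_numbers, misleading)
```

Converting each field with `int()` before comparing avoids both problems.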
However, there may be a better way to do what you are trying to do. Assuming the first file is small enough to fit in memory:
import csv
from collections import defaultdict

# Map each ID from the first file to its list of (start, end) intervals.
lookup_table = defaultdict(list)
with open('test1.txt') as f:
    reader = csv.reader(f)
    for row in reader:
        lookup_table[row[0]].append((int(row[1]), int(row[2])))

with open('test2.txt') as a, open('results.txt', 'w', newline='') as b:
    reader = csv.reader(a)
    writer = csv.writer(b)
    for row in reader:
        start, end = int(row[2]), int(row[3])
        # Keep the row if it lies inside any interval recorded for its ID.
        for low, high in lookup_table.get(row[0], []):
            if low <= start and end <= high:
                writer.writerow(row)
                break
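The same idea can be tried without touching the disk. The following is a minimal sketch, not a drop-in replacement: `filter_by_interval` is a hypothetical helper name, the sample data is held in `io.StringIO` objects, and `skipinitialspace=True` is an added choice that strips the space after each comma. It also assumes the trailing `#just this one line` note is not part of the real test1 file, and that the start of each record is not greater than its end.

```python
import csv
import io
from collections import defaultdict


def filter_by_interval(interval_lines, record_lines):
    """Return records whose columns 3 and 4 lie inside an interval for the same ID."""
    intervals = defaultdict(list)
    for row in csv.reader(interval_lines):
        intervals[row[0]].append((int(row[1]), int(row[2])))

    matches = []
    for row in csv.reader(record_lines, skipinitialspace=True):
        start, end = int(row[2]), int(row[3])
        # Chained comparison: low <= start, start <= end and end <= high.
        if any(low <= start <= end <= high
               for low, high in intervals.get(row[0], [])):
            matches.append(row)
    return matches


test1 = io.StringIO("A0AUZ9,7,17\n")
test2 = io.StringIO(
    "A0AUZ8, DOC_PP1_RVXF_1, 8, 16, PF00149, O24930\n"
    "A0AUZ9, LIG_BRCT_BRCA1_2, 127, 134, PF00533, O25336\n"
    "A0AUZ9, LIG_BRCT_BRCA1_1, 127, 132, PF00533, O25336\n"
    "A0AUZ9, DOC_PP1_RVXF_1, 8, 16, PF00149, O25685\n"
    "A0AUZ9, DOC_PP1_RVXF_1, 8, 16, PF00149, O25155\n"
)

matches = filter_by_interval(test1, test2)
for row in matches:
    print(row)
```

With the sample data, only the last two test2 rows pass: the first row has a different ID, and the two LIG_BRCT_BRCA1 rows fall outside 7 to 17.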