About CS50 Pset6 DNA: it overcounts STRs for large.csv
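I'm working on pset6, the DNA problem. This code works for small.csv, but when I try it with the large one, it overestimates the STR counts. I guess the problem is in the part where it compares the strings, but I still don't know how to fix it. I checked that the count for the "TTTTTTCT" sequence is correct, but for the remaining STRs the count is always greater than it should be.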

import sys
import csv

def main():

    while (len(sys.argv) != 3):
        print ("ERROR. Usage: python dna.py data.csv sequence.txt")
        break

    list_str = {}
    #load the STRs to analyse
    with open(sys.argv[1]) as csvfile:
        readcsv = csv.reader (csvfile)
        ncol = len(next(readcsv))
        csvfile.seek(0)
        header = list()
        for line in readcsv:
            a = sum(1 for line in readcsv)
            for i in range(ncol):
                list_str[line[i]] = 0
                header.insert (i, line [i])
                print (f"{header[i]}")

    #open an work with the sequence file
    sequence = open(sys.argv[2], 'r')
    seq_r = sequence.read()
    for k in list_str.keys():
        #print (f"keu {k}")
        p = 0
        seq = len(seq_r)
        while p < seq:
            if seq_r[p:(p + len(k))] == k:
                list_str[k] += 1
                p += len(k)
            else: p += 1
        #print (f" sequenci encontrada{list_str[k]} y {k}")
        print (f"nro de {k} {list_str[k]}")

    with open(sys.argv[1]) as csvfile:
        readcsv = csv.reader (csvfile)
        next(csvfile)
        find = False
        for row in readcsv:
            for j in range(1,ncol):
                #print(f"header :{header[j]}")
                if int(row [j]) == int(list_str[header[j]]):
                    print (f"row {row[j]} list {list_str[header[j]]}")
                    find = True
                else:
                    find = False
                    break
            if find == True: print (f"{row [0]}")

main()

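I ran into the same thing, and then I reread the pset specification.
We need to find the longest run of consecutive repeats of each STR, not the total number of times the STR appears in the sequence. Counting the total also happens to work for small.csv, as it did in my case, so try searching for the longest consecutive run of each specific STR instead.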
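
To make the difference concrete, here is a minimal sketch of both ways of counting. The function names `longest_run` and `total_count` and the sample sequence are my own, made up for illustration. They are not from the pset or from the code in the question, so treat this as one possible approach rather than the reference solution.

```python
def longest_run(sequence: str, motif: str) -> int:
    """Return the largest number of back-to-back copies of motif in sequence."""
    step = len(motif)
    if step == 0:
        return 0
    best = 0
    for start in range(len(sequence)):
        count = 0
        pos = start
        # Count copies of motif that follow each other with no gap,
        # beginning exactly at `start`.
        while sequence[pos:pos + step] == motif:
            count += 1
            pos += step
        best = max(best, count)
    return best


def total_count(sequence: str, motif: str) -> int:
    """Count every non-overlapping occurrence, the way the question's loop does."""
    step = len(motif)
    if step == 0:
        return 0
    count = 0
    pos = 0
    while pos < len(sequence):
        if sequence[pos:pos + step] == motif:
            count += 1
            pos += step
        else:
            pos += 1
    return count


# Hypothetical sample: runs of 2, 3 and 1 copies of AGAT separated by other bases.
sample = "AGATAGATCCAGATAGATAGATTTAGAT"
print(total_count(sample, "AGAT"))  # 6, every occurrence added together
print(longest_run(sample, "AGAT"))  # 3, only the longest unbroken run
```

In the question's code, the value stored in `list_str[k]` corresponds to `total_count`, which explains why it is never smaller than the number in the CSV. Storing the longest run there instead should leave the comparison against each row unchanged.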