About CS50 Pset6 DNA: it overcounts STRs for large.csv

Question

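I am working on pset6, the DNA problem. This code works for small.csv, but when I try it with the large database, it overcounts the STRs. I think the problem is in the part that compares strings, but I still don't know how to fix it. I checked that the count for the "TTTTTTCT" sequence is correct, but for the remaining STRs the count is always greater than it should be.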

import sys
import csv

def main():
    while (len(sys.argv) != 3):
        print ("ERROR. Usage: python dna.py data.csv sequence.txt")
        break

    list_str = {}

#load the STRs to analyse
    with open(sys.argv[1]) as csvfile:
        readcsv = csv.reader (csvfile)
        ncol = len(next(readcsv))
        csvfile.seek(0)
        header = list()

        for line in readcsv:
            a = sum(1 for line in readcsv)
        for i in range(ncol):
            list_str[line[i]] = 0
            header.insert (i, line [i])
            print (f"{header[i]}")

#open and work with the sequence file
    sequence = open(sys.argv[2], 'r')
    seq_r = sequence.read()

    for k in list_str.keys():
        #print (f"keu {k}")
        p = 0
        seq = len(seq_r)

        while p < seq:
            if seq_r[p:(p + len(k))] == k: 
                list_str[k] += 1
                p += len(k) 
            else: p += 1
                #print (f" sequenci encontrada{list_str[k]} y {k}")

        print (f"nro de {k} {list_str[k]}")

    with open(sys.argv[1]) as csvfile:
        readcsv = csv.reader (csvfile)
        next(csvfile)

        find = False

        for row in readcsv:
            for j in range(1,ncol):
                #print(f"header :{header[j]}")
                if int(row [j]) == int(list_str[header[j]]): 
                    print (f"row {row[j]} list {list_str[header[j]]}")
                    find = True
                else: 
                    find = False
                    break

            if find == True: print (f"{row [0]}")
main()

Answer 1

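I ran into the same thing, and then I reread the pset specification.

We need to find the longest run of consecutive repeats of each STR, not the total number of times the STR occurs. Counting the total also worked for small.csv in my case, so try searching for the longest consecutive run of each specific STR instead.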
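To make the difference concrete, here is a minimal sketch of the two counting strategies side by side. The function names `total_count` and `longest_run` and the sample sequence are my own, made up for illustration; `total_count` mirrors the loop in the question, and `longest_run` is one straightforward (not the fastest) way to get the longest back-to-back run.

```python
def total_count(sequence, str_unit):
    """Count every non-overlapping occurrence, as the question's loop does."""
    count = 0
    p = 0
    while p < len(sequence):
        if sequence[p:p + len(str_unit)] == str_unit:
            count += 1
            p += len(str_unit)
        else:
            p += 1
    return count


def longest_run(sequence, str_unit):
    """Return the longest number of back-to-back repeats of str_unit."""
    if not str_unit:
        return 0  # guard: an empty unit would never stop matching
    best = 0
    unit_len = len(str_unit)
    for start in range(len(sequence)):
        run = 0
        pos = start
        # Keep stepping one unit at a time while the repeats are adjacent.
        while sequence[pos:pos + unit_len] == str_unit:
            run += 1
            pos += unit_len
        best = max(best, run)
    return best


# Hypothetical sample: two repeats, a gap, then three repeats of "AGAT".
sample = "AGATAGATCCAGATAGATAGAT"
print(total_count(sample, "AGAT"))   # 5
print(longest_run(sample, "AGAT"))   # 3
```

In the question's code, storing the result of something like `longest_run(seq_r, k)` in `list_str[k]` instead of incrementing on every match should give the numbers that the database rows are expected to match. It is probably also worth stripping the trailing newline from the sequence file before scanning.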

python

string

cs50