Comparing two elements from different arrays

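My problem: I am trying to compare two elements from two different arrays, but the comparison operator does not seem to work.

The problematic snippet: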

for i in range(row_length):
    print(f"ss_record: {ss_record[i]}")
    print(f"row: {row[i + 1]}")
                
    #THIS IF STATEMENT IS NOT WORKING
    if ss_record[i] == row[i + 1]:
        count += 1
    #print()
    #print(f"row length: {row_length}")
    #print(f"count: {count}")
    if count == row_length:
        print(row[0])
        exit(0)

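What I tried: I printed the values of ss_record and row just before the if statement runs, but count does not increase even when the values match. I also tried storing the values of row in a new array, but that went wrong: it stored only the array length and the first 2 values of the row, and repeated those values on every subsequent iteration.

What I think the problem is: the rows are read from the CSV file but never converted to integers, so the values look the same when printed, but one is an integer and the other is a string.

Full code: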

import csv
import sys
import re
from cs50 import get_string
from sys import argv

def main():
    line_count = 0
    if len(argv) != 3:
        print("missing command-line argument")
        exit(1)
    
    with open(sys.argv[1], 'r') as database:
        sequence = open(sys.argv[2], 'r')
        string = sequence.read()
        reader = csv.reader(database, delimiter = ',')

        for row in reader:
            if line_count == 0:
                row_length = len(row) - 1
                ss_record = [row_length]
                for i in range(row_length):
                    ss_record.append(ss_count(string, row[i + 1], len(row[i + 1])))
        
                ss_record.pop(0)
                line_count = 1
            
            else:
                count = 0
                for i in range(row_length):
                    print(f"ss_record: {ss_record[i]}")
                    print(f"row: {row[i + 1]}")
                    
                    #THIS IF STATEMENT IS NOT WORKING
                    if ss_record[i] == row[i + 1]:
                        count += 1
                if count == row_length:
                    print(row[0])
                    exit(0)
  
 
#ss_count mean the # of times the substring appear in the string
def ss_count(string, substring, length):
    count = 1
    record = 0
    pos_array = []

    for m in re.finditer(substring, string):
        pos_array.append(m.start())
    
    for i in range(len(pos_array) - 1):
        if pos_array[i + 1] - pos_array[i] == length:
                count += 1
        else:
            if count > record:   
                record = count
            count = 1
    
    if count > record:   
        record = count
    
    return record
main()

Values to reproduce the problem:

sequence (this is a text file) = AAGGTAAGTTTAGAATATAAAAGGTGAGTTAAATAGAATAGGTTAAAATTAAAGGAGATCAGATCAGATCAGATCTATCTATCTATCTATCTATCAGAAAAGAGTAAATAGTTAAAGAGTAAGATATTGAATTAATGGAAAATATTGTTGGGGAAAGGAGGGATAGAAGG

substring (this is a csv file) =
name,AGATC,AATG,TATC
Alice,2,8,3
Bob,4,1,5
Charlie,3,2,5

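Key points about the CSV file: the numbers next to each name (for example, Alice) indicate how many times each substring (STR, Short Tandem Repeat) appears consecutively in the string (the DNA sequence). In this string, AGATC appears 4 times in a row, AATG appears 1 time, and TATC appears 5 times in a row. This DNA sequence therefore matches Bob, and his name is printed as the answer.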

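You're right: the comparison ss_record[i] == row[i + 1] has a type problem. The numbers in ss_record are integers, while the numbers in row are strings. You can confirm this by printing both ss_record and row:

print("ss_record: {}".format(ss_record))  # -> ss_record: [4, 1, 5]
print("row: {}".format(row))  # -> row: ['Alice', '2', '8', '3']

To make the snippet work, you only need to change the comparison to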

ss_record[i] == int(row[i + 1])
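To see why the original comparison is always false, here is a minimal sketch. The variable names mirror the question's code, and the literal values are assumptions copied from the question's data (the Bob row of the CSV). Python never treats an int and a str as equal, even when they print identically.

```python
ss_record = [4, 1, 5]            # integers computed from the DNA sequence
row = ['Bob', '4', '1', '5']     # strings, as csv.reader returns them

# An int never compares equal to a str, so this is False.
mismatch = ss_record[0] == row[1]

# Converting the CSV field first makes the comparison meaningful.
match = ss_record[0] == int(row[1])

# Converting the whole row at once allows a single list comparison.
all_match = ss_record == [int(value) for value in row[1:]]

print(mismatch, match, all_match)  # False True True
```

Note that int() raises ValueError on a non-numeric field, which is why the name column (row[0]) is skipped.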

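That said, I feel the code is rather complicated for this task. The str class implements a count method, which returns the number of non-overlapping occurrences of a given substring. Also, because the code works item by item and relies heavily on index manipulation, the iteration logic is hard to follow (IMO). Here is how I would approach the problem: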

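import csv

def match_user(dna_file, user_csv):
    # Read the DNA sequence (first line of the text file).
    with open(dna_file, 'r') as r:
        dna_seq = r.readline()

    # Load every CSV row: the first row is the header, the rest are users.
    with open(user_csv, 'r') as r:
        reader = csv.reader(r)
        rows = list(reader)

    target_substrings = rows[0][1:]  # skip the "name" column
    users = rows[1:]

    # str.count gives the total number of non-overlapping occurrences.
    num_matches = [dna_seq.count(target) for target in target_substrings]
    for user in users:
        # CSV fields are strings, so convert them before comparing.
        user_matches = [int(x) for x in user[1:]]
        if user_matches == num_matches:
            return user[0]

    return "Not found"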
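
One caveat worth checking against your data: str.count returns the total number of non-overlapping occurrences in the whole string, whereas the CSV stores the longest run of back-to-back repeats. The two agree only when every occurrence belongs to a single run. Below is a minimal sketch of a run-based count; longest_run is a hypothetical helper name, not part of the code above, and the short sequence is made up for illustration.

```python
def longest_run(sequence, unit):
    """Return the largest number of back-to-back copies of unit in sequence."""
    if not unit:
        return 0  # an empty unit would otherwise match forever

    best = 0
    step = len(unit)
    for start in range(len(sequence)):
        run = 0
        pos = start
        # Keep stepping forward while another copy of unit follows directly.
        while sequence.startswith(unit, pos):
            run += 1
            pos += step
        best = max(best, run)
    return best


# Made-up sequence: two adjacent copies, then a separate third copy.
sample = "AGATCAGATCTTAGATC"
print(sample.count("AGATC"))         # 3 (all occurrences)
print(longest_run(sample, "AGATC"))  # 2 (longest consecutive run)
```

Swapping dna_seq.count(target) for longest_run(dna_seq, target) in match_user would leave the rest of the function unchanged. The scan is quadratic in the worst case, which should be acceptable for sequences of this size.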

Happy coding!