Comparison of two elements in different arrays
My problem: I am trying to compare two elements from two different arrays, but the `==` operator does not work.

The problematic code snippet:
```python
for i in range(row_length):
    print(f"ss_record: {ss_record[i]}")
    print(f"row: {row[i + 1]}")
    #THIS IF STATEMENT IS NOT WORKING
    if ss_record[i] == row[i + 1]:
        count += 1
#print()
#print(f"row length: {row_length}")
#print(f"count: {count}")
if count == row_length:
    print(row[0])
    exit(0)
```
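What I tried: I printed the values of `ss_record` and `row` before the `if` statement runs, but even when they match, `count` does not increase. I also tried storing the values of `row` in a new array, but that went wrong: it only stored the array length and the first 2 values of the row, and repeated those values for every following row.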
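What I think the problem is: the rows are read from the CSV file but never converted to integers. The values look the same, but one is an integer and the other is a string.

Full code: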
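A quick way to test this hypothesis, sketched with the first lines of the sample database held in memory (`io.StringIO` stands in for the real file here): `csv.reader` yields every field as a `str`, and in Python an `int` never compares equal to a `str`.

```python
import csv
import io

# First two lines of the sample database, kept in memory instead of a file.
sample_csv = "name,AGATC,AATG,TATC\nAlice,2,8,3\n"
header, alice = list(csv.reader(io.StringIO(sample_csv)))

print(alice)           # ['Alice', '2', '8', '3']
print(type(alice[1]))  # <class 'str'>
print(2 == alice[1])   # False: the int 2 is not equal to the str '2'
```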
```python
import csv
import sys
import re
from cs50 import get_string
from sys import argv


def main():
    line_count = 0
    if len(argv) != 3:
        print("missing command-line argument")
        exit(1)
    with open(sys.argv[1], 'r') as database:
        sequence = open(sys.argv[2], 'r')
        string = sequence.read()
        reader = csv.reader(database, delimiter = ',')
        for row in reader:
            if line_count == 0:
                row_length = len(row) - 1
                ss_record = [row_length]
                for i in range(row_length):
                    ss_record.append(ss_count(string, row[i + 1], len(row[i + 1])))
                ss_record.pop(0)
                line_count = 1
            else:
                count = 0
                for i in range(row_length):
                    print(f"ss_record: {ss_record[i]}")
                    print(f"row: {row[i + 1]}")
                    #THIS IF STATEMENT IS NOT WORKING
                    if ss_record[i] == row[i + 1]:
                        count += 1
                if count == row_length:
                    print(row[0])
                    exit(0)


# ss_count returns the longest run of back-to-back repeats of substring in string
def ss_count(string, substring, length):
    count = 1
    record = 0
    pos_array = []
    for m in re.finditer(substring, string):
        pos_array.append(m.start())
    for i in range(len(pos_array) - 1):
        if pos_array[i + 1] - pos_array[i] == length:
            count += 1
        else:
            if count > record:
                record = count
            count = 1
    if count > record:
        record = count
    return record


main()
```
Values to reproduce the problem:

sequence (this is a text file):

```text
AAGGTAAGTTTAGAATATAAAAGGTGAGTTAAATAGAATAGGTTAAAATTAAAGGAGATCAGATCAGATCAGATCTATCTATCTATCTATCTATCAGAAAAGAGTAAATAGTTAAAGAGTAAGATATTGAATTAATGGAAAATATTGTTGGGGAAAGGAGGGATAGAAGG
```

substring (this is a CSV file):

```csv
name,AGATC,AATG,TATC
Alice,2,8,3
Bob,4,1,5
Charlie,3,2,5
```
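Notes on the CSV file: the numbers next to each name (e.g. Alice) give how many times a substring (an STR, short tandem repeat) appears consecutively in the string (the DNA sequence). In this sequence, AGATC appears 4 times in a row, AATG once, and TATC 5 times in a row, so the sequence matches Bob, whose name should be printed as the answer.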
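The "times in a row" count described above can be sketched in plain Python as follows. `longest_run` is a hypothetical helper name, not part of the original code: it tries every start position and counts how many copies of the unit follow one another. Unlike `ss_count` in the question, which returns 1 when the substring never appears, this sketch returns 0 in that case.

```python
# The sample sequence from the question.
SEQUENCE = "AAGGTAAGTTTAGAATATAAAAGGTGAGTTAAATAGAATAGGTTAAAATTAAAGGAGATCAGATCAGATCAGATCTATCTATCTATCTATCTATCAGAAAAGAGTAAATAGTTAAAGAGTAAGATATTGAATTAATGGAAAATATTGTTGGGGAAAGGAGGGATAGAAGG"


def longest_run(sequence: str, unit: str) -> int:
    """Return the largest number of back-to-back copies of `unit` in `sequence`."""
    if not unit:
        raise ValueError("unit must be non-empty")  # "" would match forever
    size = len(unit)
    best = 0
    for start in range(len(sequence) - size + 1):
        run = 0
        # Step forward one unit at a time while the next copy is still there.
        while sequence.startswith(unit, start + run * size):
            run += 1
        best = max(best, run)
    return best


print([longest_run(SEQUENCE, unit) for unit in ("AGATC", "AATG", "TATC")])  # [4, 1, 5]
```

Scanning from every position is quadratic in the worst case, which is fine for sequences of this size.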
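You are right: there is a type problem when you compare `ss_record[i] == row[i + 1]`. The numbers in `ss_record` are integers, while the numbers in `row` are strings. You can confirm this by printing both `ss_record` and `row`: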
```python
print("ss_record: {}".format(ss_record))  # -> ss_record: [4, 1, 5]
print("row: {}".format(row))              # -> row: ['Alice', '2', '8', '3']
```
To make the snippet work, you only need to change the comparison to:

```python
ss_record[i] == int(row[i + 1])
```
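Applied to the whole matching loop, the fix can be sketched like this. The database is held in an in-memory string via `io.StringIO` instead of a file, and `ss_record` is hard-coded to the values printed above; both are assumptions for illustration. Stopping with `break` instead of `exit(0)` is also a choice made only for this sketch.

```python
import csv
import io

# In-memory stand-in for the database file (same values as the question).
DATABASE = "name,AGATC,AATG,TATC\nAlice,2,8,3\nBob,4,1,5\nCharlie,3,2,5\n"
# STR counts computed from the sequence (ss_record in the question).
ss_record = [4, 1, 5]

reader = csv.reader(io.StringIO(DATABASE))
header = next(reader)
row_length = len(header) - 1

match = None
for row in reader:
    count = 0
    for i in range(row_length):
        # Convert the CSV field (a str) to int before comparing.
        if ss_record[i] == int(row[i + 1]):
            count += 1
    if count == row_length:
        match = row[0]
        break

print(match)  # Bob
```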
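That said, I find the code rather complicated for this task. The `str` class implements a `count` method, which returns the number of non-overlapping occurrences of a given substring. Also, because the code works item by item and relies heavily on indexing, the iteration logic is hard to follow (IMO). Here is how I would solve the problem: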
```python
import csv


def match_user(dna_file, user_csv):
    with open(dna_file, 'r') as r:
        dna_seq = r.readline()

    with open(user_csv, 'r') as r:
        reader = csv.reader(r)
        rows = list(reader)

    target_substrings = rows[0][1:]
    users = rows[1:]
    num_matches = [dna_seq.count(target) for target in target_substrings]

    for user in users:
        user_matches = [int(x) for x in user[1:]]
        if user_matches == num_matches:
            return user[0]

    return "Not found"
```
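One caveat: `str.count` counts every non-overlapping occurrence anywhere in the sequence, whereas the CSV stores the longest run of consecutive repeats. For the sample data the two happen to agree ([4, 1, 5]), but they can differ, as this made-up sequence shows. If the consecutive count is what you need, one option is to match whole runs with a regular expression and use that value in place of `dna_seq.count(target)`; the sequence and variable names below are illustrative only.

```python
import re

seq = "AGATCAGATCTTAGATC"  # made-up: a run of 2, a gap, then 1 more copy
unit = "AGATC"

# str.count: every non-overlapping occurrence, wherever it is.
total = seq.count(unit)

# Longest back-to-back run: match one or more adjacent copies of the unit,
# then turn each match's length back into a number of repeats.
pattern = re.compile(f"(?:{re.escape(unit)})+")
longest = max((len(m.group(0)) // len(unit) for m in pattern.finditer(seq)), default=0)

print(total, longest)  # 3 2
```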
Happy coding!