Is there a way to count repetitions in a string other than count()?
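I'm working on the CS50 DNA problem, and Python's count function keeps returning values I don't expect; I'm not sure why. I tried using find instead, but my implementation was wrong.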
import csv, sys

# check that all arguments are provided
if len(sys.argv) != 3:
    print("Usage: python dna.py data.csv sequence.txt")
    sys.exit(1)

# the database is the first argument
databaseFile = sys.argv[1]
# the sequence is the second argument
sequenceFile = sys.argv[2]

# data holds the full rows (with names), database holds just the numbers
data = []
database = []

# open the csv file
with open(databaseFile, 'r') as csvfile:
    # make a reader
    csvreader = csv.reader(csvfile)
    # read in the headers
    fields = next(csvreader)
    fields.remove("name")
    # read in the rows of data in the database
    for row in csvreader:
        data.append(row)
        database.append([int(i) for i in row[1:]])

# open the sequence
with open(sequenceFile, 'r') as seqfile:
    sequence = seqfile.readline()

results = []
# add the repetitions of each field to results
for field in fields:
    results.append(sequence.count(field))
print(results)

found = False
# compare the counts against each person's numbers
for row, counts in zip(data, database):
    if results == counts:
        # row[0] is the name column
        print(row[0])
        found = True
if not found:
    print("No match")
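Is there another way to count how many times 'field' repeats consecutively?
For example, when I run:
python dna.py databases/large.csv sequences/19.txt
the result I get is:
[47, 40, 34, 11, 24, 31, 60, 26]
No match
instead of Fred, i.e.:
[37, 40, 10, 6, 5, 10, 28, 8]
The problem statement can be found at: https://cs50.harvard.edu/x/2020/psets/6/dna/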
Use a regular expression to find consecutive matches.
import re
s = 'AGTCAGTCAGTCTTTTAGCTAGTC'
STR = 'AGTC'
strands = re.findall(f'(?:{STR})+', s)
print(strands) # prints `['AGTCAGTCAGTC', 'AGTC']`
my_max = max(map(len, strands))//len(STR)
print(my_max)
This prints 3.
That is correct, because the pattern occurs 3 times in a row (and once more, on its own, at the end of the sequence).
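To make the difference from count concrete, here is a small sketch on the same sample string. As far as I can tell this is what inflates the numbers in the question: str.count tallies every non-overlapping occurrence anywhere in the string, while the regex keeps only back-to-back repeats. The variable names total, runs and longest are just for illustration.

```python
import re

s = 'AGTCAGTCAGTCTTTTAGCTAGTC'
STR = 'AGTC'

# str.count tallies every non-overlapping occurrence, consecutive or not
total = s.count(STR)

# the regex keeps only back-to-back repeats; take the longest run
runs = re.findall(f'(?:{STR})+', s)
longest = max(len(run) for run in runs) // len(STR)

print(total)    # 4: three in a row plus the isolated one at the end
print(longest)  # 3: only the longest consecutive run
```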
I hope this code helps you solve this part of the problem.
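If you want to plug this into the loop over fields, one possible shape is a small helper. This is only a sketch: longest_run_regex and longest_run_scan are names I made up, the fields list below is a hypothetical stand-in for the CSV headers, and both functions assume a non-empty pattern. The regex version adds re.escape and default=0 so that a pattern that never appears gives 0 instead of a ValueError from max(). The second version does the same job without regex by trying every start position.

```python
import re

def longest_run_regex(sequence, pattern):
    """Longest number of back-to-back copies of pattern, using a regex."""
    # re.escape keeps the pattern literal even if it contains regex metacharacters
    runs = re.findall(f'(?:{re.escape(pattern)})+', sequence)
    # default=0 covers the case where the pattern never appears
    return max((len(run) for run in runs), default=0) // len(pattern)

def longest_run_scan(sequence, pattern):
    """Same idea without regex: try every start position and count repeats."""
    best = 0
    step = len(pattern)
    for start in range(len(sequence)):
        count = 0
        # keep stepping forward while another copy follows immediately
        while sequence.startswith(pattern, start + count * step):
            count += 1
        best = max(best, count)
    return best

sequence = 'AGTCAGTCAGTCTTTTAGCTAGTC'
fields = ['AGTC', 'TTTT', 'GATA']  # hypothetical headers, for illustration only

results_regex = [longest_run_regex(sequence, f) for f in fields]
results_scan = [longest_run_scan(sequence, f) for f in fields]
print(results_regex)  # [3, 1, 0]
print(results_scan)   # [3, 1, 0]
```

In your script, results.append(sequence.count(field)) would become a call to one of these helpers. One caveat: re.findall returns non-overlapping matches, so for a pattern that can overlap itself (for example 'ATA' in 'ATATAATAATA') the regex version can report a shorter run than the scan version, which checks every start position.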