Finding palindromic sequences of length 18 in a FASTA file?

Question

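I want to design guide RNAs by finding palindromic sequences in a FASTA file. I want to write a Python script that finds all palindromic sequences of length 18 across the whole sequence. I have the logic in mind, but I don't know how to express it in Python. My logic is: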

1) if i is [ATCG] and i+17 is [TAGC] then check:
2) if i+1 is [ATCG] and i+16 is [TAGC] then check:
3) if i+2 is [ATCG] and i+15 is [TAGC] then check:
.
.
.

9) if i+8 is [ATCG] and i+9 is [TAGC] and all the above are true,

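then identify the sequence from i to i+17 as a palindrome. But I need to make sure that for an A at i, only a T at i+17 is accepted. Any idea how I can write this logic in Python?

Thanks,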

Answer 1

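So you want to match A with T and G with C. We can use a dictionary for that. Then we just check whether the bases at opposite ends of the window form a pair.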

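# sequence is assumed to be an uppercase DNA string made of A, C, G, T;
# other characters (e.g. N) can raise KeyError in the lookup below.
pairs = {"A":"T", "T":"A", "G":"C", "C":"G"}
for i in range(len(sequence) - 18 + 1):  # every start of an 18-base window
    pal = True
    for j in range(9):  # compare the 9 pairs, from the outside in
        if pairs[ sequence[i+j] ] != sequence[i+17-j]:
            pal = False
            break
    if pal:
        print(sequence[i : i+18])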

For a palindrome of any length n (including odd n, where the middle base is left unchecked):

pairs = {"A":"T", "T":"A", "G":"C", "C":"G"}
n=18
for i in range(len(sequence) - n + 1):
    pal = True
    for j in range(n//2):
        if pairs[ sequence[i+j] ] != sequence[i-j+n-1]:
            pal = False
            break
    if pal:
        print(sequence[i : i+n])
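
The same check can also be phrased as "a window is palindromic when it equals its own reverse complement". The following is only a minimal sketch under that reading, assuming uppercase DNA input; the names `COMPLEMENT`, `reverse_complement` and `find_palindromes` are illustrative and not part of the original answer, and skipping windows that contain other letters is an assumed policy.

```python
# Translation table for uppercase DNA; characters outside ACGT are left unchanged.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(s):
    """Return the reverse complement of an uppercase DNA string."""
    return s.translate(COMPLEMENT)[::-1]

def find_palindromes(sequence, n=18):
    """Yield (start, window) for each length-n window equal to its reverse complement."""
    for i in range(len(sequence) - n + 1):
        window = sequence[i:i + n]
        # Assumed policy: ignore windows containing anything other than A, C, G, T.
        if set(window) <= set("ACGT") and window == reverse_complement(window):
            yield i, window

# Example call on a short made-up string with a small n:
for start, window in find_palindromes("AAGAATTCTT", n=6):
    print(start, window)
```

Under this strict definition no odd-length window can match, because the middle base would have to be its own complement, whereas the loop above simply ignores the middle base.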

Answer 2

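Looping over the string character by character takes too much time. Python's built-in string operations, such as slicing and comparison, are more efficient.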

#create random test sequence
import random
random.seed(1234)
seq = "".join(random.choices(["A", "T", "C", "G"], k=99))
n = 4 #not exactly 18 but good enough as a test case
print(seq)
>>>GTAGGCCAGAAGTCCAAAATGACTCACTCCTTAGTCACAATTACACAGGGATATGAAGAGATTTGTGTGGTGGTAATACGTGCCTCGAGTAGCGTATAT

#dictionary because translation
bp = {"A":"T", "T":"A", "G":"C", "C":"G"}

#checks if first half translates into reversed second half
#returns False if not, e.g., if the length ls of s is not an even number
def palin(s):
    ls = len(s)
    if ls%2:
        return False        
    return s[:ls//2]=="".join([bp[i] for i in s[ls:ls//2-1:-1]])

#now to the actual test, checking all substrings of length n in our test sequence seq
#returns tuples of the index within seq and the found substring 
res = [(i, seq[i:i+n]) for i in range(len(seq)-n+1) if palin(seq[i:i+n])]
print(res)
>>>[(3, 'GGCC'), (38, 'AATT'), (50, 'ATAT'), (77, 'ACGT'), (84, 'TCGA'), (94, 'TATA'), (95, 'ATAT')]
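
Neither snippet shows how the sequence gets out of the FASTA file, so here is a rough sketch of that step in plain Python. It assumes a simple FASTA layout (header lines starting with ">", sequence possibly split over several lines) and reuses the half-versus-reversed-half idea of `palin`. The names `read_fasta`, `is_palindrome` and `palindromes_in_fasta`, and the file name in the comment, are hypothetical.

```python
BP = {"A": "T", "T": "A", "G": "C", "C": "G"}

def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.upper())  # normalise soft-masked (lowercase) bases
    if header is not None:
        yield header, "".join(chunks)

def is_palindrome(s):
    """True if the first half pairs base by base with the reversed second half."""
    if len(s) % 2:
        return False
    half = len(s) // 2
    # BP.get returns None for letters such as N, so they fail instead of raising KeyError.
    return all(BP.get(a) == b for a, b in zip(s[:half], reversed(s[half:])))

def palindromes_in_fasta(lines, n=18):
    """Yield (header, start, window) for every palindromic window of length n."""
    for header, seq in read_fasta(lines):
        for i in range(len(seq) - n + 1):
            window = seq[i:i + n]
            if is_palindrome(window):
                yield header, i, window

# Possible usage, with a hypothetical file name:
# with open("genome.fasta") as handle:
#     for header, start, window in palindromes_in_fasta(handle, n=18):
#         print(header, start, window)
```

If Biopython is available, `SeqIO.parse(handle, "fasta")` could replace the hand-written reader; the plain version is used here only to keep the sketch free of dependencies.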

python

substring