How to get the substring between two markers in Python multiple times?

I have the following code:

s = '''alt="Thunder Force"/>ehkjehkljhiflealt="Godzilla vs. Kong"/>'''

for i in s:
    start = s.find('alt="') + len('alt="')
    end = s.find('"/>')
    substring = s[start:end]
    print(substring)

But it just prints "Thunder Force" many times. I want it to find both "Thunder Force" and "Godzilla vs. Kong" and print each of them. How can I do that?

Use a regex with re.findall():

import re

s = '''alt="Thunder Force"/>ehkjehkljhiflealt="Godzilla vs. Kong"/>'''

# The lookbehind and lookahead match the markers without including them in the result
print(re.findall(r'(?<=alt=").*?(?="/>)', s))
#['Thunder Force', 'Godzilla vs. Kong']

You can use a regular expression:

import re
s = '''alt="Thunder Force"/>ehkjehkljhiflealt="Godzilla vs. Kong"/>'''
x = re.findall(r'alt="(.*?)"/>', s)
print(x)

Output:

['Thunder Force', 'Godzilla vs. Kong']

Here is a non-regex solution that looks closer to what I think you were trying to achieve with your posted attempt:

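s = '''alt="Thunder Force"/>ehkjehkljhiflealt="Godzilla vs. Kong"/>'''

start = 0
while True:
    # Search for the next opening marker, starting where the last match ended
    start = s.find('alt="', start)
    if start == -1:
        break
    start += len('alt="')  # skip past the opening marker itself
    end = s.find('"/>', start)
    if end == -1:
        break
    substring = s[start:end]
    start = end  # continue the next search from the end of this match
    print(substring)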
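
The same loop can be wrapped in a reusable generator. This is only a sketch: the function name substrings_between and its parameters are hypothetical, not part of any library, and unlike the loop above it resumes searching just after the closing marker rather than at it.

```python
def substrings_between(text, opening, closing):
    """Yield each substring of text found between opening and closing.

    Hypothetical helper for illustration; the name and signature are assumptions.
    """
    pos = 0
    while True:
        start = text.find(opening, pos)
        if start == -1:
            return  # no further opening marker
        start += len(opening)
        end = text.find(closing, start)
        if end == -1:
            return  # opening marker without a matching closing marker
        yield text[start:end]
        pos = end + len(closing)  # resume after the closing marker


s = '''alt="Thunder Force"/>ehkjehkljhiflealt="Godzilla vs. Kong"/>'''
for title in substrings_between(s, 'alt="', '"/>'):
    print(title)
```

Because it is a generator, the results can also be collected with list(substrings_between(s, 'alt="', '"/>')).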

You could also use a negated character class, [^"]+, which matches any character except ". The + repeats it one or more times, so at least one character must be matched.

If an empty match is also acceptable, you can use * instead of +.

import re
s = '''alt="Thunder Force"/>ehkjehkljhiflealt="Godzilla vs. Kong"/>'''
x = re.findall(r'alt="([^"]+)"/>', s)
print(x)

Output:

['Thunder Force', 'Godzilla vs. Kong']
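
To see where + and * differ, here is a small sketch using a made-up input (the variable sample is an assumption, not from the question) that contains one empty alt attribute:

```python
import re

# Hypothetical input with an empty alt attribute in the middle
sample = 'alt="Thunder Force"/>xyzalt=""/>abcalt="Godzilla vs. Kong"/>'

# + requires at least one character between the quotes, so alt="" is skipped
with_plus = re.findall(r'alt="([^"]+)"/>', sample)

# * also accepts zero characters, so alt="" yields an empty string
with_star = re.findall(r'alt="([^"]*)"/>', sample)

print(with_plus)  # ['Thunder Force', 'Godzilla vs. Kong']
print(with_star)  # ['Thunder Force', '', 'Godzilla vs. Kong']
```

Which one to choose depends on whether an empty value should count as a result in your data.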