How to match elements containing a string from a BeautifulSoup list?

I have the input.html below.

Input.html https://jsfiddle.net/f86q7ubm/

I am trying to match every element in the list allList that has size=5, but when I run the following code, matching ends up empty.

from bs4 import BeautifulSoup

fp = open("file.html", "rb")                 
soup = BeautifulSoup(fp,"html5lib")

allList = soup.find_all(True)

matching = [s for s in allList if 'size="5"' in s]  

What am I doing wrong?

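There may be (and probably should be) a better way, but you can use str(s). You were trying to match against an object that is not a string: each item in allList is a Tag, and `in` on a Tag checks its child nodes rather than its markup: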

from bs4 import BeautifulSoup

with open("file.html", "rb") as fp:
    soup = BeautifulSoup(fp, "html5lib")

allList = soup.find_all(True)

# str(s) serializes each Tag to its markup, so the substring test can succeed
matching = [s for s in allList if 'size="5"' in str(s)]
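To see the difference between the two checks, here is a minimal sketch. The markup is a made-up stand-in for input.html, and html.parser is used only so the snippet runs without html5lib installed.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for input.html
html = '<p><font size="5">big</font><font size="3">small</font></p>'
soup = BeautifulSoup(html, "html.parser")

all_tags = soup.find_all(True)

# `in` on a Tag tests membership among its child nodes, not its markup
by_tag = [t for t in all_tags if 'size="5"' in t]

# Converting to str first tests the serialized markup instead
by_str = [t for t in all_tags if 'size="5"' in str(t)]

print(by_tag)                    # []
print([t.name for t in by_str])  # ['p', 'font']
```

Note that the str() version also returns the enclosing p tag, because a parent's markup includes its children. If only the elements that carry the attribute themselves are wanted, filtering on the attribute is likely the safer route.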

Not sure if this is what you want, but a better approach might be:

allList = soup.find_all("font", {"size": "5"})  # allList already holds the matching elements

# Assumes html holds the markup from input.html as a string
soup = BeautifulSoup(html, 'html.parser')

for item in soup.find_all("font", {"size": "5"}):
    print(item.text)

Output:

TEXT S 5 MORE TEXT
TEXT S 5 MORE TEXT
TEXT S 5 MORE TEXT
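
As a self-contained sketch of the attribute filter, the snippet below uses made-up markup in place of input.html (the span element is purely illustrative). It also shows that leaving out the tag name matches any element with size="5", which may be closer to the original goal of matching across all elements.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for input.html
html = """
<font size="5">TEXT S 5 MORE TEXT</font>
<font size="3">other text</font>
<span size="5">not a font tag</span>
"""
soup = BeautifulSoup(html, "html.parser")

# Only <font> tags whose size attribute is exactly "5"
fonts = soup.find_all("font", attrs={"size": "5"})

# Any tag with size="5", regardless of tag name
any_tag = soup.find_all(attrs={"size": "5"})

print([t.get_text() for t in fonts])  # ['TEXT S 5 MORE TEXT']
print([t.name for t in any_tag])      # ['font', 'span']
```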