Python BeautifulSoup: extract class text only if it contains specific text

If the whole class text = New, is there a way to extract the class below?
 <li class="ClassifiedDetail">New

What I tried:

doc.find('li', class_ = 'ClassifiedDetail').attrs['New']

Maybe something like: if the class text = New or contains 'New', accept it?

Note: it is not clear whether you mean the class or the tag, so I assume you mean the text of the tag.

One approach is to use CSS selectors with the :-soup-contains() pseudo-class:

soup.select('li.ClassifiedDetail:-soup-contains("New")')

Alternatively, use string=re.compile() (this needs import re), because `string` (or `text` in former versions), when given a plain string, only matches the complete text exactly:

soup.find_all('li', class_='ClassifiedDetail', string=re.compile('New'))
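To make the difference between an exact match and a pattern match concrete, here is a minimal sketch. The sample HTML and the variable names (exact, partial, exact_texts, partial_texts) are made up for illustration, and it assumes a BeautifulSoup version that accepts the string argument (4.4.0 or later).

```python
import re
from bs4 import BeautifulSoup

# Illustrative sample markup (an assumption, not taken from a real page)
html = '''
<li class="ClassifiedDetail">New</li>
<li class="ClassifiedDetail">New York</li>
<li class="ClassifiedDetail">Old</li>
<li class="ClassifiedDetail">knew</li>
'''

soup = BeautifulSoup(html, 'html.parser')

# A plain string only matches when it equals the tag's complete text
exact = soup.find_all('li', class_='ClassifiedDetail', string='New')

# A compiled pattern is searched within the text, so substrings match too
partial = soup.find_all('li', class_='ClassifiedDetail', string=re.compile('New'))

exact_texts = [li.get_text() for li in exact]
partial_texts = [li.get_text() for li in partial]

print(exact_texts)    # ['New']
print(partial_texts)  # ['New', 'New York']
```

The pattern is case sensitive, so knew is not matched. If that is wanted, passing re.I to re.compile() should make the search case insensitive.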

Example

from bs4 import BeautifulSoup

html='''
<li class="ClassifiedDetail">New</li>
<li class="ClassifiedDetail">New York</li>
<li class="ClassifiedDetail">Ne </li>
<li class="ClassifiedDetail">Old</li>
<li class="ClassifiedDetail">knew</li>
'''

soup = BeautifulSoup(html, 'html.parser')
for li in soup.select('li.ClassifiedDetail:-soup-contains("New")'):
    print(li.text)

Output

New
New York
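If the goal is rather to keep only elements whose entire text is New (the first reading of the question), one possible sketch is to select by class and compare the stripped text in Python. The sample HTML and the name matches are assumptions for illustration only.

```python
from bs4 import BeautifulSoup

# Illustrative sample markup (an assumption, not taken from a real page)
html = '''
<li class="ClassifiedDetail">New</li>
<li class="ClassifiedDetail"> New </li>
<li class="ClassifiedDetail">New York</li>
<li class="ClassifiedDetail">knew</li>
'''

soup = BeautifulSoup(html, 'html.parser')

# Keep only the <li> elements whose text, ignoring surrounding
# whitespace, is exactly "New"
matches = [
    li for li in soup.select('li.ClassifiedDetail')
    if li.get_text(strip=True) == 'New'
]

print(len(matches))  # 2
```

Stripping the text first means an element such as " New " is still accepted, while New York and knew are rejected.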