In BeautifulSoup, how do I search for an element that contains text but also has an ancestor with a certain class?
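I'm using BeautifulSoup 4 with Python 3.7. I want to find an element that contains the text "points" and also has an ancestor DIV whose class attribute includes "article". I've figured out how to search for an element by its text ...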
points_elt = soup.find_all(text=re.compile(' points'))[0]
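But I can't figure out how to extend this so that it only matches elements with that text that also have an ancestor with the class "article". Here is an example of an element I would want to find: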
<div class="article class2">
    ... other elements ...
    <span class="outerSpan">
        <span class="innerSpan">2000 points</span>
    </span>
    ... other element closing tags ...
</div>
Here is another example it should handle:
<div class="article class7">
    <p>
        <div class="abc">
            <span class="outerSpan">
                <span>8000 points</span>
            </span>
        </div>
    </p>
</div>
from bs4 import BeautifulSoup
import re
data = """
<div class="article class2">
<span class="outerSpan">
<span class="innerSpan">2000 points</span>
</span>
</div>
"""
soup = BeautifulSoup(data, 'html.parser')
# string= is the current name for the older text= argument
for item in soup.find_all(string=re.compile('points$')):
    print(item)
Output:
2000 points
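The regex search above returns the matching text nodes but never looks at their ancestors. One possible way to add that check, as a rough sketch: call `find_parent` on each text node with `class_='article'`, which matches a `div` whose class list includes `article` among others. The helper name `points_in_articles` and the extra `sidebar` div are made up for this illustration.

```python
import re
from bs4 import BeautifulSoup

html = """
<div class="article class2">
  <span class="outerSpan">
    <span class="innerSpan">2000 points</span>
  </span>
</div>
<div class="article class7">
  <p>
    <div class="abc">
      <span class="outerSpan">
        <span>8000 points</span>
      </span>
    </div>
  </p>
</div>
<div class="sidebar">
  <span>50 points</span>
</div>
"""

def points_in_articles(soup):
    """Return tags that directly hold ' points' text and sit inside a div.article.

    Hypothetical helper, not part of BeautifulSoup.
    """
    matches = []
    for text_node in soup.find_all(string=re.compile(r' points')):
        # find_parent walks up the tree and returns the first matching
        # ancestor, or None when there is no such ancestor.
        if text_node.find_parent('div', class_='article') is not None:
            matches.append(text_node.parent)
    return matches

soup = BeautifulSoup(html, 'html.parser')
found = points_in_articles(soup)
for tag in found:
    print(tag.get_text())
```

With `html.parser` the `<div>` nested inside `<p>` stays where it is in the tree; other parsers may restructure that invalid nesting, but the span should still end up inside the `article` div.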
from bs4 import BeautifulSoup
data = """
<div class="article class2">
<span class="outerSpan">
<span class="innerSpan">2000 points</span>
</span>
</div>
"""
soup = BeautifulSoup(data, 'html.parser')
for item in soup.find_all('span', {'class': 'innerSpan'}):
    print(item.text)
Output:
2000 points
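You can use a CSS selector and then check for the string you are looking for.
from bs4 import BeautifulSoup
html = '''<div class="article class2">
<span class="outerSpan">
<span class="innerSpan">2000 points</span>
</span>
</div>
'''
soup = BeautifulSoup(html, 'html.parser')
# descendant selector: any .innerSpan somewhere below an .article element
for item in soup.select('.article .innerSpan'):
    if 'points' in item.text:
        print(item.text)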
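Or you can use this.
soup = BeautifulSoup(html, 'html.parser')
# :-soup-contains is the current spelling; older soupsieve releases use :contains
for item in soup.select('.article:-soup-contains("points")'):
    print(item.text.strip())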
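Both selector snippets above have a limitation: the first depends on the `innerSpan` class, which the second example in the question lacks, and the second returns the whole `article` div instead of the span. A sketch that avoids both, assuming the target is always a `span`: select every span below `div.article` and keep those whose own direct text contains the word. The helper `has_own_text` and the `sidebar` div are invented for this example.

```python
from bs4 import BeautifulSoup, NavigableString

html = """
<div class="article class2">
  <span class="outerSpan">
    <span class="innerSpan">2000 points</span>
  </span>
</div>
<div class="article class7">
  <p>
    <div class="abc">
      <span class="outerSpan">
        <span>8000 points</span>
      </span>
    </div>
  </p>
</div>
<div class="sidebar">
  <span>50 points</span>
</div>
"""

def has_own_text(tag, needle):
    """True if one of the tag's direct text children contains needle.

    Hypothetical helper; it ignores text that belongs to nested tags,
    so an outer span wrapping the real target is not reported.
    """
    return any(
        isinstance(child, NavigableString) and needle in child
        for child in tag.children
    )

soup = BeautifulSoup(html, 'html.parser')
# 'div.article span' matches spans at any depth below a div with class "article"
hits = [
    span for span in soup.select('div.article span')
    if has_own_text(span, 'points')
]
for span in hits:
    print(span.get_text())
```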
span = soup.find_all('span')
if 'points' in span[1].text:
    div = span[1].parent.parent
    print(div)
The span variable holds all the span elements, and we walk back up through the parent tags in the HTML. This assumes the HTML always has this format.
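The snippet above relies on the target being the second span in the document and exactly two levels below the div, which does not hold for the more deeply nested example in the question. A sketch that drops both assumptions by passing a filter function to `find_all` and checking every ancestor through `.parents`; the function names `in_article` and `is_points_tag` are hypothetical.

```python
from bs4 import BeautifulSoup

html = """
<div class="article class7">
  <p>
    <div class="abc">
      <span class="outerSpan">
        <span>8000 points</span>
      </span>
    </div>
  </p>
</div>
<div class="sidebar">
  <span>50 points</span>
</div>
"""

def in_article(tag):
    """True if any ancestor is a div whose class list includes 'article'."""
    return any(
        parent.name == 'div' and 'article' in parent.get('class', [])
        for parent in tag.parents
    )

def is_points_tag(tag):
    """Filter for find_all: innermost tag with 'points' text inside an article div."""
    no_child_tags = tag.find(True) is None  # keep only the innermost tag
    return no_child_tags and 'points' in tag.get_text() and in_article(tag)

soup = BeautifulSoup(html, 'html.parser')
hits = soup.find_all(is_points_tag)
for tag in hits:
    print(tag.get_text())
```

Because the ancestor check is independent of depth, the same filter should work whether the span sits one level or five levels below the `article` div.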