Get content of tags with empty id in BeautifulSoup
from bs4 import BeautifulSoup
page = """<span id="something">useless</span>
<span id="">some text</span>
<span id="different">useless</span>"""
soup = BeautifulSoup(page)
How can I get only some text? Using soup.find_all('span', {'id': ""}) finds everything.
You have two options:
Use a custom filter: pass in a function, which is called for each element and must return True or False:
soup.find_all(lambda e: e.name == 'span' and e.attrs.get('id') == '')
Use a CSS selector with an exact attribute match:
soup.select('span[id=""]')
Demo:
>>> from bs4 import BeautifulSoup
>>> page = """<span id="something">useless</span>
... <span id="">some text</span>
... <span id="different">useless</span>"""
>>> soup = BeautifulSoup(page)
>>> soup.find_all(lambda e: e.name == 'span' and e.attrs.get('id') == '')
[<span id="">some text</span>]
>>> soup.select('span[id=""]')
[<span id="">some text</span>]
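Both calls return Tag objects, while the question asks for the text itself. The following is a minimal sketch of that last step, not part of the original answer. It assumes the built-in "html.parser" backend, and the helper name has_empty_id is made up for illustration.

```python
from bs4 import BeautifulSoup

page = """<span id="something">useless</span>
<span id="">some text</span>
<span id="different">useless</span>"""

# Assumption: name the parser explicitly so the result does not depend
# on which optional parsers happen to be installed.
soup = BeautifulSoup(page, "html.parser")


def has_empty_id(tag):
    """Hypothetical helper: True only for <span> tags whose id is exactly ''."""
    return tag.name == "span" and tag.get("id") == ""


# Option 1: custom filter function, then pull the text out of each match.
texts_from_filter = [tag.get_text() for tag in soup.find_all(has_empty_id)]

# Option 2: CSS attribute selector with an exact match on the empty string.
texts_from_css = [tag.get_text() for tag in soup.select('span[id=""]')]

print(texts_from_filter)
print(texts_from_css)
```

A span with no id attribute at all is skipped by both options: tag.get("id") returns None for it, and the CSS selector requires the attribute to be present.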