Dealing with a colon in BeautifulSoup CSS selectors
Input HTML:
<div style="display: flex">
<div class="half" style="font-size: 0.8em;width: 33%;"> apple </div>
<div class="half" style="font-size: 0.8em;text-align: center;width: 28%;"> peach </div>
<div class="half" style="font-size: 0.8em;text-align: right;width: 33%;" title="nofruit"> cucumber </div>
</div>
Desired output: all div elements directly under <div style="display: flex">.
I am trying to locate the parent div with a CSS selector:
div[style="display: flex"]
This throws an error:
>>> soup.select('div[style="display: flex"]')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/user/.virtualenvs/so/lib/python2.7/site-packages/bs4/element.py", line 1400, in select
    'Only the following pseudo-classes are implemented: nth-of-type.')
NotImplementedError: Only the following pseudo-classes are implemented: nth-of-type.
It looks like BeautifulSoup tries to interpret the colon as pseudo-class syntax.
I have tried following the advice in "Handling a colon in an element ID in a CSS selector", but it still throws errors:
>>> soup.select('div[style="display\: flex"]')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/user/.virtualenvs/so/lib/python2.7/site-packages/bs4/element.py", line 1400, in select
    'Only the following pseudo-classes are implemented: nth-of-type.')
NotImplementedError: Only the following pseudo-classes are implemented: nth-of-type.
>>> soup.select('div[style="display\3A flex"]')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/user/.virtualenvs/so/lib/python2.7/site-packages/bs4/element.py", line 1426, in select
    'Unsupported or invalid CSS selector: "%s"' % token)
ValueError: Unsupported or invalid CSS selector: "div[style="displayA"
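Question:
What is the correct way to use or escape a colon inside an attribute value in BeautifulSoup CSS selectors?
Note that I can work around it with a partial attribute match: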
soup.select("div[style$=flex]")
Or with find_all():
soup.find_all("div", style="display: flex")
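Also note that I understand that locating elements by style is far from a good technique, but the question itself is generic and the HTML provided is just an example.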
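For reference, the find_all() workaround above can be extended to produce the desired output. This is only a minimal sketch, assuming the input HTML from the question and the built-in html.parser; the variable names (parent, children, names) are illustrative and not part of the original post.

```python
from bs4 import BeautifulSoup

html = """
<div style="display: flex">
<div class="half" style="font-size: 0.8em;width: 33%;"> apple </div>
<div class="half" style="font-size: 0.8em;text-align: center;width: 28%;"> peach </div>
<div class="half" style="font-size: 0.8em;text-align: right;width: 33%;" title="nofruit"> cucumber </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Match the parent by its exact style value; no CSS selector parsing is involved.
parent = soup.find("div", style="display: flex")

# recursive=False restricts the search to direct children of the parent.
children = parent.find_all("div", recursive=False)
names = [child.get_text(strip=True) for child in children]
print(names)
```

Because the attribute value is passed as a keyword argument rather than embedded in a selector string, the colon and the space need no escaping here.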
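Not sure this qualifies as an answer, since this is definitely broken. Oddly, though, the error is not triggered by the : itself, but by the : followed by a space. The error suggests it is trying to use whatever comes after the space as a CSS selector.
For example, editing the HTML to remove the space makes the block selectable again: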
>>> from bs4 import BeautifulSoup
>>> html = """
... <div style="display:flex">
... <div class="half" style="font-size: 0.8em;width: 33%;"> apple </div>
... <div class="half" style="font-size: 0.8em;text-align: center;width: 28%;"> peach </div>
... <div class="half" style="font-size: 0.8em;text-align: right;width: 33%;" title="nofruit"> cucumber </div>
... </div>
... """
>>> soup = BeautifulSoup(html, "html.parser")
>>> soup.select('div[style="display: flex"]')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.4/dist-packages/bs4/element.py", line 1313, in select
    'Unsupported or invalid CSS selector: "%s"' % token)
ValueError: Unsupported or invalid CSS selector: "flex"]"
>>> soup.select('div[style="display:flex"]')
[<div style="display:flex">
<div class="half" style="font-size: 0.8em;width: 33%;"> apple </div>
<div class="half" style="font-size: 0.8em;text-align: center;width: 28%;"> peach </div>
<div class="half" style="font-size: 0.8em;text-align: right;width: 33%;" title="nofruit"> cucumber </div>
</div>]
Unfortunately, a space after the colon is the usual style, so this may not get you very far!
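One way to picture the behaviour described above, purely as an illustration: if a parser tokenizes the selector string on whitespace (an assumption about how older versions behaved, not something confirmed in this thread), the attribute value gets cut in two at the space, and the second piece is what appears in the ValueError message.

```python
# Illustration only: plain str.split() stands in for a whitespace-based tokenizer.
with_space = 'div[style="display: flex"]'
without_space = 'div[style="display:flex"]'

tokens_with_space = with_space.split()
tokens_without_space = without_space.split()

# The selector containing a space is split into two fragments,
# neither of which is a valid selector on its own.
print(tokens_with_space)

# The selector without a space survives as a single token.
print(tokens_without_space)
```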
Update: the issue is now fixed in BeautifulSoup 4.5.0; upgrade if needed:
pip install --upgrade beautifulsoup4
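If the same code has to run against both old and new versions, one possible approach is to try the selector first and fall back to find() when it is rejected. This is a sketch under assumptions: it treats NotImplementedError and ValueError (the two exceptions seen in the tracebacks in this thread) as the signal that the selector is unsupported, and flex_children is a hypothetical helper name, not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup

html = """
<div style="display: flex">
<div class="half" style="font-size: 0.8em;width: 33%;"> apple </div>
<div class="half" style="font-size: 0.8em;text-align: center;width: 28%;"> peach </div>
<div class="half" style="font-size: 0.8em;text-align: right;width: 33%;" title="nofruit"> cucumber </div>
</div>
"""


def flex_children(soup):
    """Return the div elements directly under <div style="display: flex">."""
    try:
        # Expected to work on versions that include the fix.
        return soup.select('div[style="display: flex"] > div')
    except (NotImplementedError, ValueError):
        # Fallback for versions that reject the colon followed by a space.
        parent = soup.find("div", style="display: flex")
        if parent is None:
            return []
        return parent.find_all("div", recursive=False)


soup = BeautifulSoup(html, "html.parser")
names = [div.get_text(strip=True) for div in flex_children(soup)]
print(names)
```

Both branches are meant to return the same elements, so callers do not need to know which one ran.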
Old answer:
I created an issue on the BeautifulSoup issue tracker.
I will update the answer if there are any updates on the Launchpad issue.