select method in BeautifulSoup not able to select attribute value with white space
city = soup.select('a[href="/city/london d12"]')
The code above fails with this error message:
ValueError: Unsupported or invalid CSS selector: "a[href=/city/london"
Is there a workaround, or an alternative to Beautiful Soup? The HTML in question:
<a title="London" href="/city/london d12">london</a>
You have to enclose the attribute value in double quotes:
a[href="/city/london d12"]
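However, BeautifulSoup appears to treat this particular selector as "invalid". The truncated selector in the error message suggests it is being split at the space. This is because BeautifulSoup supports only basic CSS selectors: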
This is all a convenience for users who know the CSS selector syntax.
You can do all this stuff with the Beautiful Soup API. And if CSS
selectors are all you need, you might as well use lxml directly: it’s
a lot faster, and it supports more CSS selectors. But this lets you
combine simple CSS selectors with the Beautiful Soup API.
Let's follow the advice and use lxml + cssselect directly:
>>> from lxml.cssselect import CSSSelector
>>> from lxml.etree import fromstring
>>>
>>> sel = CSSSelector('a[href="/city/london d12"]')
>>>
>>> tree = fromstring('<a title="London" href="/city/london d12">london</a>')
>>> sel(tree)
[<Element a at 0x100dad878>]
Partial attribute matching also works:
soup.select('a[href*=london]') # contains "london"
soup.select('a[href$=d12]') # ends with "d12"
soup.select('a[href^="/city/london"]') # starts with "/city/london"
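As the quoted documentation says, the same lookups can be done with the regular Beautiful Soup API, which never goes through the selector parser. A minimal sketch, assuming bs4 is installed and using a made-up two-link snippet (the Paris link is hypothetical):

```python
import re

from bs4 import BeautifulSoup

# Hypothetical snippet; only the London <a> tag is from the question.
HTML = (
    '<a title="London" href="/city/london d12">london</a>'
    '<a title="Paris" href="/city/paris d75">paris</a>'
)
soup = BeautifulSoup(HTML, "html.parser")

# Exact match on the attribute value, space included.
exact = soup.find_all("a", href="/city/london d12")

# Substring match via a function that receives the attribute value
# (None when the attribute is missing).
contains = soup.find_all("a", href=lambda v: v is not None and "london" in v)

# Prefix match via a compiled regular expression.
prefix = soup.find_all("a", href=re.compile(r"^/city/london"))

print([a["title"] for a in exact])
```

Newer Beautiful Soup releases (4.7 and later, which hand select() over to the Soup Sieve package) should also accept the quoted selector as written, so upgrading may be enough on its own.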