Selecting elements only if they have two classes and share the same first one
I have these elements in HTML that I want to parse:
<td class="line"> GARBAGE </td>
<td class="line text"> I WANT THAT </td>
<td class="line heading"> I WANT THAT </td>
<td class="line"> GARBAGE </td>
How can I write a CSS selector that selects elements having the class line plus some other class (heading, text, or anything else), but not elements whose only class is line?
I tried:
td[class=line.*]
td.line.*
td[class^=line.]
Edit
I am using Python and BeautifulSoup:
import bs4
import requests

url = 'http://www.somewebsite'
res = requests.get(url)
res.raise_for_status()  # fail early on HTTP errors
DicoSoup = bs4.BeautifulSoup(res.text, "lxml")
elems = DicoSoup.select('body div#someid tr td.line')
I am looking to change the last part, td.line, into something like td.line.whateverotherclass (but not td.line alone, otherwise my current selector would already be enough).
You can chain CSS classes in a class selector.
.line {
color: green;
}
.line.text {
color: red;
}
.line.heading {
color: blue;
}
<p class="line">GARBAGE</p>
<p class="line text">I WANT THAT</p>
<p class="line heading">I WANT THAT</p>
<p class="line">GARBAGE</p>
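As a rough sketch of how that chaining carries over to BeautifulSoup's select(): it only helps when the second class names are known up front. The names text and heading below come from the sample markup; the variable names are made up for illustration.

```python
from bs4 import BeautifulSoup

html = """
<table><tr>
<td class="line">GARBAGE</td>
<td class="line text">I WANT THAT</td>
<td class="line heading">I WANT THAT TOO</td>
</tr></table>
"""
soup = BeautifulSoup(html, "html.parser")

# Each chained selector requires both classes on the same element;
# the comma combines the two alternatives.
cells = soup.select("td.line.text, td.line.heading")
texts = [td.get_text(strip=True) for td in cells]
print(texts)
```

A cell with line plus some unlisted class (say, line footer) would not be matched, so this does not cover the "any other class" case from the question.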
What @BoltClock suggested is generally the correct way to approach the problem with CSS selectors. The only problem is that BeautifulSoup supports a limited number of CSS selectors. For instance, the :not() selector is not supported at the moment.
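You can work around it with a "starts-with" attribute selector that checks whether the class attribute starts with line followed by a space (this is quite fragile, but it works on your sample data):
for td in soup.select("td[class^='line ']"):
    print(td.get_text(strip=True))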
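To make the fragility concrete, here is a small sketch comparing the prefix selector with a plain td.line selection filtered in Python. The third cell, with the classes in the opposite order, is an invented test case and not part of the original data.

```python
from bs4 import BeautifulSoup

html = """
<table><tr>
<td class="line">GARBAGE</td>
<td class="line text">I WANT THAT</td>
<td class="heading line">ALSO WANTED</td>
</tr></table>
"""
soup = BeautifulSoup(html, "html.parser")

# The prefix selector depends on "line" being written first in the attribute.
prefix_matches = [td.get_text(strip=True)
                  for td in soup.select("td[class^='line ']")]

# Selecting td.line and counting classes does not depend on class order.
filtered = [td.get_text(strip=True)
            for td in soup.select("td.line")
            if len(td.get("class", [])) > 1]

print(prefix_matches)
print(filtered)
```

The prefix selector misses the cell where line is not the first class, while the filtered version keeps it.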
Alternatively, you can use find_all() with a search function that checks that the class attribute contains line plus at least one other class:
from bs4 import BeautifulSoup
data = """
<table>
<tr>
<td class="line"> GARBAGE </td>
<td class="line text"> I WANT THAT </td>
<td class="line heading"> I WANT THAT </td>
<td class="line"> GARBAGE </td>
</tr>
</table>"""
soup = BeautifulSoup(data, 'html.parser')
for td in soup.find_all(lambda tag: tag and tag.name == "td" and
                        "class" in tag.attrs and "line" in tag["class"] and
                        len(tag["class"]) > 1):
    print(td.get_text(strip=True))
Prints:
I WANT THAT
I WANT THAT
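For readers on a newer setup: BeautifulSoup 4.7 and later hand select() over to the soupsieve package, which does understand :not(). Assuming such a version is installed, the following sketch expresses "has line, but the class attribute is not exactly line" as a single selector. The div#someid wrapper mirrors the selector from the question; the rest of the markup is sample data.

```python
from bs4 import BeautifulSoup

html = """
<div id="someid"><table><tr>
<td class="line">GARBAGE</td>
<td class="line text">I WANT THAT</td>
<td class="line heading">I WANT THAT TOO</td>
</tr></table></div>
"""
soup = BeautifulSoup(html, "html.parser")

# td.line requires the class "line"; :not([class='line']) then drops cells
# whose whole class attribute is exactly "line".
# Assumption: bs4 >= 4.7, where select() supports :not().
wanted = [td.get_text(strip=True)
          for td in soup.select("div#someid td.line:not([class='line'])")]
print(wanted)
```

On older versions without :not() support, the find_all() approach above remains the safer choice.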