Parse div element from html with style attributes

Question

I am trying to get the text "Something here I want to get" from a div element in an html file, using Python and BeautifulSoup.

This is the relevant part of the html:

<div xmlns="" id="idp46819314579224" style="box-sizing: border-box; width: 100%; margin: 0 0 10px 0; padding: 5px 10px; background: #d43f3a; font-weight: bold; font-size: 14px; line-height: 20px; color: #fff;" class="" onclick="toggleSection('idp46819314579224-container');" onmouseover="this.style.cursor='pointer'">Something here I want to get<div id="idp46819314579224-toggletext" style="float: right; text-align: center; width: 8px;">
                -
            </div>
</div>

This is what I tried:

vu = soup.find_all("div", {"style" : "background: #d43f3a"})

for div in vu:
    print(div.text)

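I use a loop because there are several divs with different ids, but they all have the same background color. The code raises no error, but it prints nothing.

How can I get the text, using the background color as the condition?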

Answer 1

The style attribute contains other declarations besides the background:

style="box-sizing: ....; ....;"

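Your current code asks whether style == "background: #d43f3a", i.e. whether the whole attribute value equals that string, and it does not.

What you can do instead is ask whether "background: #d43f3a" in style (a substring check).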
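The difference between the two checks can be seen on the raw attribute string, without BeautifulSoup at all. This is a minimal sketch: the style value is shortened from the question's html, and the variable names are only illustrative.

```python
# Shortened copy of the style value from the question's <div>.
style = "box-sizing: border-box; width: 100%; background: #d43f3a; color: #fff;"
needle = "background: #d43f3a"

# Equality compares the whole attribute value, so it fails here.
exact_match = (style == needle)

# A substring check looks for the declaration inside the value.
substring_match = (needle in style)

print(exact_match)      # False
print(substring_match)  # True
```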

One way is to pass a regular expression.

>>> import re
>>> vu = soup.find_all("div", style=re.compile("background: #d43f3a"))
>>> for div in vu:
...     print(div.text.strip())
...
Something here I want to get
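A literal pattern only matches when the page writes the declaration exactly as `background: #d43f3a`. If the spacing or letter case may vary, a slightly looser pattern could be used instead. This is a sketch under that assumption; the sample strings are made up for illustration.

```python
import re

# Allow optional whitespace around the colon and ignore letter case.
# \b stops the match from running into a longer hex value.
pattern = re.compile(r"background\s*:\s*#d43f3a\b", re.IGNORECASE)

samples = [
    "width: 100%; background: #d43f3a; color: #fff;",  # as in the question
    "background:#D43F3A",                               # no space, upper case
    "background: #d43f3ab0",                            # different color (8-digit hex)
    "color: #d43f3a",                                   # same color, other property
]

matches = [bool(pattern.search(s)) for s in samples]
print(matches)  # [True, True, False, False]
```

The compiled pattern can be passed as `style=pattern` to `find_all`, in the same way as the literal pattern above.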

You can also say the same thing with CSS selectors:

soup.select('div[style*="background: #d43f3a"]')
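For reference, here is a self-contained sketch of the selector approach, using a trimmed copy of the question's html and the built-in `html.parser`. With the markup as posted, `.text` would also pick up the `-` from the nested toggle div, so this sketch reads only the first non-empty text fragment of each match. Whether that is the desired text depends on the real page.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the markup from the question, plus one non-matching div.
html = """
<div id="idp46819314579224" style="width: 100%; background: #d43f3a; color: #fff;">Something here I want to get<div id="idp46819314579224-toggletext" style="float: right;">
                -
            </div>
</div>
<div style="background: #ffffff;">Some other section</div>
"""

soup = BeautifulSoup(html, "html.parser")

titles = []
# [style*="..."] matches when the attribute value contains the substring.
for div in soup.select('div[style*="background: #d43f3a"]'):
    # First non-empty text fragment: here, the title before the nested <div>.
    titles.append(next(div.stripped_strings, ""))

print(titles)  # ['Something here I want to get']
```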

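Or pass a function/lambda. The function may be called with None for a div that has no style attribute, so guard against that before the substring check: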

>>> vu = soup.find_all("div", style=lambda style: style and "background: #d43f3a" in style)
>>> for div in vu:
...     print(div.text.strip())
...
Something here I want to get
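A function filter can also do more than a substring test. As a sketch, the hypothetical helpers below (`parse_style` and `has_background` are not part of BeautifulSoup) split the style value into declarations and compare the `background` property itself, so spacing and letter case no longer matter. It assumes simple inline styles with no `;` inside a value.

```python
def parse_style(style):
    """Split an inline style string into a {property: value} dict."""
    declarations = {}
    for part in (style or "").split(";"):
        name, sep, value = part.partition(":")
        if sep:  # skip empty or malformed fragments
            declarations[name.strip().lower()] = value.strip().lower()
    return declarations


def has_background(style, color="#d43f3a"):
    """Return True if the style sets background to exactly `color`."""
    return parse_style(style).get("background") == color.lower()


print(has_background("width: 100%; background:#D43F3A; color: #fff;"))  # True
print(has_background("color: #d43f3a"))                                 # False
print(has_background(None))                                             # False
```

It can be passed directly as `soup.find_all("div", style=has_background)`. A missing style attribute, for which the filter may receive `None`, is covered by the `style or ""` guard.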

python

parsing

beautifulsoup