beautifulsoup: find_all on bs4.element.ResultSet object or list?

Question

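Hi, so I applied find_all on a BeautifulSoup object and got back a bs4.element.ResultSet object (or a list, if I slice it).

I want to run find_all again on that result, but this is not allowed on a bs4.element.ResultSet object. I can loop over each element of the ResultSet and call find_all on it. But can I avoid the loop and convert the result back into a BeautifulSoup object?

Please see the code for details. Thanks.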

from bs4 import BeautifulSoup

html_1 = """
<table>
    <thead>
        <tr class="myClass">
            <th>A</th>
            <th>B</th>
            <th>C</th>
            <th>D</th>
        </tr>
    </thead>
</table>
"""
soup = BeautifulSoup(html_1, 'html.parser')

type(soup) #bs4.BeautifulSoup

# do find_all on beautifulsoup object
th_all = soup.find_all('th')

# the result is of type bs4.element.ResultSet or similarly list
type(th_all) #bs4.element.ResultSet
type(th_all[0:1]) #list

# now I want to further do find_all
th_all.find_all(text='A') #does not work: raises AttributeError

# can I avoid this need of loop?
for th in th_all:
    th.find_all(text='A') #works

Answer 1

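The ResultSet class is a subclass of list, not of the Tag class, which is where the find* methods are defined. Looping over the results of find_all() is the most common approach: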

th_all = soup.find_all('th')
result = []
for th in th_all:
    result.extend(th.find_all(text='A'))
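For reference, here is a self-contained sketch of that loop using the table from the question. It also checks the two claims above: that the ResultSet is a list, and that it has no find_all of its own. The variable names is_list, has_find_all and result_flat are made up for illustration, and string= is used as the newer spelling of the text= argument, assuming a reasonably recent bs4.

```python
from bs4 import BeautifulSoup

html_1 = """
<table>
    <thead>
        <tr class="myClass">
            <th>A</th><th>B</th><th>C</th><th>D</th>
        </tr>
    </thead>
</table>
"""
soup = BeautifulSoup(html_1, "html.parser")
th_all = soup.find_all("th")

# ResultSet is a list subclass, so list operations work but find_all does not.
is_list = isinstance(th_all, list)
has_find_all = hasattr(th_all, "find_all")

# Explicit loop, as in the answer (string= is the newer name for text=).
result = []
for th in th_all:
    result.extend(th.find_all(string="A"))

# The same search written as a flattening list comprehension.
result_flat = [hit for th in th_all for hit in th.find_all(string="A")]

print(is_list, has_find_all, result, result_flat)
```

The comprehension is still a loop under the hood; it only makes the flattening more compact.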

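Usually, CSS selectors can help you solve this in one go, although not everything you can do with find_all() is possible with the select() method. For example, there is no "text" search available in the bs4 CSS selectors. But if, for instance, you had to find all b elements inside th elements, you could do: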

soup.select("th b")
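A minimal sketch of a descendant selector that finds b elements nested in th elements. The markup in html_2 is hypothetical (the question's table has no b elements), and the names bold_in_th and bold_texts are only for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: some header cells wrap their text in <b>.
html_2 = "<table><tr><th><b>A</b></th><th>B</th><th><b>C</b></th></tr></table>"
soup = BeautifulSoup(html_2, "html.parser")

# "th b" matches every <b> that is a descendant of a <th>, in one call.
bold_in_th = soup.select("th b")
bold_texts = [b.get_text() for b in bold_in_th]
print(bold_texts)
```

This replaces the nested find_all calls with a single query, which is the "one go" the answer refers to.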
