How to select the elements under a specific element in a table using Python's BeautifulSoup

I want to select the elements in the table that come under <i>Member</i>.

HTML code:


<table class="table profile-table">
<td>Teams</td>
<td>
<i>Leader</i>:
 <a href="/shdb-team/20-739/" class="chip team">SHDB Team</a><a href="/the-spider-society/20-490/" class="chip team">The Spider Society</a><a href="/new-warriors/20-79/" class="chip team">New Warriors</a><a href="/the-six/20-474/" class="chip team">The Six</a>
 <i>Member</i>: 
 <a href="/the-mighty-avengers/20-384/" class="chip team">The Mighty Avengers</a><a href="/new-avengers/20-101/" class="chip team">New Avengers</a><a href="/shield/20-467/" class="chip team">S.H.I.E.L.D.</a><a href="/avengers-resistance/20-154/" class="chip team">Avengers Resistance</a><a href="/marvel-knights/20-377/" class="chip team">Marvel Knights</a><a href="/avengers/20-4/" class="chip team">Avengers</a><a href="/secret-defenders/20-96/" class="chip team">Secret Defenders</a><a href="/daily-bugle/20-216/" class="chip team">Daily Bugle</a><a href="/defenders/20-9/" class="chip team">Defenders</a>
 <i>Formerly</i>: 
 <a href="/future-foundation/20-290/" class="chip team">Future Foundation</a><a href="/heroes-for-hire/20-5/" class="chip team">Heroes For Hire</a><a href="/fantastic-four/20-1/" class="chip team">Fantastic Four</a> </td>

How can I select only the text in the Member section (Member is just an example)?

I tried:

member = []  # collected team names
li = bs.find('i', string="Member")  # bs is the BeautifulSoup object for the page
children = li.find_next_siblings()
for child in children:
    member.append(child.text)
print(member)

But it outputs all of the results:

SHDB Team
The Spider Society
New Warriors
The Six
Member
The Mighty Avengers
New Avengers
S.H.I.E.L.D.
Avengers Resistance
Marvel Knights
Avengers
Secret Defenders
Daily Bugle
Defenders
Formerly
Future Foundation
Heroes For Hire
Fantastic Four

I only want to select the Member section. This code selects everything after Member and before Formerly, but it is an inefficient solution:

teams[teams.index("Member")+1:teams.index("Formerly")]
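For comparison, here is a self-contained sketch of that slicing approach. It assumes `teams` is a flat list of the text of every `<i>` label and `<a>` chip in document order (the question does not show how `teams` was built), and it runs on a trimmed copy of the markup above:

```python
from bs4 import BeautifulSoup

# Trimmed copy of the question's markup (fewer chips per label, no hrefs)
html = """
<table class="table profile-table">
<td>Teams</td>
<td>
<i>Leader</i>: <a class="chip team">SHDB Team</a><a class="chip team">The Six</a>
<i>Member</i>: <a class="chip team">Avengers</a><a class="chip team">Defenders</a>
<i>Formerly</i>: <a class="chip team">Fantastic Four</a>
</td>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
# Assumption: teams holds the text of every <i> label and <a> chip, in document order
teams = [tag.get_text(strip=True) for tag in soup.select("td i, td a")]
# Slice out everything between the "Member" and "Formerly" labels
members = teams[teams.index("Member") + 1:teams.index("Formerly")]
print(members)
```

The slice works, but it builds the whole list and scans it twice with `index()`, and it raises `ValueError` if the page has no `Formerly` label.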

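All the i tags are following siblings of one another inside the td tag, and each section is identified by the text of its i tag, so it is straightforward: you can use a CSS selector or slicing to select the Member section.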

from bs4 import BeautifulSoup
html = """
<table class="table profile-table">
 <td>
  Teams
 </td>
 <td>
  <i>
   Leader
  </i>
  :
  <a class="chip team" href="/shdb-team/20-739/">
   SHDB Team
  </a>
  <a class="chip team" href="/the-spider-society/20-490/">
   The Spider Society
  </a>
  <a class="chip team" href="/new-warriors/20-79/">
   New Warriors
  </a>
  <a class="chip team" href="/the-six/20-474/">
   The Six
  </a>
  <i>
   Member
  </i>
  :
  <a class="chip team" href="/the-mighty-avengers/20-384/">
   The Mighty Avengers
  </a>
  <a class="chip team" href="/new-avengers/20-101/">
   New Avengers
  </a>
  <a class="chip team" href="/shield/20-467/">
   S.H.I.E.L.D.
  </a>
  <a class="chip team" href="/avengers-resistance/20-154/">
   Avengers Resistance
  </a>
  <a class="chip team" href="/marvel-knights/20-377/">
   Marvel Knights
  </a>
  <a class="chip team" href="/avengers/20-4/">
   Avengers
  </a>
  <a class="chip team" href="/secret-defenders/20-96/">
   Secret Defenders
  </a>
  <a class="chip team" href="/daily-bugle/20-216/">
   Daily Bugle
  </a>
  <a class="chip team" href="/defenders/20-9/">
   Defenders
  </a>
  <i>
   Formerly
  </i>
  :
  <a class="chip team" href="/future-foundation/20-290/">
   Future Foundation
  </a>
  <a class="chip team" href="/heroes-for-hire/20-5/">
   Heroes For Hire
  </a>
  <a class="chip team" href="/fantastic-four/20-1/">
   Fantastic Four
  </a>
 </td>
</table>

"""

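soup = BeautifulSoup(html, "html.parser")
# i:nth-of-type(2) is the second <i> inside the <td>, i.e. the "Member" label
for i in soup.select_one('.table.profile-table > td > i:nth-of-type(2)').next_siblings:
    # The next <i> ("Formerly") marks the end of the Member section
    if i.name == 'i':
        break
    # Only the <a> chips hold team names; ":" and whitespace text nodes are skipped
    if i.name == 'a':
        print(i.get_text(strip=True))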

Output:

The Mighty Avengers
New Avengers
S.H.I.E.L.D.
Avengers Resistance
Marvel Knights
Avengers
Secret Defenders
Daily Bugle
Defenders  
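If you need every section rather than just Member, the same sibling walk can group the chips under each label in a single pass. This is a minimal sketch, assuming the labels and chips all sit directly inside one `<td>` as in the question; `group_by_label` is a hypothetical helper name, and the markup is a trimmed copy of the question's:

```python
from bs4 import BeautifulSoup, Tag

# Trimmed copy of the question's markup (fewer chips per label, no hrefs)
html = """
<table class="table profile-table">
<td>Teams</td>
<td>
<i>Leader</i>: <a class="chip team">SHDB Team</a><a class="chip team">The Six</a>
<i>Member</i>: <a class="chip team">Avengers</a><a class="chip team">Defenders</a>
<i>Formerly</i>: <a class="chip team">Fantastic Four</a>
</td>
</table>
"""

def group_by_label(td):
    """Map each <i> label's text to the texts of the <a> chips that follow it."""
    sections = {}
    current = None
    for node in td.children:
        # Skip the ":" and whitespace text nodes between tags
        if not isinstance(node, Tag):
            continue
        if node.name == "i":
            # A new label starts a new section
            current = node.get_text(strip=True)
            sections[current] = []
        elif node.name == "a" and current is not None:
            sections[current].append(node.get_text(strip=True))
    return sections

soup = BeautifulSoup(html, "html.parser")
# The <td> holding the labels is the parent of the first <i>
td = soup.find("i").find_parent("td")
sections = group_by_label(td)
print(sections["Member"])
```

Because the dictionary is built once, looking up `Leader` or `Formerly` afterwards costs nothing extra.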

You can iterate over the element's next_siblings, print each sibling whose tag name is a, and break out of the loop when the tag name is i:

for tag in soup.select_one('i:-soup-contains("Member")').next_siblings:
    if tag.name == 'i':
        break
    if tag.name == 'a':
        print(tag.text) 
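The same stop-at-the-next-label loop can also be expressed with `itertools.takewhile`, which consumes `next_siblings` only up to the next `<i>`. A sketch on a trimmed copy of the question's markup:

```python
from itertools import takewhile

from bs4 import BeautifulSoup

# Trimmed copy of the question's markup (fewer chips per label, no hrefs)
html = """
<table class="table profile-table">
<td>Teams</td>
<td>
<i>Leader</i>: <a class="chip team">SHDB Team</a><a class="chip team">The Six</a>
<i>Member</i>: <a class="chip team">Avengers</a><a class="chip team">Defenders</a>
<i>Formerly</i>: <a class="chip team">Fantastic Four</a>
</td>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
label = soup.select_one('i:-soup-contains("Member")')
# Stop at the next <i> label; text nodes have .name == None, so they pass through
section = takewhile(lambda node: node.name != "i", label.next_siblings)
# Keep only the <a> chips from that stretch of siblings
members = [node.get_text(strip=True) for node in section if node.name == "a"]
print(members)
```

Text nodes such as `": "` report `.name` as `None`, so they pass the `takewhile` test and are then dropped by the `node.name == "a"` filter.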
Example:
from bs4 import BeautifulSoup

html = '''
<table class="table profile-table">
<td>Teams</td>
<td>
<i>Leader</i>:
 <a href="/shdb-team/20-739/" class="chip team">SHDB Team</a><a href="/the-spider-society/20-490/" class="chip team">The Spider Society</a><a href="/new-warriors/20-79/" class="chip team">New Warriors</a><a href="/the-six/20-474/" class="chip team">The Six</a>
 <i>Member</i>: 
 <a href="/the-mighty-avengers/20-384/" class="chip team">The Mighty Avengers</a><a href="/new-avengers/20-101/" class="chip team">New Avengers</a><a href="/shield/20-467/" class="chip team">S.H.I.E.L.D.</a><a href="/avengers-resistance/20-154/" class="chip team">Avengers Resistance</a><a href="/marvel-knights/20-377/" class="chip team">Marvel Knights</a><a href="/avengers/20-4/" class="chip team">Avengers</a><a href="/secret-defenders/20-96/" class="chip team">Secret Defenders</a><a href="/daily-bugle/20-216/" class="chip team">Daily Bugle</a><a href="/defenders/20-9/" class="chip team">Defenders</a>
 <i>Formerly</i>: 
 <a href="/future-foundation/20-290/" class="chip team">Future Foundation</a><a href="/heroes-for-hire/20-5/" class="chip team">Heroes For Hire</a><a href="/fantastic-four/20-1/" class="chip team">Fantastic Four</a> </td>

'''
soup = BeautifulSoup(html, "html.parser")

for tag in soup.select_one('i:-soup-contains("Member")').next_siblings:
    if tag.name == 'i':
        break
    if tag.name == 'a':
        print(tag.text)
Output:
The Mighty Avengers
New Avengers
S.H.I.E.L.D.
Avengers Resistance
Marvel Knights
Avengers
Secret Defenders
Daily Bugle
Defenders
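Note that `:-soup-contains()` matches substrings, so a label such as `Former Member` would also match `"Member"`. If exact matching matters, `find()` with `string=` compares the label's whole text instead. A sketch wrapped in a hypothetical `teams_under` helper that returns an empty list when the label is missing, again on trimmed markup:

```python
from bs4 import BeautifulSoup

# Trimmed copy of the question's markup (fewer chips per label, no hrefs)
html = """
<table class="table profile-table">
<td>Teams</td>
<td>
<i>Leader</i>: <a class="chip team">SHDB Team</a><a class="chip team">The Six</a>
<i>Member</i>: <a class="chip team">Avengers</a><a class="chip team">Defenders</a>
<i>Formerly</i>: <a class="chip team">Fantastic Four</a>
</td>
</table>
"""

def teams_under(soup, label):
    """Return the <a> texts between <i>label</i> and the next <i>, or [] if absent."""
    # string= must equal the tag's whole string, unlike :-soup-contains()
    start = soup.find("i", string=label)
    if start is None:
        return []
    result = []
    for tag in start.next_siblings:
        if tag.name == "i":
            break  # reached the next label
        if tag.name == "a":
            result.append(tag.get_text(strip=True))
    return result

soup = BeautifulSoup(html, "html.parser")
print(teams_under(soup, "Member"))
print(teams_under(soup, "Formerly"))
```

Exact matching is also stricter about whitespace: a label written as `<i> Member </i>` would not match `string="Member"`.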