BeautifulSoup: Difficulty accessing correct table

I'm scraping a page with BeautifulSoup4, but the following function is giving me 2 problems:

import urllib.request

from bs4 import BeautifulSoup

def getTeamRoster(teamURL):
    html = urllib.request.urlopen(teamURL).read()
    # Name the parser explicitly; otherwise bs4 picks whichever one is installed and warns
    soup = BeautifulSoup(html, "html.parser")
    teamPlayers = []
    #second table
    corebody = soup.find(id="corebody")
    teamTable = corebody.table.next_sibling.next_sibling.next_sibling.next_sibling
    print(teamTable)
    tableBody = teamTable.find('tbody')
    print(tableBody)
    tableRows = tableBody.find_all('tr')

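1) When I call ".next_sibling" only 4 times (as above), I seem to get the correct table. However, the table tag I'm trying to access is the 6th table inside #corebody. When I call ".next_sibling" 5 times, I get -1 from BeautifulSoup, which suggests the table I asked for doesn't exist? I thought it would normally return None when that happens. Any idea why calling ".next_sibling" 5 times doesn't work as expected?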

The URL is http://modules.ussquash.com/ssm/pages/leagues/Team_Information.asp?id=11325
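A likely explanation for the -1, sketched on a hypothetical, much smaller page (the HTML below is made up, not the real USSquash markup): the whitespace between tags is kept as text nodes (NavigableString), so ".next_sibling" alternates between a newline string and the next tag. NavigableString is a subclass of str, so calling .find('tbody') on one runs str.find, which returns -1 rather than None. Asking for the next table explicitly avoids counting text nodes.

```python
from bs4 import BeautifulSoup

# Hypothetical, heavily simplified markup standing in for the real page.
HTML = """<div id="corebody">
<table id="t1"><tr><td>one</td></tr></table>
<table id="t2"><tr><td>two</td></tr></table>
<table id="t3"><tr><td>three</td></tr></table>
</div>"""

soup = BeautifulSoup(HTML, "html.parser")
corebody = soup.find(id="corebody")
first = corebody.table

# The newline between two tables is a text node, not a tag.
print(first.next_sibling == "\n")              # True
print(first.next_sibling.next_sibling["id"])   # t2

# A NavigableString is a str, so .find() here is str.find(): -1, not None.
print(first.next_sibling.find("tbody"))        # -1

# Safer: skip text nodes and ask for the next <table> directly.
second = first.find_next_sibling("table")
print(second["id"])                            # t2

# Or collect the direct child tables and index them (0-based).
tables = corebody.find_all("table", recursive=False)
print(tables[2]["id"])                         # t3
```

Whether find_all(..., recursive=False) fits the real page depends on the roster table being a direct child of #corebody, which this sketch does not check; find_next_sibling("table") has the same caveat.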

2) tableBody = teamTable.find('tbody') is also giving me trouble. When I print tableBody I get None, and I'm not sure why (there is definitely a tbody tag in the table I'm accessing).
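A common cause, shown on a hypothetical snippet (whether it applies to this exact page is an assumption): the tbody you see in the browser's developer tools is often inserted by the browser while it builds the DOM and is not in the HTML the server actually sends. html.parser and lxml do not add it, so find('tbody') returns None; html5lib, which follows the browser's parsing rules, does add it. Searching for the rows from the table itself works either way.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: rows sit directly inside <table>, with no <tbody>.
HTML = "<table><tr><th>Players</th></tr><tr><td>Browne,Noah</td></tr></table>"

table = BeautifulSoup(HTML, "html.parser").table

# html.parser does not invent a <tbody>, so this prints None.
print(table.find("tbody"))

# Look for the rows from the table itself; this works with or without a <tbody>.
rows = table.find_all("tr")
players = [td.get_text(strip=True) for row in rows for td in row.find_all("td")]
print(players)  # ['Browne,Noah']
```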

Thoughts?

Thanks for your help, 克莱曼

I was able to get the players table using pandas.read_html:
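import requests
import pandas as pd
from io import StringIO

url = 'http://modules.ussquash.com/ssm/pages/leagues/Team_Information.asp?id=11325'
# read_html returns a list of DataFrames, one per <table> on the page
tables = pd.read_html(StringIO(requests.get(url).text))
print(tables[4])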
                            \n\t\t\t\tPlayers\n\t\t\t           City Gender  SinglesRating TeamPosition  Expiration Win/Loss    P Registered Code Ref. Exam
0                                         Browne,Noah        Taunton      M           5.56            1  02/29/2016   14 / 4    -   08/28/14    -       NaN
1                                      Ellis,Thornton            rye      M           4.27           10  02/29/2016    0 / 9    -   08/28/14    -      pass
2                                          Line,James    Glastonbury      M           4.25           10  02/29/2016    2 / 7    -   08/28/14    -       NaN
3                                   Desantis,Scott J.        Sudbury      M           5.08            2  02/29/2016   9 / 10    -   08/28/14    -      pass
4                                    Bahadori,Cameron    Great Falls      M           4.97            3  01/12/2016   3 / 10    -   11/05/14    -      pass
5                                       Groot,Michael       Victoria      M           4.76            4  02/29/2016   5 / 11    -   08/28/14    -       NaN
6                                       Ehsani,Darian      Greenwich      M           4.76            5  02/29/2016   6 / 13    -   08/28/14    -      pass
7                                          Kardon,Max         Weston      M           4.83            6  02/29/2016   5 / 14    -   08/28/14    -      pass
8                                          Van,Jeremy            NaN      M           4.66            7  02/29/2016   5 / 13    -   08/28/14    -       NaN
9                              Southmayd,Alexander T.         Boston      M           4.91            8  02/29/2016   13 / 6    -   08/28/14    -      pass
10                                 Cacouris,Stephen A         Alpine      M           4.68            9  02/29/2016   9 / 10    -   08/28/14    -      pass
11                                  Groot,Christopher       Edmonton      M           4.62            -  02/29/2016    0 / 2    -   08/28/14    -       NaN
12                                Mack,Peter D. (sub)     N. Eastham      M           3.94            -  02/29/2016    0 / 1    -   11/23/14    -       NaN
13                               Shrager,Nathaniel O.       Stanford      M           0.00            -  02/29/2016    0 / 0    -   08/28/14    -       NaN
14                                Woolverton,Peter C.  Chestnut Hill      M           4.06            -  02/29/2016    1 / 0    -   08/28/14    -       NaN
15  Total Players: 15 Average singles rating: 4.36...            NaN    NaN            NaN          NaN         NaN      NaN  NaN        NaN  NaN       NaN
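Relying on tables[4] breaks if the page gains or loses a table. read_html's match parameter keeps only the tables whose text matches a string or regex. A minimal sketch on a hypothetical two-table page (it assumes lxml, or BeautifulSoup plus html5lib, is installed as the read_html backend):

```python
from io import StringIO

import pandas as pd

# Hypothetical two-table page standing in for the real one.
HTML = """
<table><tr><th>Date</th><th>Result</th></tr>
       <tr><td>11/05/14</td><td>Won 7-2</td></tr></table>
<table><tr><th>Players</th><th>City</th></tr>
       <tr><td>Browne,Noah</td><td>Taunton</td></tr></table>
"""

# match= filters by table text, so the code no longer depends on position.
tables = pd.read_html(StringIO(HTML), match="Players")
roster = tables[0]
print(len(tables))
print(roster)
```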

Using soup.select:

One-liner:

[i.get_text() for i in soup.select('#corebody table tr td') if 'Won' in i.get_text() or 'Lost' in i.get_text()]

Long version:

for td in soup.select('#corebody table tr td'):
    text = td.get_text()  # extract the cell text once
    if 'Won' in text or 'Lost' in text:
        print(text)

[u'Won 7-2', u'Won 5-4', u'Lost 1-8', u'Lost 1-8', u'Won 8-1', u'Lost 3-6', u'Won 7-2', u'Lost 0-9', u'Lost 1-8', u'Won 5-4', u'Lost 1-8', u'Lost 2-7', u'Won 8-1', u'Lost 3-6', u'Lost 4-5', u'Lost 4-5', u'Lost 1-8', u'Lost 4-5', u'Won 6-3']
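The 'Won'/'Lost' substring test also matches any other cell that happens to contain those words. A slightly stricter sketch, assuming result cells always look like "Won 7-2" / "Lost 1-8" and reading the first number as this team's score (both assumptions about the page), parses each cell with a regex and tallies the record. The HTML is a hypothetical stand-in, and match_results is an illustrative helper, not part of bs4:

```python
import re

from bs4 import BeautifulSoup

# Assumed cell format: "Won 7-2" or "Lost 1-8".
RESULT = re.compile(r"^(Won|Lost) (\d+)-(\d+)$")

def match_results(html):
    """Return (outcome, score_for, score_against) for each result cell under #corebody."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for td in soup.select("#corebody table tr td"):
        m = RESULT.match(td.get_text(strip=True))
        if m:
            results.append((m.group(1), int(m.group(2)), int(m.group(3))))
    return results

# Hypothetical fragment shaped like a schedule table.
HTML = """<div id="corebody"><table>
<tr><td>Date</td><td>Result</td></tr>
<tr><td>11/05/14</td><td>Won 7-2</td></tr>
<tr><td>11/12/14</td><td>Lost 1-8</td></tr>
<tr><td>11/19/14</td><td>Won 5-4</td></tr>
</table></div>"""

results = match_results(HTML)
wins = sum(1 for outcome, _, _ in results if outcome == "Won")
print(results)
print(wins, len(results) - wins)  # 2 1
```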