What is the best way to scrape a basketball player's team name?

Question

I can successfully scrape several columns, but I can't scrape the team name for each player. Here is my code so far:

from urllib.request import urlopen
from lxml.html import fromstring

import pandas as pd


url = "https://www.basketball-reference.com/leagues/NBA_2018_advanced.html"

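# Decode the bytes (assuming the page is UTF-8) instead of calling str() on them,
# which would give the bytes' repr (b'...' with escaped characters), not the HTML.
content = urlopen(url).read().decode("utf-8")
# Remove HTML comment markers so tables hidden inside comments are parsed too.
comment = content.replace("-->","").replace("<!--","")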
tree = fromstring(comment)


for bball_row in tree.xpath('//table[contains(@class,"stats_table")]//tr[contains(@class,"full_table")]'):
    names = bball_row.xpath('.//td[@data-stat="player"]/a')[0].text
    mp = bball_row.xpath('.//td[@data-stat="mp"]/text()')[0]
    per = bball_row.xpath('.//td[@data-stat="per"]/text()')[0]
    ws = bball_row.xpath('.//td[@data-stat="ws"]/text()')[0]
    bpm = bball_row.xpath('.//td[@data-stat="bpm"]/text()')[0]
    vorp = bball_row.xpath('.//td[@data-stat="vorp"]/text()')[0]
    print(names, per, ws, bpm, vorp)
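The two replace() calls in the code above delete every HTML comment marker. The apparent reason is that some tables on Sports-Reference pages are delivered inside HTML comments (an assumption here; the question does not say so), and lxml parses commented-out markup as a comment node, which an XPath query for `table` will not match. The sketch below uses hypothetical, simplified markup to show the difference before and after stripping the markers.

```python
from lxml.html import fromstring

# Hypothetical markup: a stats table wrapped in an HTML comment.
HTML = """
<div id="wrapper">
<!--
  <table class="stats_table"><tr class="full_table"><td data-stat="player">A</td></tr></table>
-->
</div>
"""

XPATH = '//table[contains(@class,"stats_table")]//tr[contains(@class,"full_table")]'

# Parsed as-is, the table is only text inside a comment node, so XPath finds nothing.
rows_before = fromstring(HTML).xpath(XPATH)

# Deleting the comment markers turns the commented-out markup back into real elements.
uncommented = HTML.replace("-->", "").replace("<!--", "")
rows_after = fromstring(uncommented).xpath(XPATH)

print(len(rows_before), len(rows_after))
```

Stripping every marker is a blunt tool: it also un-comments anything else the page hides in comments, which is usually harmless for this kind of scraping.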

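Everything works so far. However, I'd like to add a column for the team name. I'm looking for the abbreviated team name (for example, OKC for Oklahoma City).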

Running the code below results in an error:

    team = bball_row.xpath('.//td[@data-stat="team_id"]/a')[0].text
    print(team)

The code starts printing all the team names, then runs into an error.

Here is the error:

team = bball_row.xpath('.//td[@data-stat="team_id"]/a')[0].text
IndexError: list index out of range
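One likely cause, which is an assumption rather than something confirmed in the question: for players traded mid-season, Basketball-Reference shows a combined row whose team cell reads TOT as plain text instead of a link. On such a row the `.../a` step matches nothing, the XPath call returns an empty list, and `[0]` raises the IndexError. The sketch below reproduces this with hypothetical, simplified markup, not the real page.

```python
from lxml.html import fromstring

# Hypothetical, simplified markup (not copied from the site): one player whose
# team cell is a link, and one traded player whose "TOT" cell is plain text.
HTML = """
<table class="stats_table">
  <tr class="full_table">
    <td data-stat="player"><a href="#">Player A</a></td>
    <td data-stat="team_id"><a href="#">OKC</a></td>
  </tr>
  <tr class="full_table">
    <td data-stat="player"><a href="#">Player B</a></td>
    <td data-stat="team_id">TOT</td>
  </tr>
</table>
"""

tree = fromstring(HTML)
rows = tree.xpath('//table[contains(@class,"stats_table")]//tr[contains(@class,"full_table")]')

results = {}
for row in rows:
    player = row.xpath('.//td[@data-stat="player"]/a')[0].text
    # The "/a" step only matches team cells that contain a link.
    links = row.xpath('.//td[@data-stat="team_id"]/a')
    results[player] = [a.text for a in links]

# Player B's list is empty, so indexing it with [0] would raise IndexError.
print(results)
```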

Just to reiterate what I'm looking for: I'm trying to add the abbreviated team name next to each corresponding player.

Any suggestions would be greatly appreciated. Thanks in advance to the community for your time and effort!

Answer 1

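Your script throws that error only when it can't find the value it's looking for. What you can do is catch the error and handle it gracefully. Try the script below: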

import requests
from lxml.html import fromstring

url = "https://www.basketball-reference.com/leagues/NBA_2018_advanced.html"

content = requests.get(url).text
comment = content.replace("-->","").replace("<!--","")
tree = fromstring(comment)

for row in tree.xpath('//table[contains(@class,"stats_table")]//tr[contains(@class,"full_table")]'):
    names = row.xpath('.//td[@data-stat="player"]/a')[0].text
    mp = row.xpath('.//td[@data-stat="mp"]/text()')[0]
    per = row.xpath('.//td[@data-stat="per"]/text()')[0]
    ws = row.xpath('.//td[@data-stat="ws"]/text()')[0]
    bpm = row.xpath('.//td[@data-stat="bpm"]/text()')[0]
    vorp = row.xpath('.//td[@data-stat="vorp"]/text()')[0]
    try:
        team = row.xpath('.//td[@data-stat="team_id"]/a')[0].text
    except IndexError:
        # This row's team cell has no <a> link; fall back to a placeholder.
        team = "N/A"
    print(names, per, ws, bpm, vorp, team)

You should get output like this:

Alex Abrines 9.0 2.2 -2.2 -0.1 OKC
Quincy Acy 8.2 1.0 -2.2 -0.1 BRK
Steven Adams 20.6 9.7 3.3 3.3 OKC
Bam Adebayo 15.7 4.2 0.2 0.8 MIA
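An alternative to catching the IndexError is to select the `td` cell itself and read all of its text with lxml's `text_content()`, which works whether or not the abbreviation is wrapped in a link. The helper name `team_abbr` and the sample markup below are hypothetical, chosen for illustration; this is a sketch, not a tested drop-in for the live page.

```python
from lxml.html import fromstring


def team_abbr(row, default="N/A"):
    """Return the text of a row's team_id cell, linked or not (hypothetical helper).

    text_content() concatenates all text inside the <td>, so it handles both
    <td><a>OKC</a></td> and a plain-text cell such as <td>TOT</td>.
    """
    cells = row.xpath('.//td[@data-stat="team_id"]')
    if not cells:
        return default  # the row has no team cell at all
    text = cells[0].text_content().strip()
    return text or default  # an empty cell also falls back to the default


# Hypothetical, simplified rows for illustration only.
HTML = """
<table class="stats_table">
  <tr class="full_table"><td data-stat="player"><a href="#">Player A</a></td>
    <td data-stat="team_id"><a href="#">OKC</a></td></tr>
  <tr class="full_table"><td data-stat="player"><a href="#">Player B</a></td>
    <td data-stat="team_id">TOT</td></tr>
  <tr class="full_table"><td data-stat="player"><a href="#">Player C</a></td></tr>
</table>
"""

rows = fromstring(HTML).xpath('//tr[contains(@class,"full_table")]')
teams = [team_abbr(row) for row in rows]
print(teams)
```

In the answer's loop, `team = team_abbr(row)` would replace the try/except; a traded player's row would then show whatever text its team cell holds (presumably TOT) instead of N/A.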
