What is the best way to scrape the basketball player's team name?
I can scrape several columns successfully, but I cannot scrape the team name for each player. Here is my code so far:
from urllib.request import urlopen
from lxml.html import fromstring
import pandas as pd
url = "https://www.basketball-reference.com/leagues/NBA_2018_advanced.html"
content = str(urlopen(url).read())
comment = content.replace("-->","").replace("<!--","")
tree = fromstring(comment)
for idx, bball_row in enumerate(tree.xpath('//table[contains(@class,"stats_table")]//tr[contains(@class,"full_table")]')):
    names = bball_row.xpath('.//td[@data-stat="player"]/a')[0].text
    mp = bball_row.xpath('.//td[@data-stat="mp"]/text()')[0]
    per = bball_row.xpath('.//td[@data-stat="per"]/text()')[0]
    ws = bball_row.xpath('.//td[@data-stat="ws"]/text()')[0]
    bpm = bball_row.xpath('.//td[@data-stat="bpm"]/text()')[0]
    vorp = bball_row.xpath('.//td[@data-stat="vorp"]/text()')[0]
    print(names, per, ws, bpm, vorp)
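Everything works so far. However, I would like to add a column for the team name. I am looking for the abbreviated team name (for example, OKC for Oklahoma City).
The code below raises an error when I run it: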
team = bball_row.xpath('.//td[@data-stat="team_id"]/a')[0].text
print(team)
The code starts printing team names and then hits an error.
This is the error:
team = bball_row.xpath('.//td[@data-stat="team_id"]/a')[0].text
IndexError: list index out of range
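To restate what I am looking for: I am trying to add the abbreviated team name next to each player.
Any suggestions would be appreciated. Thanks in advance to the community for your time and effort!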
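Your script raises that error only when it does not find the value it is looking for: the XPath returns an empty list for that row, so indexing it with [0] fails. What you can do is catch the error and handle it properly. Try the script below: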
import requests
from lxml.html import fromstring
url = "https://www.basketball-reference.com/leagues/NBA_2018_advanced.html"
content = requests.get(url).text
comment = content.replace("-->","").replace("<!--","")
tree = fromstring(comment)
for row in tree.xpath('//table[contains(@class,"stats_table")]//tr[contains(@class,"full_table")]'):
    names = row.xpath('.//td[@data-stat="player"]/a')[0].text
    mp = row.xpath('.//td[@data-stat="mp"]/text()')[0]
    per = row.xpath('.//td[@data-stat="per"]/text()')[0]
    ws = row.xpath('.//td[@data-stat="ws"]/text()')[0]
    bpm = row.xpath('.//td[@data-stat="bpm"]/text()')[0]
    vorp = row.xpath('.//td[@data-stat="vorp"]/text()')[0]
    # Fall back to a placeholder when the team cell contains no link
    try:
        team = row.xpath('.//td[@data-stat="team_id"]/a')[0].text
    except IndexError:
        team = "N/A"
    print(names, per, ws, bpm, vorp, team)
You should get output like this:
Alex Abrines 9.0 2.2 -2.2 -0.1 OKC
Quincy Acy 8.2 1.0 -2.2 -0.1 BRK
Steven Adams 20.6 9.7 3.3 3.3 OKC
Bam Adebayo 15.7 4.2 0.2 0.8 MIA
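If you would rather keep whatever text the team cell holds instead of printing "N/A", one option is to read the text of the whole cell rather than of the link inside it. The sketch below is only an illustration under some assumptions: the markup in SAMPLE is a hypothetical, simplified stand-in for the real page (the second row and its "TOT" value are made up to show a cell with plain text and no link), and cell_text is a helper name invented here. It uses the standard library's xml.etree.ElementTree so it runs without a network connection; lxml elements expose the same find and itertext methods, so the helper should carry over to the rows in the script above.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup: the first row links the team,
# the second holds the team abbreviation as plain text (no <a>).
SAMPLE = """
<table>
  <tr class="full_table">
    <td data-stat="player"><a href="#">Alex Abrines</a></td>
    <td data-stat="team_id"><a href="#">OKC</a></td>
  </tr>
  <tr class="full_table">
    <td data-stat="player"><a href="#">Example Player</a></td>
    <td data-stat="team_id">TOT</td>
  </tr>
</table>
"""


def cell_text(row, stat, default="N/A"):
    """Return all text inside the cell with the given data-stat, linked or not."""
    cell = row.find(f".//td[@data-stat='{stat}']")
    if cell is None:
        # The row has no such cell at all
        return default
    # itertext() walks the cell and its children, so <a> text is included
    text = "".join(cell.itertext()).strip()
    return text or default


rows = ET.fromstring(SAMPLE.strip()).findall(".//tr[@class='full_table']")
results = [(cell_text(r, "player"), cell_text(r, "team_id")) for r in rows]
for name, team in results:
    print(name, team)
```

With this approach no exception handling is needed: a missing cell or an empty cell falls back to the default, and a cell without a link still yields its text.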