How to web scrape the starting lineup for the NBA?
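I'm new to web scraping and need some help. I want to use XPath to scrape the NBA starting lineups, teams, and player positions. I'm starting with just the names because I ran into problems.
Here is my code so far: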
from urllib.request import urlopen
from lxml.html import fromstring
url = "https://www.lineups.com/nba/lineups"
content = str(urlopen(url).read())
comment = content.replace("-->","").replace("<!--","")
tree = fromstring(comment)
for nba, bball_row in enumerate(tree.xpath('//tr[contains(@class,"t-content")]')):
    names = bball_row.xpath('.//span[@_ngcontent-c5="long-player-name"]/text()')[0]
    print(names)
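The program appears to run without errors, but the names are not printed. I'd appreciate any tips on parsing with XPath more effectively. I tried playing around with XPath Helper and XPath Finder; maybe there are some tricks there that would make the process easier. Thanks in advance for your time and effort!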
The required content is located inside a script node that looks like this:
<script nonce="STATE_TRANSFER_TOKEN">window['TRANSFER_STATE'] = {...}</script>
You can try the following to extract the data as a plain Python dictionary:
import re
import json
import requests
source = requests.get("https://www.lineups.com/nba/lineups").text
dictionary = json.loads(re.search(r"window\['TRANSFER_STATE'\]\s=\s(\{.*\})<\/script>", source).group(1))
Optional: paste the output of dictionary into a JSON viewer and click "Beautify" to see the data as readable JSON.
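The extraction step above can be sketched offline as a small helper that returns None instead of raising when the script node is missing or its body is not valid JSON. This is only a sketch under assumptions: `extract_transfer_state`, `STATE_PATTERN`, `SAMPLE_HTML`, and the `example-key` entry are hypothetical names made up for illustration, and the pattern is a slightly more tolerant variant (non-greedy, multi-line, optional semicolon) rather than the exact regex shown above.

```python
import json
import re

# Hypothetical stand-in for the downloaded page; the real page embeds a much
# larger object under window['TRANSFER_STATE'].
SAMPLE_HTML = (
    '<html><body><script nonce="STATE_TRANSFER_TOKEN">'
    "window['TRANSFER_STATE'] = "
    '{"example-key": {"data": [{"home_players": [{"name": "Al Horford"}]}]}}'
    "</script></body></html>"
)

# Capture the object literal that sits between the assignment and the
# closing script tag. DOTALL lets the object span several lines.
STATE_PATTERN = re.compile(
    r"window\['TRANSFER_STATE'\]\s*=\s*(\{.*?\})\s*;?\s*</script>",
    re.DOTALL,
)


def extract_transfer_state(html):
    """Return the embedded state as a dict, or None if absent or invalid."""
    match = STATE_PATTERN.search(html)
    if match is None:
        return None
    try:
        return json.loads(match.group(1))
    except json.JSONDecodeError:
        return None


state = extract_transfer_state(SAMPLE_HTML)
print(state["example-key"]["data"][0]["home_players"][0]["name"])
```

With a real page, the text returned by requests.get(...).text would take the place of SAMPLE_HTML, and a None result would signal that the page layout has changed.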
You can then access the required values by key, e.g.
for player in dictionary['https://api.lineups.com/nba/fetch/lineups/gateway']['data'][0]['home_players']:
    print(player['name'])
Kyrie Irving
Jaylen Brown
Jayson Tatum
Gordon Hayward
Al Horford
for player in dictionary['https://api.lineups.com/nba/fetch/lineups/gateway']['data'][0]['away_players']:
    print(player['name'])
D.J. Augustin
Evan Fournier
Jonathan Isaac
Aaron Gordon
Nikola Vucevic
Update
I think I just overcomplicated it :)
It should be as simple as the following:
import requests
source = requests.get("https://api.lineups.com/nba/fetch/lineups/gateway").json()
for player in source['data'][0]['away_players']:
    print(player['name'])
Update 2
To get the lineups for all teams, use the following:
import requests
source = requests.get("https://api.lineups.com/nba/fetch/lineups/gateway").json()
for team in source['data']:
    print("\n%s players\n" % team['home_route'].capitalize())
    for player in team['home_players']:
        print(player['name'])
    print("\n%s players\n" % team['away_route'].capitalize())
    for player in team['away_players']:
        print(player['name'])
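If you would rather keep the lineups than just print them, the loop above could be turned into a function that builds a dictionary. The following is a minimal sketch, assuming the response has the shape used above (a 'data' list whose entries carry 'home_route', 'away_route', 'home_players' and 'away_players'). `collect_lineups` and `SAMPLE_PAYLOAD` are hypothetical names, and the route values in the sample are placeholders, not real API output.

```python
def collect_lineups(payload):
    """Map each team route to the list of its player names."""
    lineups = {}
    for entry in payload.get("data", []):
        for side in ("home", "away"):
            # Fall back to a placeholder label if the route key is missing.
            route = entry.get(side + "_route", "unknown " + side)
            players = entry.get(side + "_players", [])
            lineups[route] = [p["name"] for p in players if "name" in p]
    return lineups


# Hypothetical payload that mimics the structure used above; the route
# strings are placeholders and the player lists are shortened.
SAMPLE_PAYLOAD = {
    "data": [
        {
            "home_route": "home-team",
            "away_route": "away-team",
            "home_players": [{"name": "Kyrie Irving"}, {"name": "Al Horford"}],
            "away_players": [{"name": "D.J. Augustin"}, {"name": "Nikola Vucevic"}],
        }
    ]
}

lineups = collect_lineups(SAMPLE_PAYLOAD)
for route, names in lineups.items():
    print(route.capitalize(), "players:", ", ".join(names))
```

With live data, the parsed result of requests.get(...).json() would be passed in place of SAMPLE_PAYLOAD. The .get() calls keep the function from raising a KeyError if an entry lacks one of the expected keys.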