将列表的元素附加到 multi-dimensional 列表中
Appending elements of a list into a multi-dimensional list
您好,我正在 this page 的 python 中使用 NBA 数据进行网络抓取。 basketball-reference 的一些元素很容易抓取,但是由于我缺乏 python 知识,这个元素给我带来了一些麻烦。
我能够获取我想要的数据和列 headers,但我最终得到了 2 个数据列表,我需要按它们的索引(我认为?)进行组合,以便索引 0 的player_injury_info 与 player_names 等的索引 0 对齐,我不知道该怎么做。
下面我粘贴了一些代码,您可以按照这些代码进行操作。
from urllib.request import urlopen
from bs4 import BeautifulSoup
import pandas as pd
from datetime import datetime, timezone, timedelta
url = "https://www.basketball-reference.com/friv/injuries.fcgi"
html = urlopen(url)
soup = BeautifulSoup(html)
# this correctly gives me the 4 column headers i want (Player, Team, Update, Description)
headers = [th.getText() for th in soup.findAll('tr', limit=2)[0].findAll('th')]
# 2 lists - player_injury_info and player_names. they need to be combined.
rows = soup.findAll('tr')
player_injury_info = [[td.getText() for td in rows[i].findAll('td')]
for i in range(len(rows))]
player_injury_info = player_injury_info[1:] # removing first element bc dont need it
player_names = [[th.getText() for th in rows[i].findAll('th')]
for i in range(len(rows))]
player_names = player_names[1:] # removing first element bc dont need it
### joining the lists in the correct order- the part i dont know how to do
player_list = player_names.append(player_injury_info)
### this should give me the data frame i want if i can get player_injury_info into the right format.
injury_data = pd.DataFrame(player_injury_info, columns = headers)
可能有更简单的方法将数据通过网络抓取到所有 1 个列表/数据框中?或者像我尝试做的那样将 2 个列表连接在一起就可以了。但是,如果有人能够跟进并提供解决方案,我将不胜感激!
我想你想要这个(元组列表),使用 zip:
players = ["joe", "bill"]
injuries = ["tooth-ache", "mental break"]
list(zip(players, injuries))
结果:
[('joe', 'tooth-ache'), ('bill', 'mental break')]
让 pandas 为您解析 table。
import pandas as pd
url = "https://www.basketball-reference.com/friv/injuries.fcgi"
injury_data = pd.read_html(url)[0]
输出:
print(injury_data)
Player ... Description
0 Onyeka Okongwu ... Out (Shoulder) - The Hawks announced that Okon...
1 Jaylen Brown ... Out (Wrist) - The Celtics announced that Brown...
2 Coby White ... Out (Shoulder) - The Bulls announced that Whit...
3 Taurean Prince ... Out (Ankle) - The Cavaliers announced F Taurea...
4 Jamal Murray ... Out (Knee) - Murray is recovering from a torn ...
5 Klay Thompson ... Out (Right Achilles) - Thompson is on track to...
6 James Wiseman ... Out (Knee) - Wiseman is on track to be ready b...
7 T.J. Warren ... Out (Foot) - Warren underwent foot surgery and...
8 Serge Ibaka ... Out (Back) - The Clippers announced Serge Ibak...
9 Kawhi Leonard ... Out (Knee) - The Clippers announced Kawhi Leon...
10 Victor Oladipo ... Out (Knee) - Oladipo could be cleared for full...
11 Donte DiVincenzo ... Out (Foot) - DiVincenzo suffered a tendon inju...
12 Jarrett Culver ... Out (Ankle) - The Timberwolves announced Culve...
13 Markelle Fultz ... Out (Knee) - Fultz will miss the rest of the s...
14 Jonathan Isaac ... Out (Knee) - Isaac is making progress with his...
15 Dario Šarić ... Out (Knee) - The Suns announced that Sario has...
16 Zach Collins ... Out (Ankle) - The Blazers announced that Colli...
17 Pascal Siakam ... Out (Shoulder) - The Raptors announced Pascal ...
18 Deni Avdija ... Out (Leg) - The Wizards announced that Avdija ...
19 Thomas Bryant ... Out (Left knee) - The Wizards announced that B...
[20 rows x 4 columns]
但是如果你要自己迭代它,我会简单地获取行(<tr>
标签),然后在 <a>
标签中获取玩家名称,并将它与那个结合起来行的 <td>
标签。然后从这些列表中创建您的数据框:
from urllib.request import urlopen
from bs4 import BeautifulSoup
import pandas as pd
from datetime import datetime, timezone, timedelta
url = "https://www.basketball-reference.com/friv/injuries.fcgi"
html = urlopen(url)
soup = BeautifulSoup(html)
headers = [th.getText() for th in soup.findAll('tr', limit=2)[0].findAll('th')]
trs = soup.findAll('tr')[1:]
rows = []
for tr in trs:
player_name = tr.find('a').text
data = [player_name] + [x.text for x in tr.find_all('td')]
rows.append(data)
injury_data = pd.DataFrame(rows, columns = headers)
您好,我正在 this page 的 python 中使用 NBA 数据进行网络抓取。 basketball-reference 的一些元素很容易抓取,但是由于我缺乏 python 知识,这个元素给我带来了一些麻烦。
我能够获取我想要的数据和列 headers,但我最终得到了 2 个数据列表,我需要按它们的索引(我认为?)进行组合,以便索引 0 的player_injury_info 与 player_names 等的索引 0 对齐,我不知道该怎么做。
下面我粘贴了一些代码,您可以按照这些代码进行操作。
from urllib.request import urlopen
from bs4 import BeautifulSoup
import pandas as pd
from datetime import datetime, timezone, timedelta
url = "https://www.basketball-reference.com/friv/injuries.fcgi"
html = urlopen(url)
soup = BeautifulSoup(html)
# this correctly gives me the 4 column headers i want (Player, Team, Update, Description)
headers = [th.getText() for th in soup.findAll('tr', limit=2)[0].findAll('th')]
# 2 lists - player_injury_info and player_names. they need to be combined.
rows = soup.findAll('tr')
player_injury_info = [[td.getText() for td in rows[i].findAll('td')]
for i in range(len(rows))]
player_injury_info = player_injury_info[1:] # removing first element bc dont need it
player_names = [[th.getText() for th in rows[i].findAll('th')]
for i in range(len(rows))]
player_names = player_names[1:] # removing first element bc dont need it
### joining the lists in the correct order- the part i dont know how to do
player_list = player_names.append(player_injury_info)
### this should give me the data frame i want if i can get player_injury_info into the right format.
injury_data = pd.DataFrame(player_injury_info, columns = headers)
可能有更简单的方法将数据通过网络抓取到所有 1 个列表/数据框中?或者像我尝试做的那样将 2 个列表连接在一起就可以了。但是,如果有人能够跟进并提供解决方案,我将不胜感激!
我想你想要这个(元组列表),使用 zip:
players = ["joe", "bill"]
injuries = ["tooth-ache", "mental break"]
list(zip(players, injuries))
结果:
[('joe', 'tooth-ache'), ('bill', 'mental break')]
让 pandas 为您解析 table。
import pandas as pd
url = "https://www.basketball-reference.com/friv/injuries.fcgi"
injury_data = pd.read_html(url)[0]
输出:
print(injury_data)
Player ... Description
0 Onyeka Okongwu ... Out (Shoulder) - The Hawks announced that Okon...
1 Jaylen Brown ... Out (Wrist) - The Celtics announced that Brown...
2 Coby White ... Out (Shoulder) - The Bulls announced that Whit...
3 Taurean Prince ... Out (Ankle) - The Cavaliers announced F Taurea...
4 Jamal Murray ... Out (Knee) - Murray is recovering from a torn ...
5 Klay Thompson ... Out (Right Achilles) - Thompson is on track to...
6 James Wiseman ... Out (Knee) - Wiseman is on track to be ready b...
7 T.J. Warren ... Out (Foot) - Warren underwent foot surgery and...
8 Serge Ibaka ... Out (Back) - The Clippers announced Serge Ibak...
9 Kawhi Leonard ... Out (Knee) - The Clippers announced Kawhi Leon...
10 Victor Oladipo ... Out (Knee) - Oladipo could be cleared for full...
11 Donte DiVincenzo ... Out (Foot) - DiVincenzo suffered a tendon inju...
12 Jarrett Culver ... Out (Ankle) - The Timberwolves announced Culve...
13 Markelle Fultz ... Out (Knee) - Fultz will miss the rest of the s...
14 Jonathan Isaac ... Out (Knee) - Isaac is making progress with his...
15 Dario Šarić ... Out (Knee) - The Suns announced that Sario has...
16 Zach Collins ... Out (Ankle) - The Blazers announced that Colli...
17 Pascal Siakam ... Out (Shoulder) - The Raptors announced Pascal ...
18 Deni Avdija ... Out (Leg) - The Wizards announced that Avdija ...
19 Thomas Bryant ... Out (Left knee) - The Wizards announced that B...
[20 rows x 4 columns]
但是如果你要自己迭代它,我会简单地获取行(<tr>
标签),然后在 <a>
标签中获取玩家名称,并将它与那个结合起来行的 <td>
标签。然后从这些列表中创建您的数据框:
from urllib.request import urlopen
from bs4 import BeautifulSoup
import pandas as pd
from datetime import datetime, timezone, timedelta
url = "https://www.basketball-reference.com/friv/injuries.fcgi"
html = urlopen(url)
soup = BeautifulSoup(html)
headers = [th.getText() for th in soup.findAll('tr', limit=2)[0].findAll('th')]
trs = soup.findAll('tr')[1:]
rows = []
for tr in trs:
player_name = tr.find('a').text
data = [player_name] + [x.text for x in tr.find_all('td')]
rows.append(data)
injury_data = pd.DataFrame(rows, columns = headers)