How do I make my dataframe single-index from a MultiIndex?
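I want to make my dataframe look cleaner and remove the first row and first column of the multi-index, which I think are unnecessary. I would like the column headers to be: 'Rk', 'Team', 'Conf', 'G', 'Rec', 'ADJOE', ...., 'WAB'
Any help is greatly appreciated.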
import pandas as pd
url = 'https://www.barttorvik.com/#'
df = pd.read_html(url)
df = df[0]
df
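You can simply iterate over the existing columns and select the second value of each column tuple. Then assign the resulting list as the new column names: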
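import pandas as pd
url = 'https://www.barttorvik.com/#'
# read_html returns a list of DataFrames; take the first table
df = pd.read_html(url)[0]
# each column label is a tuple; keep only its second element
df.columns = [x[1] for x in df.columns]
df.head()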
Output:
Rk Team Conf G Rec AdjOE AdjDE Barthag EFG% EFGD% ... ORB DRB FTR FTRD 2P% 2P%D 3P% 3P%D Adj T. WAB
0 1 Gonzaga WCC 24 22-211–0 122.42 89.05 .97491 60.21 421 ... 30.2120 2318 30.4165 21.710 62.21 41.23 37.821 29.111 73.72 4.611
1 2 Houston Amer 25 21-410–2 117.39 89.06 .95982 53.835 42.93 ... 37.26 27.6141 28.2242 33.3247 54.827 424 34.8108 29.418 65.2303 3.416
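The same flattening can also be done with pandas' own MultiIndex helpers rather than a list comprehension. The following is a minimal sketch on a small hand-built frame, not the live page: the outer labels such as "Unnamed: 0_level_0" and the two sample rows are assumptions made only so the example is self-contained.

```python
import pandas as pd

# Hypothetical two-level header standing in for the scraped table;
# the outer labels and the rows are made up for illustration.
columns = pd.MultiIndex.from_tuples([
    ("Unnamed: 0_level_0", "Rk"),
    ("Unnamed: 1_level_0", "Team"),
    ("Unnamed: 2_level_0", "Conf"),
])
df = pd.DataFrame([[1, "Gonzaga", "WCC"], [2, "Houston", "Amer"]], columns=columns)

# Option 1: keep only the second (inner) level of the column labels.
flat = df.copy()
flat.columns = flat.columns.get_level_values(1)

# Option 2: drop the first (outer) level; returns a new DataFrame.
dropped = df.droplevel(0, axis=1)

print(list(flat.columns))
print(list(dropped.columns))
```

Both options leave a plain single-level column index. droplevel is handy when chaining, since it returns a new frame instead of modifying df in place.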
When you read from the HTML, specify the row number you want to use as the header:
df = pd.read_html(url, header=1)[0]
print(df.head())
Output:
Rk Team Conf G Rec ... 2P%D 3P% 3P%D Adj T. WAB
0 1 Gonzaga WCC 24 22-211–0 ... 41.23 37.821 29.111 73.72 4.611
1 2 Houston Amer 25 21-410–2 ... 424 34.8108 29.418 65.2303 3.416
2 3 Kentucky SEC 26 21-510–3 ... 46.342 35.478 29.519 68.997 4.89
3 4 Arizona P12 25 23-213–1 ... 39.91 33.7172 31.471 72.99 6.24
4 5 Baylor B12 26 21-59–4 ... 49.2165 35.966 30.440 68.3130 6.15