Iterating through rows in a df and creating a new column based on those values
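I want to create a new relative score (a column) that compares each F1 driver with their teammate within a given year and a given team.
My data looks like this: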
stats_df.head()
>   driver  year        team  points
> 0    AIT  2020    Williams     0.0
> 1    ALB  2019    Red Bull    76.0
> 2    ALB  2019  AlphaTauri    16.0
> 3    ALB  2020    Red Bull   105.0
> 4    ALO  2013     Ferrari   242.0
I tried:
teams = stats_df['team'].unique()
years = stats_df['year'].unique()
drivers = stats_df['driver'].unique()
for year in years:
    for team in teams:
        team_points = stats_df['points'].loc[stats_df['team']==team].loc[stats_df['year']==year].sum()
        for driver in drivers:
            driver_points = stats_df['points'].loc[stats_df['team']==team].loc[stats_df['year']==year].loc[stats_df['driver']==driver]
            power_score = driver_points/(team_points/2)
            stats_df['power_score'].loc[stats_df['team']==team].loc[stats_df['year']==year].loc[stats_df['driver']==driver] = power_score
This leaves nothing but NaN in the new column ('power_score').
Any help would be much appreciated.
Looking at your code, you can use .groupby(["team", "year"]) to compute team_points, and then simply divide points by those values:
team_points = df.groupby(["team", "year"])["points"].transform("sum")
df["power_score"] = df["points"] / (team_points / 2)
print(df)
Prints:
  driver  year        team  points  power_score
0    AIT  2020    Williams     0.0          NaN
1    ALB  2019    Red Bull    76.0          2.0
2    ALB  2019  AlphaTauri    16.0          2.0
3    ALB  2020    Red Bull   105.0          2.0
4    ALO  2013     Ferrari   242.0          2.0
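In the five sample rows every driver is alone in their (team, year) group, so each score is either 2.0 or NaN (0/0 for the Williams row). To see the score actually separate two teammates, here is a minimal self-contained sketch. It rebuilds the sample under the question's name stats_df and adds one invented row: the driver "XXX" and its 315.0 points are hypothetical, made up purely for illustration.

```python
import pandas as pd

# The five rows from the question, plus one HYPOTHETICAL teammate row
# ("XXX", 315.0 points) so that Red Bull 2020 has two drivers.
stats_df = pd.DataFrame(
    {
        "driver": ["AIT", "ALB", "ALB", "ALB", "ALO", "XXX"],
        "year": [2020, 2019, 2019, 2020, 2013, 2020],
        "team": ["Williams", "Red Bull", "AlphaTauri", "Red Bull", "Ferrari", "Red Bull"],
        "points": [0.0, 76.0, 16.0, 105.0, 242.0, 315.0],
    }
)

# transform("sum") returns one value per original row (the total of that
# row's team/year group), so it lines up with stats_df row by row.
team_points = stats_df.groupby(["team", "year"])["points"].transform("sum")

# Score relative to an equal split of the team's points between two drivers.
stats_df["power_score"] = stats_df["points"] / (team_points / 2)

print(stats_df)
```

With the invented row, Red Bull's 2020 total is 420 points, so ALB scores 105 / 210 = 0.5 and the hypothetical teammate 315 / 210 = 1.5. As for the original loop, the likely culprit is the chained indexing: each .loc[...] returns a new object, so the final assignment writes into a temporary copy rather than into stats_df.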