遍历 df 中的行并根据这些值创建一个新列

Iterating through rows in a df and creating a new column based on those values

我想创建一个新的相对分数(列)来比较给定年份和给定团队中的 F1 车手与其队友。

我的数据如下:

stats_df.head()

>       driver  year    team    points
>     0 AIT 2020    Williams    0.0
>     1 ALB 2019    Red Bull    76.0
>     2 ALB 2019    AlphaTauri  16.0
>     3 ALB 2020    Red Bull    105.0
>     4 ALO 2013    Ferrari     242.0

我累了:

teams = stats_df['team'].unique()
years = stats_df['year'].unique()
drivers = stats_df['driver'].unique()

for year in years:
    for team in teams:
        team_points = stats_df['points'].loc[stats_df['team']==team].loc[stats_df['year']==year].sum()
        for driver in drivers:
            driver_points = stats_df['points'].loc[stats_df['team']==team].loc[stats_df['year']==year].loc[stats_df['driver']==driver]
            power_score = driver_points/(team_points/2)
            stats_df['power_score'].loc[stats_df['team']==team].loc[stats_df['year']==year].loc[stats_df['driver']==driver] = power_score

导致新列中出现 NaN ('power_score')。

不胜感激。

查看您的代码,您可以使用 .groupby(["team", "year"]) 计算 team_points,然后简单地将 points 除以这些值:

team_points = df.groupby(["team", "year"])["points"].transform("sum")
df["power_score"] = df["points"] / (team_points / 2)
print(df)

打印:

  driver  year        team  points  power_score
0    AIT  2020    Williams     0.0          NaN
1    ALB  2019    Red Bull    76.0          2.0
2    ALB  2019  AlphaTauri    16.0          2.0
3    ALB  2020    Red Bull   105.0          2.0
4    ALO  2013     Ferrari   242.0          2.0