Passing values from one data frame's columns to another data frame in Pandas

Question

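I have a couple of data frames. I want to use the data in two columns of the first data frame to mark rows in the second data frame. The first data frame (df1) looks like this: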

Sup4 Seats  Primary Seats   Back up Seats
 Pa   3       2              1
 Ka   2       1              1
 Ga   1       0              1
 Gee  1       1              0
 Re   2       2              0

The second data frame (df2) looks like this:

Sup4    First   Last  Primary Seats     Backup Seats  Rating
Pa      Peter   He          NaN         NaN           2.3
Ka      Sonia   Du          NaN         NaN           2.99
Ga      Agnes   Bla         NaN         NaN           3.24
Gee    Jeffery  Rus         NaN         NaN           3.5
Gee    John     Cro         NaN         NaN           1.3
Pa     Pavol    Rac         NaN         NaN           1.99
Pa     Ciara    Lee         NaN         NaN           1.88
Re     David    Wool        NaN         NaN           2.34
Re     Stefan   Rot         NaN         NaN           2
Re     Franc    Bor         NaN         NaN           1.34
Ka     Tania    Le          NaN         NaN           2.35

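The output I need groups the rows by Sup4 name, sorts each group by Rating from highest to lowest, and then marks the seat columns according to the Primary Seats and Backup Seats columns of df1.

In the sample below I have grouped and sorted the first Sup4 name, Pa, and a couple of others by hand; I have to do the same for all the names.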

Sup4    First   Last      Primary Seats   Backup Seats  Rating
Pa      Peter   He                  M                     2.3
Pa      Pavol   Rac                 M                     1.99
Pa      Ciara   Lee                           M           1.88
Ka      Sonia   Du                  M                     2.99
Ka      Tania   Le                            M           2.35
Ga      Agnes   Bla                           M           3.24
:
:
:

and so on.

I have already tried the grouping and sorting:

sorted_df = df2.sort_values(['Sup4','Rating'],ascending=[True,False])

But I need help passing the df1 column values to mark the rows in the second data frame.

Answer 1

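Solution #1:

You can do a merge, but you need to include some logic to update your Seats columns. It is also important to mention that you need to decide what to do with data of unequal length: `Gee` and `Re` do not have equal lengths in the two dataframes (each has one more row in df2 than its Seats count in df1). More on that in Solution #2: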

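import numpy as np
import pandas as pd

# Attach each group's seat counts from df1 to the people in df2, best rating first
df3 = (pd.merge(df2[['Sup4', 'First', 'Last', 'Rating']], df1, on='Sup4')
         .sort_values(['Sup4', 'Rating'], ascending=[True, False]))
# 1-based rank of each row within its Sup4 group
s = df3.groupby('Sup4', sort=False).cumcount() + 1
# Rows ranked beyond the primary count are marked as backup, the rest as primary
df3['Backup Seats'] = np.where(s - df3['Primary Seats'] > 0, 'M', '')
df3['Primary Seats'] = np.where(s <= df3['Primary Seats'], 'M', '')
df3 = df3[['Sup4', 'First', 'Last', 'Primary Seats', 'Backup Seats', 'Rating']]
df3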
Out[1]: 
   Sup4    First  Last Primary Seats Backup Seats  Rating
5    Ga    Agnes   Bla                          M    3.24
6   Gee  Jeffery   Rus             M                  3.5
7   Gee     John   Cro                          M     1.3
3    Ka    Sonia    Du             M                 2.99
4    Ka    Tania    Le                          M    2.35
0    Pa    Peter    He             M                  2.3
1    Pa    Pavol   Rac             M                 1.99
2    Pa    Ciara   Lee                          M    1.88
8    Re    David  Wool             M                 2.34
9    Re   Stefan   Rot             M                  2.0
10   Re    Franc   Bor                          M    1.34
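Note that in the output above, John (Gee) and Franc (Re) get a backup mark even though df1 lists 0 backup seats for those groups, because every row ranked past the primary count is treated as a backup. If you would rather cap the backup marks at the Backup Seats count, something along these lines might work. This is only a sketch: the `n_primary` / `n_backup` helper column names are made up for illustration, df1's backup column is assumed to be called 'Backup Seats', and the NaN placeholder columns of df2 are left out.

```python
import numpy as np
import pandas as pd

# Sample data from the question (df1's backup column name is an assumption).
df1 = pd.DataFrame({
    'Sup4': ['Pa', 'Ka', 'Ga', 'Gee', 'Re'],
    'Seats': [3, 2, 1, 1, 2],
    'Primary Seats': [2, 1, 0, 1, 2],
    'Backup Seats': [1, 1, 1, 0, 0],
})
df2 = pd.DataFrame({
    'Sup4': ['Pa', 'Ka', 'Ga', 'Gee', 'Gee', 'Pa', 'Pa', 'Re', 'Re', 'Re', 'Ka'],
    'First': ['Peter', 'Sonia', 'Agnes', 'Jeffery', 'John', 'Pavol',
              'Ciara', 'David', 'Stefan', 'Franc', 'Tania'],
    'Last': ['He', 'Du', 'Bla', 'Rus', 'Cro', 'Rac',
             'Lee', 'Wool', 'Rot', 'Bor', 'Le'],
    'Rating': [2.3, 2.99, 3.24, 3.5, 1.3, 1.99, 1.88, 2.34, 2.0, 1.34, 2.35],
})

# Hypothetical helper names, so the counts do not clash with the marker columns.
counts = df1.rename(columns={'Primary Seats': 'n_primary',
                             'Backup Seats': 'n_backup'})

capped = (df2.merge(counts[['Sup4', 'n_primary', 'n_backup']], on='Sup4', how='left')
             .sort_values(['Sup4', 'Rating'], ascending=[True, False])
             .reset_index(drop=True))

# 1-based rank within each Sup4 group, highest rating first.
rank = capped.groupby('Sup4').cumcount() + 1

# Primary: the first n_primary rows. Backup: only the next n_backup rows.
capped['Primary Seats'] = np.where(rank <= capped['n_primary'], 'M', '')
is_backup = (rank > capped['n_primary']) & (rank <= capped['n_primary'] + capped['n_backup'])
capped['Backup Seats'] = np.where(is_backup, 'M', '')

capped = capped[['Sup4', 'First', 'Last', 'Primary Seats', 'Backup Seats', 'Rating']]
print(capped)
```

With this variant the surplus rows (John and Franc in the sample) stay in the result but are left unmarked, which is one possible way to handle the unequal lengths.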

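Solution #2:

After completing this solution, I realized that Solution #1 would be much simpler, but I thought I would include this one as well. It also gives you some insight into what you could do about values of unequal size in the two dataframes. You can reindex the first dataframe and use combine_first(), but you have to do some preparation. Again, you need to decide what to do with data of unequal length. In my answer, I have simply excluded the Sup4 groups with unequal lengths, to guarantee that the indices align when finally calling combine_first():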

# Purpose of `mtch` is to check if rows in second dataframe are equal to the count of seats in first.
# If not, then I have excluded the `Sup4` with unequal lengths in both dataframes
mtch = df1.groupby('Sup4')['Seats'].first().eq(df2.groupby('Sup4').size())
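# Filter to the matching groups first, then sort, so the boolean mask stays aligned with df1.
# (The row repetition happens once, in the next step; doing it twice would over-repeat rows.)
df1 = df1[df1['Sup4'].isin(mtch[mtch].index)].sort_values('Sup4', ascending=True)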

#`reindex` the dataframe, get the cumulative count, and manipulate data with `np.where`
df1 = df1.reindex(df1.index.repeat(df1['Seats'])).reset_index(drop=True)
s = df1.groupby('Sup4').cumcount() + 1
df1['Backup Seats'] = np.where(s - df1['Primary Seats'] > 0, 'M', '')
df1['Primary Seats'] = np.where(s <= df1['Primary Seats'], 'M', '')

#like df1, in df2 we exclude groups with uneven lengths and sort
df2 = (df2[df2['Sup4'].isin(mtch[mtch].index)]
       .sort_values(['Sup4', 'Rating'], ascending=[True, False]).reset_index(drop=True))

#can use `combine_first` since we have ensured that the data is sorted and of equal lengths in both dataframes
df3 = df2.combine_first(df1)

#order columns and only include required columns
df3 = df3[['Sup4', 'First', 'Last', 'Primary Seats', 'Backup Seats', 'Rating']]
df3
Out[1]: 
  Sup4  First Last Primary Seats Backup Seats  Rating
0   Ga  Agnes  Bla                          M    3.24
1   Ka  Sonia   Du             M                 2.99
2   Ka  Tania   Le                          M    2.35
3   Pa  Peter   He             M                  2.3
4   Pa  Pavol  Rac             M                 1.99
5   Pa  Ciara  Lee                          M    1.88
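To see why Gee and Re are missing from this result, it may help to look at the `mtch` mask on its own. The snippet below is a small sketch that rebuilds just the columns the check needs from the question's sample data; the `seats`, `rows` and `excluded` names are only for illustration.

```python
import pandas as pd

# Only the columns needed for the length check, taken from the question.
df1 = pd.DataFrame({
    'Sup4': ['Pa', 'Ka', 'Ga', 'Gee', 'Re'],
    'Seats': [3, 2, 1, 1, 2],
})
df2 = pd.DataFrame({
    'Sup4': ['Pa', 'Ka', 'Ga', 'Gee', 'Gee', 'Pa', 'Pa', 'Re', 'Re', 'Re', 'Ka'],
})

seats = df1.groupby('Sup4')['Seats'].first()   # seats available per group
rows = df2.groupby('Sup4').size()              # people listed per group
mtch = seats.eq(rows)                          # True where the two counts agree

# Side-by-side view of the two counts and the resulting mask.
print(pd.DataFrame({'seats': seats, 'rows': rows, 'match': mtch}))

excluded = sorted(mtch[~mtch].index)
print(excluded)  # ['Gee', 'Re']
```

Both excluded groups have one more person in df2 than seats in df1, so their rows could not line up one-to-one with the repeated df1 rows.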
