Passing values from the columns of one data frame to another data frame in pandas
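I have a couple of data frames. I want to use the data in two columns of the first data frame to mark the rows of the second data frame.
The first data frame (df1) looks like this: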
Sup4  Seats  Primary Seats  Back up Seats
Pa    3      2              1
Ka    2      1              1
Ga    1      0              1
Gee   1      1              0
Re    2      2              0
The second data frame (df2) looks like this:
Sup4  First    Last  Primary Seats  Backup Seats  Rating
Pa    Peter    He    NaN            NaN           2.3
Ka    Sonia    Du    NaN            NaN           2.99
Ga    Agnes    Bla   NaN            NaN           3.24
Gee   Jeffery  Rus   NaN            NaN           3.5
Gee   John     Cro   NaN            NaN           1.3
Pa    Pavol    Rac   NaN            NaN           1.99
Pa    Ciara    Lee   NaN            NaN           1.88
Re    David    Wool  NaN            NaN           2.34
Re    Stefan   Rot   NaN            NaN           2
Re    Franc    Bor   NaN            NaN           1.34
Ka    Tania    Le    NaN            NaN           2.35
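The output I need groups the rows by Sup4 name, sorts each group by Rating from highest to lowest, and then marks the seat columns according to the Primary Seats and Backup Seats counts in df1.
For illustration, I grouped and sorted by hand, starting with the first Sup4 name, Pa; I need to do this for every name: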
Sup4  First    Last  Primary Seats  Backup Seats  Rating
Pa    Peter    He    M                            2.3
Pa    Pavol    Rac   M                            1.99
Pa    Ciara    Lee                  M             1.88
Ka    Sonia    Du    M                            2.99
Ka    Tania    Le                   M             2.35
Ga    Agnes    Bla                  M             3.24
...
and so on for the remaining names.
I have already tried the grouping and sorting:
sorted_df = df2.sort_values(['Sup4','Rating'],ascending=[True,False])
But I need help passing the df1 column values across to mark the rows in the second data frame.
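For anyone who wants to reproduce the problem, the two sample frames can be built roughly as follows. The values are copied from the tables in the question; the dtypes (integer seat counts, float ratings, NaN placeholders for the empty seat columns) are assumptions, since the question only shows printed output.

```python
import numpy as np
import pandas as pd

# Seat allocation per Sup4 (values copied from the df1 table in the question).
df1 = pd.DataFrame({
    "Sup4": ["Pa", "Ka", "Ga", "Gee", "Re"],
    "Seats": [3, 2, 1, 1, 2],
    "Primary Seats": [2, 1, 0, 1, 2],
    "Back up Seats": [1, 1, 1, 0, 0],
})

# People per Sup4 (values copied from the df2 table in the question).
# The two seat columns start out empty (NaN) and are to be filled with marks.
df2 = pd.DataFrame({
    "Sup4": ["Pa", "Ka", "Ga", "Gee", "Gee", "Pa", "Pa", "Re", "Re", "Re", "Ka"],
    "First": ["Peter", "Sonia", "Agnes", "Jeffery", "John", "Pavol",
              "Ciara", "David", "Stefan", "Franc", "Tania"],
    "Last": ["He", "Du", "Bla", "Rus", "Cro", "Rac",
             "Lee", "Wool", "Rot", "Bor", "Le"],
    "Primary Seats": np.nan,
    "Backup Seats": np.nan,
    "Rating": [2.3, 2.99, 3.24, 3.5, 1.3, 1.99, 1.88, 2.34, 2.0, 1.34, 2.35],
})

# The sort from the question: Sup4 ascending, Rating descending within each Sup4.
sorted_df = df2.sort_values(["Sup4", "Rating"], ascending=[True, False])
print(sorted_df)
```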
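Solution #1:
You can do a `merge`, but you need to include some logic to update your Seats columns. It is also important to mention that you need to decide what to do with data of unequal lengths: `Gee` and `Re` have more rows in df2 than seats in df1. More on this in Solution #2: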
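import numpy as np
import pandas as pd

# Attach the seat counts from df1 to each person, then sort by Rating within each Sup4
df3 = (pd.merge(df2[['Sup4', 'First', 'Last', 'Rating']], df1, on='Sup4')
       .sort_values(['Sup4', 'Rating'], ascending=[True, False]))
# 1-based position of each row within its Sup4 group
s = df3.groupby('Sup4', sort=False).cumcount() + 1
# Rows ranked beyond the primary count become backups; do this before overwriting 'Primary Seats'
df3['Backup Seats'] = np.where(s - df3['Primary Seats'] > 0, 'M', '')
df3['Primary Seats'] = np.where(s <= df3['Primary Seats'], 'M', '')
df3 = df3[['Sup4', 'First', 'Last', 'Primary Seats', 'Backup Seats', 'Rating']]
df3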
Out[1]:
    Sup4  First    Last  Primary Seats  Backup Seats  Rating
5   Ga    Agnes    Bla                  M             3.24
6   Gee   Jeffery  Rus   M                            3.5
7   Gee   John     Cro                  M             1.3
3   Ka    Sonia    Du    M                            2.99
4   Ka    Tania    Le                   M             2.35
0   Pa    Peter    He    M                            2.3
1   Pa    Pavol    Rac   M                            1.99
2   Pa    Ciara    Lee                  M             1.88
8   Re    David    Wool  M                            2.34
9   Re    Stefan   Rot   M                            2.0
10  Re    Franc    Bor                  M             1.34
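Note that with the merge approach, every row ranked beyond the primary count gets a backup mark, even where df1 lists fewer backup seats than there are leftover rows (John in `Gee`, Franc in `Re`). One possible way to handle that, offered only as a sketch and not as part of the original answer, is to cap the backup marks at the `Back up Seats` count so that surplus rows stay unmarked. The variable names `rank`, `is_primary` and `is_backup` are illustrative, and the sample frames are rebuilt from the question's tables so the block runs on its own.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question (only the columns needed here).
df1 = pd.DataFrame({
    "Sup4": ["Pa", "Ka", "Ga", "Gee", "Re"],
    "Seats": [3, 2, 1, 1, 2],
    "Primary Seats": [2, 1, 0, 1, 2],
    "Back up Seats": [1, 1, 1, 0, 0],
})
df2 = pd.DataFrame({
    "Sup4": ["Pa", "Ka", "Ga", "Gee", "Gee", "Pa", "Pa", "Re", "Re", "Re", "Ka"],
    "First": ["Peter", "Sonia", "Agnes", "Jeffery", "John", "Pavol",
              "Ciara", "David", "Stefan", "Franc", "Tania"],
    "Last": ["He", "Du", "Bla", "Rus", "Cro", "Rac",
             "Lee", "Wool", "Rot", "Bor", "Le"],
    "Rating": [2.3, 2.99, 3.24, 3.5, 1.3, 1.99, 1.88, 2.34, 2.0, 1.34, 2.35],
})

# Bring the seat counts onto each person and sort by Rating within each Sup4.
df3 = (df2.merge(df1, on="Sup4")
          .sort_values(["Sup4", "Rating"], ascending=[True, False]))

# 1-based rank of each person within their Sup4 group (best rating first).
rank = df3.groupby("Sup4", sort=False).cumcount() + 1

# Compute both masks before overwriting the numeric 'Primary Seats' column.
is_primary = rank <= df3["Primary Seats"]
is_backup = ~is_primary & (rank <= df3["Primary Seats"] + df3["Back up Seats"])

df3["Primary Seats"] = np.where(is_primary, "M", "")
df3["Backup Seats"] = np.where(is_backup, "M", "")
df3 = df3[["Sup4", "First", "Last", "Primary Seats", "Backup Seats", "Rating"]]
df3 = df3.reset_index(drop=True)
print(df3)
```

Whether surplus rows should be left blank, dropped, or still marked as backups depends on what the seat counts are meant to guarantee, so treat the cap as one option among several.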
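Solution #2:
After finishing this solution, I realized that Solution #1 is much simpler, but I thought I would include this one as well. It also gives you some insight into what to do with groups whose sizes differ between the two dataframes. You can `reindex` the first dataframe and use `combine_first()`, but you have to do some preparation. Again, you need to decide what to do with data of unequal lengths. In my answer, I have simply excluded the `Sup4` groups with unequal lengths, to guarantee that the indices line up in the final call to `combine_first()`: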
import numpy as np

# Purpose of `mtch` is to check whether the number of rows per `Sup4` in the second dataframe equals the seat count in the first.
# If not, the `Sup4` group is excluded from both dataframes.
mtch = df1.groupby('Sup4')['Seats'].first().eq(df2.groupby('Sup4').size())
# Filter first, then sort, so the boolean mask and the frame share the same index order
df1 = df1[df1['Sup4'].isin(mtch[mtch].index)].sort_values('Sup4', ascending=True)
# `reindex` the dataframe so each row is repeated once per seat (do this only once),
# get the cumulative count, and manipulate data with `np.where`
df1 = df1.reindex(df1.index.repeat(df1['Seats'])).reset_index(drop=True)
s = df1.groupby('Sup4').cumcount() + 1
df1['Backup Seats'] = np.where(s - df1['Primary Seats'] > 0, 'M', '')
df1['Primary Seats'] = np.where(s <= df1['Primary Seats'], 'M', '')
# like df1, in df2 we exclude groups with uneven lengths and sort
df2 = (df2[df2['Sup4'].isin(mtch[mtch].index)]
       .sort_values(['Sup4', 'Rating'], ascending=[True, False]).reset_index(drop=True))
# can use `combine_first` since we have ensured that the data is sorted and of equal lengths in both dataframes
df3 = df2.combine_first(df1)
# order columns and only include required columns
df3 = df3[['Sup4', 'First', 'Last', 'Primary Seats', 'Backup Seats', 'Rating']]
df3
Out[1]:
    Sup4  First    Last  Primary Seats  Backup Seats  Rating
0   Ga    Agnes    Bla                  M             3.24
1   Ka    Sonia    Du    M                            2.99
2   Ka    Tania    Le                   M             2.35
3   Pa    Peter    He    M                            2.3
4   Pa    Pavol    Rac   M                            1.99
5   Pa    Ciara    Lee                  M             1.88
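To see in isolation what the `reindex` with `index.repeat` step in Solution #2 does, here is a minimal sketch on a cut-down version of df1. The `slot` column name is made up for illustration; the answer itself keeps the cumulative count in a separate series `s`.

```python
import pandas as pd

# Cut-down df1: only the groups that survive the length check in Solution #2.
df1 = pd.DataFrame({
    "Sup4": ["Pa", "Ka", "Ga"],
    "Seats": [3, 2, 1],
    "Primary Seats": [2, 1, 0],
})

# Repeat each index label `Seats` times, then reindex so that every
# Sup4 row appears once per seat; reset_index gives a clean 0..n-1 index.
expanded = df1.reindex(df1.index.repeat(df1["Seats"])).reset_index(drop=True)

# 1-based position of each seat within its Sup4 group (hypothetical column name).
expanded["slot"] = expanded.groupby("Sup4").cumcount() + 1
print(expanded)
```

After this step the expanded frame has one row per seat, which is what lets it line up index-for-index with the sorted, filtered df2 in `combine_first()`.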