How to Modify the Dataframe so Each Row Stores All the Data of its Duplicate Rows?

Question

My dataframe contains three columns (ID, key, and word):

   ID  key   word
0   1    A  Apple
1   1    B  Bug
2   2    C  Cat
3   3    D  Dog
4   3    E  Exogenous
5   3    E  Egg
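For anyone who wants to reproduce this, the frame above can be rebuilt roughly as follows. This is only a sketch of the sample data as printed; the variable name df is the one used in the snippets further down.

```python
import pandas as pd

# Sample data copied from the table above
df = pd.DataFrame({
    "ID": [1, 1, 2, 3, 3, 3],
    "key": ["A", "B", "C", "D", "E", "E"],
    "word": ["Apple", "Bug", "Cat", "Dog", "Exogenous", "Egg"],
})

print(df)
```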

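I want to create additional key and word columns as needed, so that when several rows share the same ID, their data is stored in those extra key and word columns.

Here is a snippet of the output: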

   ID  key_0  key_1   word_0   word_1  
0   1      A      B    Apple      Bug

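Note: in the output above, ID#1 appears twice in the dataframe, so the "key" value "B" associated with the duplicate ID is stored in the new column "key_1". The word "Bug" found in the duplicate ID#1 row is likewise stored in the new column "word_1".

The complete output should look like this: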

    ID  key_0  key_1   key_2   word_0        word_1    word_2
0   1       A      B     NaN    Apple           Bug       NaN
1   2       C    NaN     NaN      Cat           NaN       NaN
2   3       D      E       E      Dog     Exogenous       Egg

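Note that in the complete output, ID#3 appears three times. The key "E" from its second occurrence is stored under the "key_1" column, and the key "E" from its third occurrence is stored in the new column "key_2". The words "Exogenous" and "Egg" are handled in the same manner.

I found this solution helpful, but it only works for the key column: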

df.groupby('ID')['key'].apply(
lambda s: pd.Series(s.values, index=['key_%s' % i for i in range(s.shape[0])])).unstack(-1)

Any idea how to make the lambda function work for both the key and word columns?

Thanks,

Answer 1

You can apply Alex's solution to each column and then combine the results with concat:

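# Spread the key values of each ID into key_0, key_1, ...
df1 = df.groupby('ID')['key'].apply(
    lambda s: pd.Series(s.values, index=['key_%s' % i for i in range(s.shape[0])])).unstack(-1)

# Do the same for the word column
df2 = df.groupby('ID')['word'].apply(
    lambda s: pd.Series(s.values, index=['word_%s' % i for i in range(s.shape[0])])).unstack(-1)

# df1 and df2 are both indexed by ID, so concat aligns them row by row;
# reset_index() then turns ID back into an ordinary column
df_new = pd.concat([df1, df2], axis=1).reset_index()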
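If there are more than two columns to spread out, the same idea can be wrapped in a small function and applied in a loop. The sketch below is only one way to do it: widen_column is a hypothetical helper name, not part of pandas, and the sample frame is rebuilt so the block runs on its own.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 1, 2, 3, 3, 3],
    "key": ["A", "B", "C", "D", "E", "E"],
    "word": ["Apple", "Bug", "Cat", "Dog", "Exogenous", "Egg"],
})

def widen_column(frame, col):
    # Hypothetical helper: spread the values of `col` for each ID
    # into columns named col_0, col_1, ...
    return frame.groupby("ID")[col].apply(
        lambda s: pd.Series(s.values, index=[f"{col}_{i}" for i in range(len(s))])
    ).unstack(-1)

# Each widened piece is indexed by ID, so concat lines the rows up by ID;
# reset_index() moves ID back into a regular column.
df_new = pd.concat(
    [widen_column(df, c) for c in ["key", "word"]], axis=1
).reset_index()

print(df_new)
```

IDs with fewer duplicates than the maximum get NaN in the columns they do not fill.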

Answer 2

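# Renumber the rows within each ID (0, 1, 2, ...) and pivot that counter into columns
df2 = df.set_index('ID').groupby(level=0).apply(lambda g: g.reset_index(drop=True)).unstack()
# Flatten the ('key', 0)-style MultiIndex columns into 'key_0'-style names
df2.columns = df2.columns.set_levels((df2.columns.levels[1]).astype(str), level=1)
df2.columns = df2.columns.to_series().str.join('_')
df2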

Answer 3

Another solution, using pivot_table:

df['cols'] = df.groupby('ID')['ID'].cumcount().astype(str)
df1 = df.pivot_table(index='ID', columns='cols', values=['key','word'], aggfunc=''.join)
df1.columns = ['_'.join(col) for col in df1.columns]
print (df1)
   key_0 key_1 key_2 word_0     word_1 word_2
ID                                           
1      A     B  None  Apple        Bug   None
2      C  None  None    Cat       None   None
3      D     E     E    Dog  Exogenous    Egg
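A variant of the same cumcount idea, sketched here with set_index and unstack instead of pivot_table, so no aggregation function is needed and the helper column never has to be added to df. It assumes the sample frame from the question; pos and wide are just illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 1, 2, 3, 3, 3],
    "key": ["A", "B", "C", "D", "E", "E"],
    "word": ["Apple", "Bug", "Cat", "Dog", "Exogenous", "Egg"],
})

# Position of each row within its ID group: 0 for the first occurrence, 1 for the next, ...
pos = df.groupby("ID").cumcount()

# Index by (ID, position), then move the position level into the columns
wide = df.set_index(["ID", pos])[["key", "word"]].unstack()

# Flatten the ('key', 0)-style column tuples into 'key_0'-style names
wide.columns = [f"{name}_{i}" for name, i in wide.columns]

# Turn ID back into a regular column, as in the desired output
wide = wide.reset_index()

print(wide)
```

Because every (ID, position) pair is unique, unstack has nothing to aggregate; missing slots come out as NaN.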
