Greater of two columns of a groupby object in a pandas dataframe

Question

I have a dataframe like this (minimal reproducible example):

 Search_Term  Exit_Pages      Ratio_x Date_x   Ratio_y Date_y
 hello        /store/catalog  .20     8/30/17  .25     7/30/17
 hello        /store/product  .15     8/30/17  .10     7/30/17
 goodbye      /store/search   .35     8/30/17  .20     7/30/17
 goodbye      /store/product  .25     8/30/17  .40     7/30/17

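What I want to do is first group by search term, and then for each search term find the greater of Ratio_x and Ratio_y (while keeping all the remaining columns in the dataframe). So the output I would like to see is: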

 Search_Term  Exit_Pages      Ratio_x Date_x   Ratio_y Date_y   Highest_Ratio
 hello        /store/catalog  .20     8/30/17  .25     7/30/17  .25
 hello        /store/product  .15     8/30/17  .10     7/30/17
 goodbye      /store/search   .35     8/30/17  .20     7/30/17
 goodbye      /store/product  .25     8/30/17  .40     7/30/17  .40

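What I tried was to group by Search_Term and use apply with a function that returns the greater of the two columns, as follows (afterwards I planned to join the result back to my original dataframe to add the values shown above, but the error message keeps me from getting to that step):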

def Greater(Merge, maximumA, maximumB):
    a = Merge[maximumA]
    b = Merge[maximumB]
    return max(a,b)

Merger.groupby("Search_Term").apply(Greater, "Ratio_x","Ratio_y")

This gives me the error message: ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Is there a small modification I can make to get my code working? If so, what would it be? If not, what exactly is the problem, and how can I solve it?

Answer 1

Perhaps groupby + transform is what you want?

# Select the columns with a list: tuple-style ['a', 'b'] on a groupby fails in pandas 2.x
df['Highest_Ratio'] = df.groupby('Search_Term')\
            [['Ratio_x', 'Ratio_y']].transform('max').max(axis=1)

df['Highest_Ratio']

0    0.25
1    0.25
2    0.40
3    0.40
Name: Highest_Ratio, dtype: float64
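On the "what exactly is the problem" part: the built-in max(a, b) has to decide whether one argument is larger than the other, and with two Series that comparison yields a boolean Series rather than a single True/False, which is what raises the ValueError. Below is a minimal sketch of a small modification that keeps the apply approach. It assumes the frame is called df (the question calls it Merger), and the lowercase helper name greater is only illustrative. The idea is to reduce each column to a scalar first, then map the per-group result back onto the rows.

```python
import pandas as pd

# Sample frame rebuilt from the question.
df = pd.DataFrame({
    "Search_Term": ["hello", "hello", "goodbye", "goodbye"],
    "Exit_Pages": ["/store/catalog", "/store/product",
                   "/store/search", "/store/product"],
    "Ratio_x": [0.20, 0.15, 0.35, 0.25],
    "Date_x": ["8/30/17"] * 4,
    "Ratio_y": [0.25, 0.10, 0.20, 0.40],
    "Date_y": ["7/30/17"] * 4,
})


def greater(group, col_a, col_b):
    # Reduce each column to a single number before calling max(), so that
    # max() compares two scalars instead of two Series.
    return max(group[col_a].max(), group[col_b].max())


# One value per search term, indexed by Search_Term.
per_group = (
    df.groupby("Search_Term")[["Ratio_x", "Ratio_y"]]
      .apply(greater, "Ratio_x", "Ratio_y")
)

# Broadcast the per-group value back onto every row of the original frame.
df["Highest_Ratio"] = df["Search_Term"].map(per_group)
```

This should give the same column as the transform version; transform simply does the broadcast back to the original rows in one step, which is why it tends to be the shorter option.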

You can take one more step with np.where to get the exact output:

import numpy as np
# True on rows where one of the two ratios equals the group maximum
m = df['Highest_Ratio'].eq(df['Ratio_x']) | df['Highest_Ratio'].eq(df['Ratio_y'])
df['Highest_Ratio'] = np.where(m, df['Highest_Ratio'], '')

df

  Search_Term      Exit_Pages  Ratio_x   Date_x  Ratio_y   Date_y  \
0       hello  /store/catalog     0.20  8/30/17     0.25  7/30/17   
1       hello  /store/product     0.15  8/30/17     0.10  7/30/17   
2     goodbye   /store/search     0.35  8/30/17     0.20  7/30/17   
3     goodbye  /store/product     0.25  8/30/17     0.40  7/30/17   

  Highest_Ratio  
0          0.25  
1                
2                
3           0.4

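Keep in mind that it is better to skip this step: mixing strings and floats in one column is not a great idea performance-wise, and the column stops being numeric.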
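If the values should still appear only on the rows that hold the group maximum, one possible alternative is to leave the other rows as NaN instead of an empty string, so the column stays numeric. This is a sketch under that assumption, not a drop-in replacement for the step above; the names highest and is_max_row are illustrative.

```python
import pandas as pd

# Only the columns needed for this step, rebuilt from the question's sample.
df = pd.DataFrame({
    "Search_Term": ["hello", "hello", "goodbye", "goodbye"],
    "Ratio_x": [0.20, 0.15, 0.35, 0.25],
    "Ratio_y": [0.25, 0.10, 0.20, 0.40],
})

# Largest ratio per search term, repeated on every row of that group.
highest = (
    df.groupby("Search_Term")[["Ratio_x", "Ratio_y"]]
      .transform("max")
      .max(axis=1)
)

# True on rows where one of the two ratios equals the group maximum.
is_max_row = df["Ratio_x"].eq(highest) | df["Ratio_y"].eq(highest)

# Series.where keeps values where the mask is True and puts NaN elsewhere,
# so the column remains float64.
df["Highest_Ratio"] = highest.where(is_max_row)
```

If the blanks are only wanted for display, df.to_string(na_rep="") renders the missing values as empty strings without changing what is stored.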
