Equivalent of R function 'ave' in Python Pandas

Question

I have a data frame in R. Example:

d1 <- structure(list(A = c(1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L), B = 1:9), .Names = c("A", 
"B"), class = "data.frame", row.names = c(NA, -9L))

I want output like this:

d2<-structure(list(A = c(1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L), B = 1:9, 
c = c(3L, 3L, 3L, 7L, 7L, 7L, 7L, 9L, 9L)), .Names = c("A", 
"B", "c"), class = "data.frame", row.names = c(NA, -9L))

I can do this in R with the ave function:

d1$c<-ave(d1$B,d1$A,FUN=max)

Now I want to do the same in Python. That is, given data like d1, how do I create d2 in pandas?

Answer 1

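R's ave function (https://stat.ethz.ch/R-manual/R-devel/library/stats/html/ave.html) applies a function (the mean by default) to each subset of observations that share the same factor levels, and returns a result of the same length as the input.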

pandas has no single out-of-the-box function for this, but you can get the same result with a groupby operation.

Starting from your data frame:

In [85]: import pandas as pd

In [86]: df = pd.DataFrame({'A': [1, 1, 1, 2, 2, 2, 2, 3, 3], 'B': range(1, 10)})

In [87]: df
Out[87]: 
   A  B
0  1  1
1  1  2
2  1  3
3  2  4
4  2  5
5  2  6
6  2  7
7  3  8
8  3  9

You can add column C by grouping on A and computing the maximum of B within each group:

In [88]: df['C'] = df.groupby('A')['B'].transform('max')

In [89]: df
Out[89]: 
   A  B  C
0  1  1  3
1  1  2  3
2  1  3  3
3  2  4  7
4  2  5  7
5  2  6  7
6  2  7  7
7  3  8  9
8  3  9  9

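Note: I use the transform method here because I want the result to have the same index as the original data frame, rather than one row per group.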
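
To make the note above concrete, here is a small sketch contrasting a plain aggregation with transform on the same data. The variable names (per_group, per_row, mapped) are only illustrative and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 1, 2, 2, 2, 2, 3, 3], 'B': list(range(1, 10))})

# A plain aggregation collapses each group to one row, indexed by the key A.
per_group = df.groupby('A')['B'].max()

# transform broadcasts each group's result back onto the original rows,
# so the result shares the index of df and can be assigned as a column.
per_row = df.groupby('A')['B'].transform('max')

# Without transform, the per-group values can be mapped back by key instead.
mapped = df['A'].map(per_group)

print(per_group)
print(per_row)
```

The aggregated series has three entries (one per value of A), while the transformed series has nine, one per row of df. Mapping the aggregated values back through column A should give the same values as transform, which is roughly what transform saves you from writing by hand.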

For more on groupby functionality in pandas, see http://pandas.pydata.org/pandas-docs/stable/groupby.html
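
If you want something closer in shape to the R call, the groupby and transform steps can be wrapped in a small helper. This is only a sketch: the function name ave, its parameters, and the default of 'mean' are my own choices to mirror R's default, not a pandas API.

```python
import pandas as pd


def ave(df, value, by, func='mean'):
    """Hypothetical helper (not part of pandas) loosely mirroring R's ave.

    Applies func to column `value` within each group of `by` and returns
    a Series aligned with the index of df.
    """
    return df.groupby(by)[value].transform(func)


d1 = pd.DataFrame({'A': [1, 1, 1, 2, 2, 2, 2, 3, 3], 'B': list(range(1, 10))})

# Rough counterpart of: d1$c <- ave(d1$B, d1$A, FUN = max)
d2 = d1.assign(c=ave(d1, 'B', 'A', func='max'))
print(d2)
```

With func left at its default, the helper returns the group means, which matches what ave does in R when FUN is not given.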
