Equivalent of ave in pandas

My question is similar to another SO post, but I'm getting an error.

Suppose I have a DataFrame df:

     A      B  C    D
0  foo    one -2.0  0.5
1  bar    one -1.5 -1.5
2  foo    two -0.5 -0.8
3  bar  three -0.0  0.7
4  foo    two -1.5  0.9
5  bar    two  1.5  0.6
6  foo    one -0.0 -0.4
7  foo  three  0.5  1.8
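To make the example reproducible, the frame above can be rebuilt from its printed values. This is a minimal sketch that only assumes pandas is installed:

```python
import pandas as pd

# Rebuild the example frame from the values printed above.
df = pd.DataFrame({
    "A": ["foo", "bar", "foo", "bar", "foo", "bar", "foo", "foo"],
    "B": ["one", "one", "two", "three", "two", "two", "one", "three"],
    "C": [-2.0, -1.5, -0.5, -0.0, -1.5, 1.5, -0.0, 0.5],
    "D": [0.5, -1.5, -0.8, 0.7, 0.9, 0.6, -0.4, 1.8],
})
print(df)
```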

I want to create another column E that holds the mean of column C for each group when grouped by, say, column A:

     A      B  C    D    E
0  foo    one -2.0  0.5  -0.7
1  bar    one -1.5 -1.5   0.0
2  foo    two -0.5 -0.8  -0.7
3  bar  three -0.0  0.7   0.0
4  foo    two -1.5  0.9  -0.7
5  bar    two  1.5  0.6   0.0
6  foo    one -0.0 -0.4  -0.7
7  foo  three  0.5  1.8  -0.7

I tried the approach from that SO post, for example:

df['E'] = df.groupby('A').transform(lambda x: pandas.Series(x.C.mean()))

df['E'] = df.groupby('A').transform(lambda x: pandas.Series(x['C'].mean()))

but I get ValueError: Wrong number of items passed 3, placement implies 1

Here is the full traceback:

Traceback (most recent call last):
  File "C:\PF\WinPython-64bit-3.4.2.4\python-3.4.2.amd64\lib\site-packages\pandas\core\internals.py", line 2978, in set
    loc = self.items.get_loc(item)
  File "C:\PF\WinPython-64bit-3.4.2.4\python-3.4.2.amd64\lib\site-packages\pandas\core\index.py", line 1402, in get_loc
    return self._engine.get_loc(_values_from_object(key))
  File "pandas\index.pyx", line 134, in pandas.index.IndexEngine.get_loc (pandas\index.c:3807)
  File "pandas\index.pyx", line 154, in pandas.index.IndexEngine.get_loc (pandas\index.c:3687)
  File "pandas\hashtable.pyx", line 696, in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12310)
  File "pandas\hashtable.pyx", line 704, in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12261)
KeyError: 'E'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\PF\WinPython-64bit-3.4.2.4\python-3.4.2.amd64\lib\site-packages\IPython\core\interactiveshell.py", line 2883, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-85-36e1c884837f>", line 1, in <module>
    df['E']=df.groupby('A').transform(lambda x: pandas.Series(x.C.max()))
  File "C:\PF\WinPython-64bit-3.4.2.4\python-3.4.2.amd64\lib\site-packages\pandas\core\frame.py", line 2110, in __setitem__
    self._set_item(key, value)
  File "C:\PF\WinPython-64bit-3.4.2.4\python-3.4.2.amd64\lib\site-packages\pandas\core\frame.py", line 2188, in _set_item
    NDFrame._set_item(self, key, value)
  File "C:\PF\WinPython-64bit-3.4.2.4\python-3.4.2.amd64\lib\site-packages\pandas\core\generic.py", line 1179, in _set_item
    self._data.set(key, value)
  File "C:\PF\WinPython-64bit-3.4.2.4\python-3.4.2.amd64\lib\site-packages\pandas\core\internals.py", line 2981, in set
    self.insert(len(self.items), item, value)
  File "C:\PF\WinPython-64bit-3.4.2.4\python-3.4.2.amd64\lib\site-packages\pandas\core\internals.py", line 3080, in insert
    placement=slice(loc, loc+1))
  File "C:\PF\WinPython-64bit-3.4.2.4\python-3.4.2.amd64\lib\site-packages\pandas\core\internals.py", line 2099, in make_block
    placement=placement)
  File "C:\PF\WinPython-64bit-3.4.2.4\python-3.4.2.amd64\lib\site-packages\pandas\core\internals.py", line 1427, in __init__
    placement=placement)
  File "C:\PF\WinPython-64bit-3.4.2.4\python-3.4.2.amd64\lib\site-packages\pandas\core\internals.py", line 76, in __init__
    len(self.values), len(self.mgr_locs)))
ValueError: Wrong number of items passed 3, placement implies 1

What might I be doing wrong?

I'm using WinPython 3.4.2.4 (Python 3.4.2) and pandas 0.15.2.

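I think transform is the right tool, but you need to select the column first. The error says three columns were passed where only one was expected: calling transform on the whole grouped frame works on every non-grouping column (B, C and D here), and a three-column result can't be stored in the single column E. Select C before transforming: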

>>> df["E"] = df.groupby("A")["C"].transform("mean")
>>> df
     A      B    C    D    E
0  foo    one -2.0  0.5 -0.7
1  bar    one -1.5 -1.5  0.0
2  foo    two -0.5 -0.8 -0.7
3  bar  three -0.0  0.7  0.0
4  foo    two -1.5  0.9 -0.7
5  bar    two  1.5  0.6  0.0
6  foo    one -0.0 -0.4 -0.7
7  foo  three  0.5  1.8 -0.7
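To see why selecting the column matters, here is a small sketch (assuming a recent pandas; the exact error text differs between versions, so it is not reproduced here) that compares transforming two columns with transforming one. The names `grouped`, `several` and `single` are only illustrative.

```python
import pandas as pd

# Example frame from the question.
df = pd.DataFrame({
    "A": ["foo", "bar", "foo", "bar", "foo", "bar", "foo", "foo"],
    "B": ["one", "one", "two", "three", "two", "two", "one", "three"],
    "C": [-2.0, -1.5, -0.5, -0.0, -1.5, 1.5, -0.0, 0.5],
    "D": [0.5, -1.5, -0.8, 0.7, 0.9, 0.6, -0.4, 1.8],
})

grouped = df.groupby("A")

# Transforming two columns keeps one output column per input column.
several = grouped[["C", "D"]].transform("mean")
print(type(several).__name__, several.shape)  # DataFrame (8, 2)

# Selecting one column first gives a Series aligned with df's index.
single = grouped["C"].transform("mean")
print(type(single).__name__, single.shape)  # Series (8,)

# A multi-column result does not fit into one new column.
try:
    df["E"] = several
except ValueError as exc:
    print("ValueError:", exc)

# The single-column result can be assigned directly.
df["E"] = single
```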

This mirrors how you would normally compute a per-group statistic for a single column:

>>> df.groupby("A")["C"].mean()
A
bar    0.0
foo   -0.7
Name: C, dtype: float64

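The difference is that transform broadcasts each group's result back to every row of that group, so the output has the same index as df and can be assigned directly as a new column.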
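A sketch of that broadcasting, using the example df from the question. The per-group means from `mean()` are mapped back onto the rows by key with `Series.map` (an alternative added here for comparison, not part of the original answer), and the result matches `transform`. It also shows the lambda form, which is closest to the question's attempts: when transform is called on the selected column, the function receives one group's C values as a Series, and a scalar return value is broadcast over that group. The variable names are illustrative.

```python
import pandas as pd

# Example frame from the question.
df = pd.DataFrame({
    "A": ["foo", "bar", "foo", "bar", "foo", "bar", "foo", "foo"],
    "B": ["one", "one", "two", "three", "two", "two", "one", "three"],
    "C": [-2.0, -1.5, -0.5, -0.0, -1.5, 1.5, -0.0, 0.5],
    "D": [0.5, -1.5, -0.8, 0.7, 0.9, 0.6, -0.4, 1.8],
})

# One mean per group, indexed by the key in column A.
group_means = df.groupby("A")["C"].mean()

# Broadcasting by hand: look up each row's group mean by its key.
by_map = df["A"].map(group_means)

# transform does the same broadcast and returns a Series aligned to df.index.
by_transform = df.groupby("A")["C"].transform("mean")

# A callable that returns a scalar is broadcast the same way,
# so the lambda from the question works once C is selected first.
by_lambda = df.groupby("A")["C"].transform(lambda s: s.mean())

df["E"] = by_transform
print(df)
```

The string `"mean"` is usually preferable to the lambda, since pandas can dispatch it to an optimized group-wise implementation instead of calling Python code once per group.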