Pandas group by then apply throwing a warning
I have these lines of code:
grouped = df.groupby(by=['col_A','col_B'])['float_col_c']
df.loc[:,'amount_cumulative'] = grouped.apply(lambda x: x.cumsum())
which raise this warning:
/anaconda3/lib/python3.6/site-packages/pandas/core/indexing.py:362: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
self.obj[key] = _infer_fill_value(value)
/anaconda3/lib/python3.6/site-packages/pandas/core/indexing.py:543: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
self.obj[item] = s
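Usually, when I see this warning I can fix it by changing something to .loc[], but in this case the warning seems to point at a different problem. I know I could suppress the warning, but I would rather understand what I am doing wrong with the Pandas syntax. Any suggestions on how to correct this syntax are greatly appreciated.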
I believe this is because indexing with .loc[:, 'amount_cumulative'] returns a slice of df rather than a reference to a new column.
Update: as @QuangHoang correctly pointed out, df is itself a copy, in which case the following would still raise the warning.
You can get the expected result without the warning as simply as this:
df['amount_cumulative'] = df.groupby(['col_A','col_B'])['float_col_c'].cumsum()
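To illustrate why the plain assignment above works, here is a minimal sketch on a made-up frame. The data values are hypothetical; only the column names come from the question. It assumes df is a frame you created yourself, not a slice of another one. The grouped cumsum() comes back as a Series indexed like the original rows, so no apply() or .loc is needed.

```python
import pandas as pd

# Hypothetical toy data; only the column names are taken from the question.
df = pd.DataFrame({
    "col_A": ["x", "x", "y", "x", "y"],
    "col_B": [1, 1, 2, 1, 2],
    "float_col_c": [1.0, 2.0, 3.0, 4.0, 5.0],
})

# cumsum() on a grouped column returns a Series that keeps the original
# row index, so it lines up with df row by row.
cumulative = df.groupby(["col_A", "col_B"])["float_col_c"].cumsum()

# Because the index matches, the result can be assigned directly as a new column.
df["amount_cumulative"] = cumulative

print(df)
```

Each row holds the running total within its own (col_A, col_B) group, in the original row order.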
Most likely your df is already a copy of another DataFrame; your variable name, df_rev_melt_trim, suggests as much. A test:
import numpy as np
import pandas as pd
old_df = pd.DataFrame({'A': np.random.randint(1, 10, 1000),
                       'B': np.random.randint(1, 10, 1000),
                       'C': np.random.uniform(0, 1, 1000)})
df = old_df[old_df['A'] > 5]
df['amount_cumulative'] = df.groupby(by=['A','B'])['C'].cumsum()
produces the same warning. Instead, you can do:
old_df.loc[df.index,'amount_cumulative'] = df.groupby(by=['A','B'])['C'].cumsum()
and no warning is shown.
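For completeness, a rough sketch of two ways to handle a filtered frame. Taking an explicit .copy() is not part of the answer above; it is a commonly used alternative when you want to keep working on the subset, and I am adding it here as an assumption about what you might want. The seeded generator is only there to make the example reproducible.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility
old_df = pd.DataFrame({
    "A": rng.integers(1, 10, 1000),
    "B": rng.integers(1, 10, 1000),
    "C": rng.uniform(0, 1, 1000),
})

# Option 1 (an alternative, not from the answer above): take an explicit copy
# so the filtered frame owns its data, then add the column to that copy.
df = old_df[old_df["A"] > 5].copy()
df["amount_cumulative"] = df.groupby(["A", "B"])["C"].cumsum()

# Option 2 (as in the answer above): write the result back into the original
# frame, addressing only the filtered rows; all other rows are left as NaN.
old_df.loc[df.index, "amount_cumulative"] = df["amount_cumulative"]

print(old_df.head())
```

Which one fits depends on whether the cumulative column should live on the subset or on the full frame.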