How to groupby and aggregate an operation to multiple columns?
I am trying to compute the mean of the rows in a DataFrame, grouped by two columns, but I get the following error:
TypeError: 'numpy.float64' object is not callable
DataFrame:
date        origin               positive_score  neutral_score  negativity_score  compound_score
2020-09-19  the verge            0.130           0.846          0.024             0.9833
2020-09-19  the verge            0.130           0.846          0.024             0.9833
2020-09-19  fool                 0.075           0.869          0.056             0.8560
2020-09-19  seeking_alpha        0.067           0.918          0.015             0.9983
2020-09-19  seeking_alpha        0.171           0.791          0.038             0.7506
2020-09-19  seeking_alpha        0.095           0.814          0.091             0.9187
2020-09-19  seeking_alpha        0.113           0.801          0.086             0.9890
2020-09-19  seeking_alpha        0.094           0.869          0.038             0.9997
2020-09-19  wall street journal  0.000           1.000          0.000             0.0000
2020-09-19  seeking_alpha        0.179           0.779          0.042             0.9997
2020-09-19  seeking_alpha        0.178           0.704          0.117             0.7360
My code:
def mean_indicators(cls, df: pd.DataFrame):
    df_with_mean = df.groupby([DATE, ORIGIN], as_index=False).agg({
        POSITIVE_SCORE: df[POSITIVE_SCORE].mean(),
        NEGATIVE_SCORE: df[NEGATIVE_SCORE].mean(),
        NEUTRAL_SCORE: df[NEUTRAL_SCORE].mean(),
        COMPOUND_SCORE: df[COMPOUND_SCORE].mean()
    })
    return df_with_mean
I think this should do what you want:
def mean_indicators(cls, df: pd.DataFrame):
    # Pass the aggregation by name ("mean") instead of a precomputed value.
    df_with_mean = df.groupby([DATE, ORIGIN], as_index=False).agg({
        POSITIVE_SCORE: "mean",
        NEGATIVE_SCORE: "mean",
        NEUTRAL_SCORE: "mean",
        COMPOUND_SCORE: "mean",
    })
    return df_with_mean
You can also use the named aggregation syntax.
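As a minimal sketch of what named aggregation could look like here: the sample rows are copied from the question, while the output column names positive_mean and compound_mean are made up for illustration and only two of the score columns are shown.

```python
import pandas as pd

# A few rows in the shape of the question's data (values copied from it).
df = pd.DataFrame({
    "date": ["2020-09-19"] * 4,
    "origin": ["the verge", "the verge", "fool", "seeking_alpha"],
    "positive_score": [0.130, 0.130, 0.075, 0.067],
    "compound_score": [0.9833, 0.9833, 0.8560, 0.9983],
})

# Named aggregation: output_column=(input_column, aggregation).
# The output column names are illustrative, not from the original post.
df_named = df.groupby(["date", "origin"], as_index=False).agg(
    positive_mean=("positive_score", "mean"),
    compound_mean=("compound_score", "mean"),
)
print(df_named)
```

This form is mostly useful when the result columns should be renamed, or when the same input column needs more than one aggregation.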
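- The error comes from passing the wrong kind of value to the aggregation. df[POSITIVE_SCORE].mean() is evaluated immediately and returns a numpy.float64, which agg then tries to call as if it were a function.
  {POSITIVE_SCORE: df[POSITIVE_SCORE].mean()}
  is incorrect.
  {'positive_score': 'mean'}
  is correct.
- Since you are taking the mean of all the non-grouped numeric columns, the function is not needed.
- Use pandas.core.groupby.GroupBy.mean to apply a single operation to the whole DataFrame.
- Use pandas.core.groupby.DataFrameGroupBy.aggregate to aggregate with different operations.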
# just groupby and mean
df_mean = df.groupby(['date', 'origin'], as_index=False).mean()
# display(df_mean)
date        origin               positive_score  neutral_score  negativity_score  compound_score
2020-09-19  fool                 0.075000        0.869000       0.056             0.856000
2020-09-19  seeking_alpha        0.128143        0.810857       0.061             0.913143
2020-09-19  the verge            0.130000        0.846000       0.024             0.983300
2020-09-19  wall street journal  0.000000        1.000000       0.000             0.000000
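For the case where the columns need different operations, a sketch along the following lines may help. It assumes a reduced copy of the question's data, and the choice of "max" for compound_score is purely illustrative (the original post only asks for means).

```python
import pandas as pd

# Reduced sample using values from the question.
df = pd.DataFrame({
    "date": ["2020-09-19"] * 5,
    "origin": ["the verge", "fool", "seeking_alpha", "seeking_alpha", "seeking_alpha"],
    "positive_score": [0.130, 0.075, 0.067, 0.171, 0.095],
    "compound_score": [0.9833, 0.8560, 0.9983, 0.7506, 0.9187],
})

# Each dictionary value names the aggregation to apply to that column.
# "max" is an illustrative choice, not something the question asked for.
df_mixed = df.groupby(["date", "origin"], as_index=False).agg(
    {"positive_score": "mean", "compound_score": "max"}
)
print(df_mixed)
```

The dictionary values are aggregation names (or callables), never precomputed numbers, which is the difference from the code in the question.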