see trend change by group (python pandas dataframe)
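I'm trying to create a new dataframe based on the existing dataframe below. My goal is to calculate the average change in clicks and categorize each campaign accordingly.
Existing dataframe df: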
campaign | date       | clicks
A        | 2015-10-11 | 255
A        | 2015-10-12 | 367
A        | 2015-10-13 | 489
B        | 2015-10-11 | 500
B        | 2015-10-15 | 122
C        | 2015-10-11 | 33
Target dataframe df_categorized:
campaign | avg_change | category
A        | 0.3858     | increasing
B        | -0.756     | decreasing
C        | 0          | no change
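Reading the numbers in the target table, avg_change appears to be the mean of the row-to-row percentage changes within each campaign, taking the rows in date order (this interpretation is inferred from the figures, not stated in the question). Worked out for the sample data:

```latex
\begin{aligned}
\text{A:}\quad & \tfrac{1}{2}\left(\frac{367-255}{255} + \frac{489-367}{367}\right)
  = \tfrac{1}{2}\,(0.439216 + 0.332425) \approx 0.38582 \\
\text{B:}\quad & \frac{122-500}{500} = -0.756 \\
\text{C:}\quad & \text{single row, so there is no change to average; shown as } 0
\end{aligned}
```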
I tried the following code, but I get the error message TypeError: 'long' object does not support item assignment
#standard packages
import pandas as pd
import numpy as np
#upload data into df
df = pd.read_csv(r'C:\Users\xxx\Documents\ad_table.csv')
df.head()
campaign | date | clicks
A 2015-10-11 255
A 2015-10-12 367
A 2015-10-13 489
B 2015-10-11 500
B 2015-10-15 122
C 2015-10-11 33
#create empty dataframe
columns = ['group','avg_change', 'category']
df_categorized = pd.DataFrame(columns=columns)
df_categorized['avg change'] = df.clicks.apply(lambda df: df.pct_change().abs().mean())
#create column
df_categorized['category'] = 0
# going up
df_categorized['category'][df_categorized['avg change'] > 0] = "increasing"
# going down
df_categorized['category'][df_categorized['avg change'] < 0] = "decreasing"
#no change
df_categorized['category'][df_categorized['avg change'] == 0] = "no change"
You can groupby on 'campaign' and then apply a lambda that calculates the pct_change and returns the mean. Then you can reset_index on this and add your additional category column using np.where:
In [239]:
gp = df.groupby('campaign')['clicks'].apply(lambda x: x.pct_change().mean()).reset_index(name='avg_change').fillna(0)
gp['category'] = np.where(gp['avg_change'] < 0, 'decreasing', np.where(gp['avg_change'] > 0, 'increasing', 'no change'))
gp
Out[239]:
campaign avg_change category
0 A 0.38582 increasing
1 B -0.75600 decreasing
2 C 0.00000 no change
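As a sketch, here is a self-contained version of the above that can be pasted into a fresh session. It types the question's sample rows in directly instead of reading the CSV, and it adds a sort by campaign and date. The sort is an added assumption: pct_change compares each row with the row before it, so the answer relies on the rows already being in date order within each campaign.

```python
import numpy as np
import pandas as pd

# Sample rows typed in from the question (the original reads them from a CSV).
df = pd.DataFrame({
    "campaign": ["A", "A", "A", "B", "B", "C"],
    "date": ["2015-10-11", "2015-10-12", "2015-10-13",
             "2015-10-11", "2015-10-15", "2015-10-11"],
    "clicks": [255, 367, 489, 500, 122, 33],
})

# pct_change works row to row, so put each campaign's rows in date order first.
df["date"] = pd.to_datetime(df["date"])
df = df.sort_values(["campaign", "date"])

# Mean of the row-to-row percentage changes per campaign. A campaign with a
# single row gives NaN, which fillna(0) turns into "no change".
gp = (
    df.groupby("campaign")["clicks"]
    .apply(lambda x: x.pct_change().mean())
    .reset_index(name="avg_change")
    .fillna(0)
)

# Label each campaign by the sign of its average change (same nested np.where as above).
gp["category"] = np.where(gp["avg_change"] < 0, "decreasing",
                          np.where(gp["avg_change"] > 0, "increasing", "no change"))
print(gp)
```

Note that the change is measured per row, not per day: B's two rows are four days apart but still count as a single step.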
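This:
df_categorized['avg change'] = df.clicks.apply(lambda df: df.pct_change().abs().mean())
won't work: you're calling apply on a column, so the lambda receives each element of that column, which in this case is an int, hence the error:
AttributeError: 'int' object has no attribute 'pct_change'
Even without that, it wouldn't give you the pct_change per campaign.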
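A small sketch of both points on the question's sample data (the date column is left out since it plays no part here): Series.apply hands the function one scalar at a time, and pct_change over the whole column runs straight across campaign boundaries, whereas GroupBy.pct_change restarts for each campaign.

```python
import pandas as pd

df = pd.DataFrame({
    "campaign": ["A", "A", "A", "B", "B", "C"],
    "clicks": [255, 367, 489, 500, 122, 33],
})

# Series.apply calls the function once per element, so it only ever sees scalars.
element_types = df["clicks"].apply(lambda v: type(v).__name__)
print(element_types.tolist())

# Ungrouped: the first B row (500) is compared with the last A row (489).
ungrouped = df["clicks"].pct_change()

# Grouped: each campaign starts over, so its first row is NaN.
grouped = df.groupby("campaign")["clicks"].pct_change()
print(pd.DataFrame({"ungrouped": ungrouped, "grouped": grouped}))
```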
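Also, don't use chained indexing on your df like this (the assignment can end up on a temporary copy instead of df_categorized itself):
df_categorized['category'][df_categorized['avg change'] > 0] = "increasing"
It should be:
df_categorized.loc[df_categorized['avg change'] > 0, 'category'] = "increasing"
See the docs.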
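Applied to all three categories, the .loc pattern might look like the sketch below. It uses the values from the question's target table as stand-in input, and the avg_change spelling from that table (the question's code mixes 'avg_change' and 'avg change'). Setting "no change" as the default first means the zero case needs no mask of its own.

```python
import pandas as pd

# Stand-in input: the per-campaign averages from the question's target table.
df_categorized = pd.DataFrame({
    "campaign": ["A", "B", "C"],
    "avg_change": [0.3858, -0.756, 0.0],
})

# Default label first, then overwrite rows via a single .loc[row_mask, column] call.
df_categorized["category"] = "no change"
df_categorized.loc[df_categorized["avg_change"] > 0, "category"] = "increasing"
df_categorized.loc[df_categorized["avg_change"] < 0, "category"] = "decreasing"
print(df_categorized)
```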