Identify which channels increased by more than 10% compared with last week's data
I have a large data frame spanning different timestamps. Here is my attempt:
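import pandas as pd

# Read every worksheet into a DataFrame, using the first row as the header
all_data = []
for ws in wb.worksheets():
    rows = ws.get_all_values()
    df_all_data = pd.DataFrame.from_records(rows[1:], columns=rows[0])
    all_data.append(df_all_data)
data = pd.concat(all_data)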
#Change data type
data['Year'] = pd.DatetimeIndex(data['Week']).year
data['Month'] = pd.DatetimeIndex(data['Week']).month
data['Week'] = pd.to_datetime(data['Week']).dt.date
data['Application'] = data['Application'].astype('str')
data['Function'] = data['Function'].astype('str')
data['Service'] = data['Service'].astype('str')
data['Channel'] = data['Channel'].astype('str')
data['Times of alarms'] = data['Times of alarms'].astype('int')
#Compare Channel values over weeks
subchannel_df = data.pivot_table('Times of alarms', index = 'Week', columns='Channel', aggfunc='sum').fillna(0)
subchannel_df = subchannel_df.sort_index(axis=1)
The data frame I am working on
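What I want to achieve:
- Add a percentage row at the end of the data frame (last row versus second-to-last row), excluding division by zero and negative percentages.
- Show the channels that increased by more than 10% compared with last week.
I have been trying different approaches for days to achieve these goals, but I have not managed to. Thank you in advance.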
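You can use the shift function, the equivalent of the LAG window function in SQL, to return last week's value and then perform the calculation at the row level. To avoid division by zero, you can use numpy's where function, which is equivalent to CASE WHEN in SQL. Assuming the column you perform the calculation on is named "X":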
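import numpy as np
# Previous week's value; the first row has no predecessor, so it becomes 0
subchannel_df["XLag"] = subchannel_df["X"].shift(periods=1).fillna(0).astype('int')
# Relative change, set to 0 where the previous week's value was 0
subchannel_df["ChangePercentage"] = np.where(subchannel_df["XLag"] == 0, 0, (subchannel_df["X"]-subchannel_df["XLag"])/subchannel_df["XLag"])
subchannel_df["ChangePercentage"] = (subchannel_df["ChangePercentage"]*100).round().astype("int")
# Weeks where the value grew by more than 10% over the previous week
subchannel_df[subchannel_df["ChangePercentage"]>10]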
Output:
Channel       X  XLag  ChangePercentage
Week
2020-06-12   12     5               140
2020-11-15   15    10                50
2020-11-22   20    15                33
2020-12-13   27    16                69
2020-12-20  100    27               270
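
The snippet above works on a single column "X". Since the pivoted subchannel_df in the question has one column per channel, the same idea can probably be applied to all channels at once by comparing the last row with the second-to-last row. The following is only a rough sketch under some assumptions: the channel names A to D and their counts are made up for illustration, and treating "excluding division by zero and negative percentages" as "set those cases to 0" is my reading of the question, not something it states.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the pivoted subchannel_df (weeks x channels).
subchannel_df = pd.DataFrame(
    {"A": [10, 12], "B": [0, 5], "C": [20, 15], "D": [100, 105]},
    index=pd.to_datetime(["2020-12-13", "2020-12-20"]).date,
)
subchannel_df.index.name = "Week"

prev_week = subchannel_df.iloc[-2]  # second-to-last row
last_week = subchannel_df.iloc[-1]  # last row

# Percentage change per channel. A zero baseline is replaced by NaN first,
# so the division yields NaN instead of inf for those channels.
pct_change = (last_week - prev_week) * 100 / prev_week.replace(0, np.nan)

# Assumption: zero-baseline and negative changes are reported as 0.
pct_change = pct_change.fillna(0).clip(lower=0)

# Append the percentage row at the end of the data frame.
result = pd.concat([subchannel_df, pct_change.rename("Change %").to_frame().T])

# Channels that grew by more than 10% compared with the previous week.
rising = pct_change[pct_change > 10]
print(result)
print(rising.index.tolist())
```

With this made-up data only channel A passes the 10% threshold: B starts from zero, C decreased, and D grew by only 5%.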