How to turn daily data values into % differences for the days within the latest available calendar week with pandas?

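I have a CSV file that contains daily data for the last 30 days in the format below. However, if a particular ID was added recently, it is expected to have fewer rows (see ID=2, which has only 2 days of data):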

my.csv

date          ID    Name    Value1    Value2    Value3
07-09-2020    1     ACME     111       3000      123
08-09-2020    1     ACME     222       2500      345
09-09-2020    1     ACME     333       4500      456
10-09-2020    1     ACME     444       1000      567
11-09-2020    1     ACME     555       9000      678
12-09-2020    1     ACME     666       400       789
13-09-2020    1     ACME     666       450       789
14-09-2020    1     ACME     666       444       789
12-09-2020    2     EMCA     111       999       123
13-09-2020    2     EMCA     222       888       345
#...

I am looking for a solution.

Desired output for each ID:

ID  Name    07-09-2020  % Difference    08-09-2020  % Difference    09-09-2020  % Difference    10-09-2020  % Difference    11-09-2020  % Difference    12-09-2020  % Difference    13-09-2020  Weekly % Difference Average
1   ACME       3000       -0.166667       2500         0.8           4500          -0.777778       1000        8.0            9000         -0.955556       400          0.125000           450                  1.170833
2   EMCA       N/A         N/A             N/A          N/A           N/A           N/A             N/A         N/A            N/A           N/A           999         -0.111111           888                  -0.111111
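
For reference, the percentages in the desired output are day-over-day relative changes of Value2, which is what pandas computes with Series.pct_change. A minimal sketch, using the Value2 figures of ID 1 copied from the sample above (the variable names are illustrative only):

```python
import pandas as pd

# Value2 for ID 1 from 07-09-2020 to 13-09-2020, copied from the sample data
value2 = pd.Series([3000, 2500, 4500, 1000, 9000, 400, 450])

# day-over-day relative change: (current - previous) / previous
diff = value2.pct_change()

# the first day has no predecessor, so its change is NaN and mean() skips it
weekly_avg = diff.mean()

print(diff.round(6).tolist())
print(round(weekly_avg, 6))
```

The mean is taken over the six available changes, which gives the 1.170833 shown for ACME.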

My code so far:

import pandas as pd
from datetime import timedelta
import datetime

data = pd.read_csv("path/to/my.csv", quotechar='"')

# generate the dates of the latest full calendar week (Monday to Sunday)
today = datetime.date.today()
weekday = today.weekday()
start_delta = datetime.timedelta(days=weekday, weeks=1)
start_of_week = today - start_delta
week_dates = []
for day in range(7):
    week_dates.append(start_of_week + datetime.timedelta(days=day))

# check if the latest full calendar week dates are available in my.csv
# if any day of the latest calendar week is not present, select the dates of the week before it
last_week_dates = []
for i in week_dates:
    last_week_dates.append(i.strftime("%d-%m-%Y"))

# True only if every day of last week appears in the 'date' column
week_is_complete = pd.Series(last_week_dates).isin(data['date']).all()

if not week_is_complete:
    for i in range(7, 14):
        print(today - timedelta(days=i))
    # get values from the column 'Value2' for the previous week (if last week dates are not in the file)
    # save values as columns in new dataframe
    # calculate % difference and weekly avg
else:
    pass
    # get values from the column 'Value2' for the last week
    # save values as columns in new dataframe
    # calculate % difference and weekly avg

# finalData is the dataframe the steps above are meant to build (not written yet)
finalData.to_csv("path/to/output.csv", index=False)

Can anyone help? Thank you in advance!

Comments inline:

# ensure 'date' is of <type datetime>
data['date'] = pd.to_datetime(data['date'], dayfirst=True)

# select last full calendar week
end = pd.Timestamp.today().normalize()
if end.weekday() != 6:
    end -= pd.Timedelta(days=end.weekday() + 1)
out = data.loc[
    data['date'].between(end - pd.Timedelta(days=6), end)
].copy()  # copy, so the assignment below does not touch a view of `data`
# cast back to string, to control the way it is printed
out['date'] = out['date'].dt.strftime('%d-%m-%Y')

# calculate and reshape
out = out.set_index(['date', 'ID', 'Name'])['Value2'].to_frame()
# day-over-day relative change of Value2 within each ID
out['Difference'] = (
    out.groupby('ID')['Value2'].pct_change()
)
out = out.unstack('date')

out.sort_index(axis=1, level='date', kind='mergesort', inplace=True)
out.dropna(axis=1, how='all', inplace=True)
out = out.swaplevel(0, 1, axis=1)

out['Weekly Difference Average'] = (
    out.loc[:, (slice(None), 'Difference')]
    .mean(axis=1)
)

Output:

date    07-09-2020 08-09-2020         09-09-2020         10-09-2020          \
            Value2 Difference  Value2 Difference  Value2 Difference  Value2
ID Name
1  ACME     3000.0  -0.166667  2500.0        0.8  4500.0  -0.777778  1000.0
2  EMCA        NaN        NaN     NaN        NaN     NaN        NaN     NaN

date    11-09-2020         12-09-2020        13-09-2020         \
        Difference  Value2 Difference Value2 Difference Value2
ID Name
1  ACME        8.0  9000.0  -0.955556  400.0   0.125000  450.0
2  EMCA        NaN     NaN        NaN  999.0  -0.111111  888.0

date    Weekly Difference Average

ID Name
1  ACME                  1.170833
2  EMCA                 -0.111111
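
The question also asks to fall back to the week before when a day of the latest calendar week is missing from the file, which the selection by `end` above does not check. A minimal sketch of one way to do that, assuming the 'date' column has already been converted with pd.to_datetime; the function name `latest_complete_week` and its `today` parameter are hypothetical, introduced here only for illustration:

```python
import pandas as pd


def latest_complete_week(dates, today=None):
    """Return (start, end) of the latest Monday-Sunday week fully present in `dates`.

    `dates` is a datetime Series; `today` defaults to the current date.
    Returns None when no complete week is found.
    """
    today = pd.Timestamp.today().normalize() if today is None else pd.Timestamp(today)
    days = dates.dt.normalize()
    # most recent Sunday on or before `today`
    end = today - pd.Timedelta(days=(today.weekday() + 1) % 7)
    oldest = days.min()
    while end >= oldest:
        week = pd.date_range(end - pd.Timedelta(days=6), end)
        if week.isin(days).all():
            return week[0], week[-1]
        end -= pd.Timedelta(days=7)  # step back one calendar week
    return None


# dates of ID 1 from the sample data (07-09-2020 to 14-09-2020)
sample = pd.to_datetime(
    pd.Series([f"{day:02d}-09-2020" for day in range(7, 15)]),
    format="%d-%m-%Y",
)

# pretend today is Monday 21-09-2020: the week 14-09 to 20-09 is incomplete,
# so the function steps back to the week 07-09 to 13-09
start, end = latest_complete_week(sample, today="2020-09-21")
print(start.date(), end.date())
```

The returned `start` and `end` could then be passed to `data['date'].between(start, end)` in place of the fixed `end - pd.Timedelta(days=6), end` pair. Note that this checks the dates of the whole file, not of each ID separately.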

Then you can use df.to_csv().
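
Because the reshaped frame has two-level columns, writing it directly produces two header rows. If a single header row is preferred, the levels can be joined first. A minimal sketch on a small hand-built stand-in for `out` (the stand-in values are copied from the output above; `flat` and `buffer` are illustrative names, and writing to an in-memory buffer stands in for the real output path):

```python
import io

import pandas as pd

# small stand-in for `out`: two-level columns (date, field), index (ID, Name)
columns = pd.MultiIndex.from_tuples(
    [
        ("07-09-2020", "Value2"),
        ("08-09-2020", "Difference"),
        ("08-09-2020", "Value2"),
        ("Weekly Difference Average", ""),
    ],
    names=["date", None],
)
index = pd.MultiIndex.from_tuples([(1, "ACME"), (2, "EMCA")], names=["ID", "Name"])
out = pd.DataFrame(
    [[3000.0, -0.166667, 2500.0, -0.166667], [float("nan")] * 4],
    index=index,
    columns=columns,
)

# join the two column levels into one label, skipping empty parts
flat = out.copy()
flat.columns = [" ".join(str(part) for part in col if part) for col in flat.columns]

# write to an in-memory buffer; pass a file path instead to write to disk
buffer = io.StringIO()
flat.to_csv(buffer, na_rep="N/A")
csv_text = buffer.getvalue()
print(csv_text)
```

The `na_rep="N/A"` argument writes missing values the way the desired output shows them; ID and Name are kept because they are the index of the frame.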