Python Pandas Pivot Table: Group Date Columns Using a 7-Day Frequency

Using Python 3.4 and pandas, my pivot table looks like this:

             Impressions                                              
Day           2015-07-06 2015-07-07 2015-07-08 2015-07-09 2015-07-10 2015-07-11 2015-07-12 2015-07-13 2015-07-14 2015-07-15 2015-07-16 2015-07-17 2015-07-18 2015-07-19   
Keyword                                                                
home brewing        1098       1323       2116       2574       1484       1533       1782       1615       1866       1936       1331       1274       1193       1483

It was produced with this code:

import pandas as pd
import numpy as np
from io import StringIO

data = StringIO('''Day  Keyword Impressions Clicks  Cost    Avg. position   Converted clicks
7/9/2015    "home brewing"  2571    6   4.13    3.1 0
7/8/2015    "home brewing"  2113    13  10.02   3.1 1
7/15/2015   "home brewing"  1933    9   9.3 2.8 0
7/14/2015   "home brewing"  1865    3   2.64    2.6 0
7/12/2015   "home brewing"  1781    7   4.93    2.6 0
7/13/2015   "home brewing"  1612    10  9.67    2.6 0
7/11/2015   "home brewing"  1530    9   9.23    2.6 0
7/10/2015   "home brewing"  1482    4   3.73    2.8 0
7/19/2015   "home brewing"  1482    5   3.26    2.5 0
7/16/2015   "home brewing"  1329    6   5.72    2.9 0
7/7/2015    "home brewing"  1318    3   2.55    2.7 0
7/17/2015   "home brewing"  1272    6   5.42    2.7 0
7/18/2015   "home brewing"  1192    5   4.5 2.5 0
7/6/2015    "home brewing"  1095    8   6.02    2.9 0
7/7/2015    "home brewing"  5   1   0.61    4   0
7/6/2015    "home brewing"  3   0   0   3.3 0
7/8/2015    "home brewing"  3   1   0.61    3.3 0
7/9/2015    "home brewing"  3   0   0   4.3 0
7/13/2015   "home brewing"  3   0   0   2.7 0
7/11/2015   "home brewing"  3   0   0   3.3 0
7/15/2015   "home brewing"  3   0   0   6.3 0
7/10/2015   "home brewing"  2   0   0   4.5 0
7/16/2015   "home brewing"  2   1   0.56    2.5 0
7/17/2015   "home brewing"  2   0   0   4   0
7/12/2015   "home brewing"  1   0   0   2   0
7/14/2015   "home brewing"  1   0   0   7   0
7/18/2015   "home brewing"  1   0   0   2   0
7/19/2015   "home brewing"  1   0   0   4   0''')

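# The sample data above is tab-separated. pd.DataFrame.from_csv was removed in
# pandas 1.0; pd.read_csv with parse_dates is the current equivalent.
df = pd.read_csv(data, sep='\t', parse_dates=['Day'])
pt = pd.pivot_table(df, values=['Impressions'], index=['Keyword'], columns=['Day'], aggfunc='sum')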

print(pt)

What I want to do is group the Day columns using a 7-day frequency, to get a summed pivot table that looks like this:

             Impressions
Day           2015-07-06 2015-07-13
Keyword
home brewing        11910       10698

One way is to use the pd.Series.dt accessor to get the week of the year, then pivot on that column.

import pandas as pd
import numpy as np

# simulate your data
# ===================================
np.random.seed(0)
day = np.random.choice(pd.date_range('2015-07-01', '2015-07-31', freq='D'), size = 100)
impressions = np.random.randint(1, 1000, size=100)
keyword_str = ['home brewing'] * 100
df = pd.DataFrame(dict(Day=day, Keyword=keyword_str, Impressions=impressions))
df

          Day  Impressions       Keyword
0  2015-07-13          204  home brewing
1  2015-07-16          325  home brewing
2  2015-07-22          775  home brewing
3  2015-07-01          965  home brewing
4  2015-07-04           48  home brewing
5  2015-07-28          640  home brewing
6  2015-07-04          132  home brewing
7  2015-07-08          973  home brewing
..        ...          ...           ...
92 2015-07-01          287  home brewing
93 2015-07-15          281  home brewing
94 2015-07-04          638  home brewing
95 2015-07-22          771  home brewing
96 2015-07-13          516  home brewing
97 2015-07-26           95  home brewing
98 2015-07-11          227  home brewing
99 2015-07-21          876  home brewing

[100 rows x 3 columns]

# processing
# ===================================
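# Series.dt.weekofyear was removed in pandas 2.0; isocalendar().week returns the same ISO week number
df['week_of_year'] = df['Day'].dt.isocalendar().week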

          Day  Impressions       Keyword  week_of_year
0  2015-07-13          204  home brewing            29
1  2015-07-16          325  home brewing            29
2  2015-07-22          775  home brewing            30
3  2015-07-01          965  home brewing            27
4  2015-07-04           48  home brewing            27
5  2015-07-28          640  home brewing            31
6  2015-07-04          132  home brewing            27
7  2015-07-08          973  home brewing            28
..        ...          ...           ...           ...
92 2015-07-01          287  home brewing            27
93 2015-07-15          281  home brewing            29
94 2015-07-04          638  home brewing            27
95 2015-07-22          771  home brewing            30
96 2015-07-13          516  home brewing            29
97 2015-07-26           95  home brewing            30
98 2015-07-11          227  home brewing            28
99 2015-07-21          876  home brewing            30



pd.pivot_table(df, index='Keyword', columns='week_of_year', values='Impressions', aggfunc='sum')

week_of_year    27     28    29     30    31
Keyword                                     
home brewing  9656  10934  9419  14519  4320
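
If a bare week number is hard to read as a column label, one option is to label each week by the date it starts on. The sketch below is an illustration, not part of the original answer: it reuses the daily totals from the question's first pivot table, and week_start is a hypothetical helper column built with dt.to_period('W'), which assumes weeks run Monday through Sunday.

```python
import pandas as pd

# Daily impression totals copied from the question's first pivot table.
days = pd.date_range('2015-07-06', '2015-07-19', freq='D')
impressions = [1098, 1323, 2116, 2574, 1484, 1533, 1782,
               1615, 1866, 1936, 1331, 1274, 1193, 1483]
df = pd.DataFrame({'Day': days, 'Keyword': 'home brewing', 'Impressions': impressions})

# 'week_start' is a hypothetical helper column: the Monday that opens each week.
df['week_start'] = df['Day'].dt.to_period('W').dt.start_time

# Pivot on the week-start date instead of the week number.
weekly = pd.pivot_table(df, index='Keyword', columns='week_start',
                        values='Impressions', aggfunc='sum')
print(weekly)
```

With this data the columns come out as 2015-07-06 and 2015-07-13, which happens to match the layout asked for in the question because 2015-07-06 is a Monday.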

Update:

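# The how= argument was removed from resample; select the column and call .sum() on the resampler instead
df.set_index('Day').groupby('Keyword')['Impressions'].resample('7D').sum().reset_index().pivot(index='Keyword', columns='Day', values='Impressions')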

Day           2015-07-01  2015-07-08  2015-07-15  2015-07-22  2015-07-29
Keyword                                                                 
home brewing       13450        9377       13191       10422        2408
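
Note that a '7D' frequency and a calendar week are not the same grouping: '7D' bins start from the first day present in the data (2015-07-01 above), while week-based bins are anchored to a weekday. Here is a minimal sketch of the difference, using a made-up series of ones so that each sum is simply a day count. The W-MON variant with closed='left' and label='left' is my assumption about how Monday-start weeks would be requested, not something from the original answer.

```python
import pandas as pd

# Made-up data: one observation per day, 2015-07-01 (a Wednesday) to 2015-07-14.
s = pd.Series(1, index=pd.date_range('2015-07-01', '2015-07-14', freq='D'))

# '7D' bins are anchored to the first day in the data.
by_7d = s.resample('7D').sum()

# 'W-MON' bins are anchored to Mondays; closed/label='left' makes each bin
# run from Monday up to (but not including) the next Monday, labeled by its start.
by_week = s.resample('W-MON', closed='left', label='left').sum()

print(by_7d)
print(by_week)
```

The first grouping yields two full 7-day bins starting on 2015-07-01, while the second splits the same days into a partial week, a full week, and another partial week.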

I accepted Jianxun Li's answer, but I wanted to post my code with comments, since I'm sure I'll come back to it when I forget how to do this. Thanks, Jianxun!

import pandas as pd
import numpy as np
from io import StringIO

data = StringIO('''Day  Keyword Impressions Clicks  Cost    Avg. position   Converted clicks
7/9/2015    "home brewing"  2571    6   4.13    3.1 0
7/8/2015    "home brewing"  2113    13  10.02   3.1 1
7/15/2015   "home brewing"  1933    9   9.3 2.8 0
7/14/2015   "home brewing"  1865    3   2.64    2.6 0
7/12/2015   "home brewing"  1781    7   4.93    2.6 0
7/13/2015   "home brewing"  1612    10  9.67    2.6 0
7/11/2015   "home brewing"  1530    9   9.23    2.6 0
7/10/2015   "home brewing"  1482    4   3.73    2.8 0
7/19/2015   "home brewing"  1482    5   3.26    2.5 0
7/16/2015   "home brewing"  1329    6   5.72    2.9 0
7/7/2015    "home brewing"  1318    3   2.55    2.7 0
7/17/2015   "home brewing"  1272    6   5.42    2.7 0
7/18/2015   "home brewing"  1192    5   4.5 2.5 0
7/6/2015    "home brewing"  1095    8   6.02    2.9 0
7/7/2015    "home brewing"  5   1   0.61    4   0
7/6/2015    "home brewing"  3   0   0   3.3 0
7/8/2015    "home brewing"  3   1   0.61    3.3 0
7/9/2015    "home brewing"  3   0   0   4.3 0
7/13/2015   "home brewing"  3   0   0   2.7 0
7/11/2015   "home brewing"  3   0   0   3.3 0
7/15/2015   "home brewing"  3   0   0   6.3 0
7/10/2015   "home brewing"  2   0   0   4.5 0
7/16/2015   "home brewing"  2   1   0.56    2.5 0
7/17/2015   "home brewing"  2   0   0   4   0
7/12/2015   "home brewing"  1   0   0   2   0
7/14/2015   "home brewing"  1   0   0   7   0
7/18/2015   "home brewing"  1   0   0   2   0
7/19/2015   "home brewing"  1   0   0   4   0''')

#Read the tab-separated data into a dataframe
#(pd.DataFrame.from_csv was removed in pandas 1.0; pd.read_csv replaces it)
df = pd.read_csv(data, sep='\t')
#Drop unneeded columns
df = df.drop(['Clicks', 'Cost', 'Converted clicks', 'Avg. position'], axis=1)
#set 'Day' to a datetime dtype
df['Day'] = pd.to_datetime(df['Day'])
#Set index to be 'Day'
df = df.set_index('Day')
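#Group by keyword and select the column to aggregate
df = df.groupby('Keyword')['Impressions']
#Resample the index by 7 days and sum
#(the how= argument was removed from resample; call .sum() on the result instead)
df = df.resample('7D').sum()
'''df looks like this currently...
Keyword       Day
home brewing  2015-07-06    11910
              2015-07-13    10698
Name: Impressions, dtype: int64
'''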
#Reset the index now that date is grouped
df = df.reset_index()
'''
        Keyword        Day  Impressions
0  home brewing 2015-07-06        11910
1  home brewing 2015-07-13        10698
'''
#This part pivots the data to have 'Day' be columns
df = df.pivot(index='Keyword', columns='Day', values='Impressions')

print(df)
''' #End Result#
Day           2015-07-06  2015-07-13
Keyword                             
home brewing       11910       10698
'''
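
For what it's worth, the group/resample/reset/pivot chain can likely be collapsed into a single pivot_table call by passing a pd.Grouper as the columns argument. This is a sketch under the assumption that Day is already a datetime column. The daily totals are copied from the question's first pivot table rather than re-read from the raw rows.

```python
import pandas as pd

# Daily impression totals copied from the question's first pivot table.
df = pd.DataFrame({
    'Day': pd.date_range('2015-07-06', '2015-07-19', freq='D'),
    'Keyword': 'home brewing',
    'Impressions': [1098, 1323, 2116, 2574, 1484, 1533, 1782,
                    1615, 1866, 1936, 1331, 1274, 1193, 1483],
})

# pd.Grouper bins the 'Day' column into 7-day buckets, starting from the
# first day in the data, while pivot_table handles the sum and the reshape.
pt = pd.pivot_table(df, index='Keyword',
                    columns=pd.Grouper(key='Day', freq='7D'),
                    values='Impressions', aggfunc='sum')
print(pt)
```

As with resample('7D'), the buckets are anchored to the earliest date present, so the column labels depend on where the data starts.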