Python Pandas Pivot Table: Group Date Columns Using a 7-Day Frequency
Using Python 3.4 and Pandas, my pivot table looks like this:
Impressions
Day 2015-07-06 2015-07-07 2015-07-08 2015-07-09 2015-07-10 2015-07-11 2015-07-12 2015-07-13 2015-07-14 2015-07-15 2015-07-16 2015-07-17 2015-07-18 2015-07-19
Keyword
home brewing 1098 1323 2116 2574 1484 1533 1782 1615 1866 1936 1331 1274 1193 1483
It is produced by this code:
import pandas as pd
import numpy as np
from io import StringIO
data = StringIO('''Day Keyword Impressions Clicks Cost Avg. position Converted clicks
7/9/2015 "home brewing" 2571 6 4.13 3.1 0
7/8/2015 "home brewing" 2113 13 10.02 3.1 1
7/15/2015 "home brewing" 1933 9 9.3 2.8 0
7/14/2015 "home brewing" 1865 3 2.64 2.6 0
7/12/2015 "home brewing" 1781 7 4.93 2.6 0
7/13/2015 "home brewing" 1612 10 9.67 2.6 0
7/11/2015 "home brewing" 1530 9 9.23 2.6 0
7/10/2015 "home brewing" 1482 4 3.73 2.8 0
7/19/2015 "home brewing" 1482 5 3.26 2.5 0
7/16/2015 "home brewing" 1329 6 5.72 2.9 0
7/7/2015 "home brewing" 1318 3 2.55 2.7 0
7/17/2015 "home brewing" 1272 6 5.42 2.7 0
7/18/2015 "home brewing" 1192 5 4.5 2.5 0
7/6/2015 "home brewing" 1095 8 6.02 2.9 0
7/7/2015 "home brewing" 5 1 0.61 4 0
7/6/2015 "home brewing" 3 0 0 3.3 0
7/8/2015 "home brewing" 3 1 0.61 3.3 0
7/9/2015 "home brewing" 3 0 0 4.3 0
7/13/2015 "home brewing" 3 0 0 2.7 0
7/11/2015 "home brewing" 3 0 0 3.3 0
7/15/2015 "home brewing" 3 0 0 6.3 0
7/10/2015 "home brewing" 2 0 0 4.5 0
7/16/2015 "home brewing" 2 1 0.56 2.5 0
7/17/2015 "home brewing" 2 0 0 4 0
7/12/2015 "home brewing" 1 0 0 2 0
7/14/2015 "home brewing" 1 0 0 7 0
7/18/2015 "home brewing" 1 0 0 2 0
7/19/2015 "home brewing" 1 0 0 4 0''')
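# DataFrame.from_csv was removed in pandas 1.0; read_csv is its replacement.
# The columns in the data above are tab-separated.
df = pd.read_csv(data, sep='\t', parse_dates=['Day'])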
pt = pd.pivot_table(df, values=['Impressions'], index=['Keyword'], columns=['Day'], aggfunc='sum')
print(pt)
What I would like to do is group the Day columns using a 7-day frequency to get a summed pivot table that looks like this:
Impressions
Day 2015-07-06 2015-07-13
Keyword
home brewing 11910 10698
One way is to use the .dt accessor of pd.Series to get the week of the year, and then pivot on that column.
import pandas as pd
import numpy as np
# simulate your data
# ===================================
np.random.seed(0)
day = np.random.choice(pd.date_range('2015-07-01', '2015-07-31', freq='D'), size = 100)
impressions = np.random.randint(1, 1000, size=100)
keyword_str = ['home brewing'] * 100
df = pd.DataFrame(dict(Day=day, Keyword=keyword_str, Impressions=impressions))
df
Day Impressions Keyword
0 2015-07-13 204 home brewing
1 2015-07-16 325 home brewing
2 2015-07-22 775 home brewing
3 2015-07-01 965 home brewing
4 2015-07-04 48 home brewing
5 2015-07-28 640 home brewing
6 2015-07-04 132 home brewing
7 2015-07-08 973 home brewing
.. ... ... ...
92 2015-07-01 287 home brewing
93 2015-07-15 281 home brewing
94 2015-07-04 638 home brewing
95 2015-07-22 771 home brewing
96 2015-07-13 516 home brewing
97 2015-07-26 95 home brewing
98 2015-07-11 227 home brewing
99 2015-07-21 876 home brewing
[100 rows x 3 columns]
# processing
# ===================================
# Series.dt.weekofyear was removed in pandas 2.0; isocalendar().week is its replacement
df['week_of_year'] = df['Day'].dt.isocalendar().week
Day Impressions Keyword week_of_year
0 2015-07-13 204 home brewing 29
1 2015-07-16 325 home brewing 29
2 2015-07-22 775 home brewing 30
3 2015-07-01 965 home brewing 27
4 2015-07-04 48 home brewing 27
5 2015-07-28 640 home brewing 31
6 2015-07-04 132 home brewing 27
7 2015-07-08 973 home brewing 28
.. ... ... ... ...
92 2015-07-01 287 home brewing 27
93 2015-07-15 281 home brewing 29
94 2015-07-04 638 home brewing 27
95 2015-07-22 771 home brewing 30
96 2015-07-13 516 home brewing 29
97 2015-07-26 95 home brewing 30
98 2015-07-11 227 home brewing 28
99 2015-07-21 876 home brewing 30
pd.pivot_table(df, index='Keyword', columns='week_of_year', values='Impressions', aggfunc='sum')
week_of_year 27 28 29 30 31
Keyword
home brewing 9656 10934 9419 14519 4320
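Week numbers such as 27 or 28 are harder to read as column labels than the dates in the requested output. Below is a minimal sketch of one way to label each week by its Monday instead. It is not part of the original answer: the week_start column name is hypothetical, and the daily totals are copied from the question's first pivot table purely for illustration.

```python
import pandas as pd

# Daily totals copied from the question's first pivot table.
totals = [1098, 1323, 2116, 2574, 1484, 1533, 1782,
          1615, 1866, 1936, 1331, 1274, 1193, 1483]
df = pd.DataFrame({
    'Day': pd.date_range('2015-07-06', periods=14, freq='D'),
    'Keyword': 'home brewing',
    'Impressions': totals,
})

# 'W' periods end on Sunday, so start_time is the Monday of each week.
# week_start is a hypothetical column name used only in this sketch.
df['week_start'] = df['Day'].dt.to_period('W').dt.start_time

weekly = pd.pivot_table(df, index='Keyword', columns='week_start',
                        values='Impressions', aggfunc='sum')
print(weekly)
```

Because this aligns on calendar weeks rather than on fixed 7-day bins, it matches the requested output only when the data happens to start on a Monday, as it does in the question.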
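Update:
# the how= argument of resample() has since been removed; call .sum() on the resampler instead
df.set_index('Day').groupby('Keyword').resample('7D')['Impressions'].sum().reset_index().pivot(index='Keyword', columns='Day', values='Impressions')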
Day 2015-07-01 2015-07-08 2015-07-15 2015-07-22 2015-07-29
Keyword
home brewing 13450 9377 13191 10422 2408
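The same 7-day grouping can probably also be written as a single pivot_table call. The sketch below is an alternative that does not come from the answer itself; it assumes that pd.Grouper is accepted in the columns argument of pivot_table, and it reuses the daily totals from the question's first pivot table as sample data.

```python
import pandas as pd

# Daily totals copied from the question's first pivot table.
totals = [1098, 1323, 2116, 2574, 1484, 1533, 1782,
          1615, 1866, 1936, 1331, 1274, 1193, 1483]
df = pd.DataFrame({
    'Day': pd.date_range('2015-07-06', periods=14, freq='D'),
    'Keyword': 'home brewing',
    'Impressions': totals,
})

# pd.Grouper bins the Day column into 7-day windows; the first window
# starts at midnight of the earliest day in the data.
weekly = pd.pivot_table(df, index='Keyword',
                        columns=pd.Grouper(key='Day', freq='7D'),
                        values='Impressions', aggfunc='sum')
print(weekly)
```

This avoids the set_index / reset_index round trip, at the cost of hiding the individual steps.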
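I selected Jianxun Li's answer as the correct one, but I wanted to post my code with comments added, because I'm sure I'll come back to it myself when I forget how to do this. Thanks, Jianxun!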
import pandas as pd
import numpy as np
import scipy.stats as sp
from io import StringIO
data = StringIO('''Day Keyword Impressions Clicks Cost Avg. position Converted clicks
7/9/2015 "home brewing" 2571 6 4.13 3.1 0
7/8/2015 "home brewing" 2113 13 10.02 3.1 1
7/15/2015 "home brewing" 1933 9 9.3 2.8 0
7/14/2015 "home brewing" 1865 3 2.64 2.6 0
7/12/2015 "home brewing" 1781 7 4.93 2.6 0
7/13/2015 "home brewing" 1612 10 9.67 2.6 0
7/11/2015 "home brewing" 1530 9 9.23 2.6 0
7/10/2015 "home brewing" 1482 4 3.73 2.8 0
7/19/2015 "home brewing" 1482 5 3.26 2.5 0
7/16/2015 "home brewing" 1329 6 5.72 2.9 0
7/7/2015 "home brewing" 1318 3 2.55 2.7 0
7/17/2015 "home brewing" 1272 6 5.42 2.7 0
7/18/2015 "home brewing" 1192 5 4.5 2.5 0
7/6/2015 "home brewing" 1095 8 6.02 2.9 0
7/7/2015 "home brewing" 5 1 0.61 4 0
7/6/2015 "home brewing" 3 0 0 3.3 0
7/8/2015 "home brewing" 3 1 0.61 3.3 0
7/9/2015 "home brewing" 3 0 0 4.3 0
7/13/2015 "home brewing" 3 0 0 2.7 0
7/11/2015 "home brewing" 3 0 0 3.3 0
7/15/2015 "home brewing" 3 0 0 6.3 0
7/10/2015 "home brewing" 2 0 0 4.5 0
7/16/2015 "home brewing" 2 1 0.56 2.5 0
7/17/2015 "home brewing" 2 0 0 4 0
7/12/2015 "home brewing" 1 0 0 2 0
7/14/2015 "home brewing" 1 0 0 7 0
7/18/2015 "home brewing" 1 0 0 2 0
7/19/2015 "home brewing" 1 0 0 4 0''')
#Read data into dataframe
# (DataFrame.from_csv was removed in pandas 1.0; read_csv is its replacement)
df = pd.read_csv(data, sep='\t')
#Drop unneeded columns
df = df.drop(['Clicks', 'Cost', 'Converted clicks', 'Avg. position'], axis=1)
#set 'Day' to a datetime dtype
df['Day'] = pd.to_datetime(df['Day'])
#Set index to be 'Day'
df = df.set_index('Day')
#Group by keyword
df = df.groupby('Keyword')
#Resample the index by 7 days and sum
# (the how= argument has since been removed; call .sum() on the resampler instead)
df = df.resample('7D')[['Impressions']].sum()
'''df looks like this currently...
Impressions
Keyword Day
home brewing 2015-07-06 11910
2015-07-13 10698
'''
#Reset the index now that date is grouped
df = df.reset_index()
'''
Keyword Day Impressions
0 home brewing 2015-07-06 11910
1 home brewing 2015-07-13 10698
'''
#This part pivots the data to have 'Day' be columns
df = df.pivot(index='Keyword', columns='Day', values='Impressions')
print(df)
''' #End Result#
Day 2015-07-06 2015-07-13
Keyword
home brewing 11910 10698
'''
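The 7-day bins in the result above start on 2015-07-06 because, in current pandas, a '7D' resample anchors its first bin at midnight of the earliest day in the data. Below is a small sketch of pinning the bin edges to a chosen date instead, assuming pandas 1.1 or later, where resample accepts an origin argument. The series is made up for illustration and is not the thread's data.

```python
import pandas as pd

# Hypothetical data: one impression per day, starting on a Wednesday.
s = pd.Series(1, index=pd.date_range('2015-07-08', periods=7, freq='D'))

# Default: the single 7-day bin starts on the first day present (2015-07-08).
default_bins = s.resample('7D').sum()

# With an explicit origin, bin edges fall on 2015-07-06, 2015-07-13, ...
monday_bins = s.resample('7D', origin=pd.Timestamp('2015-07-06')).sum()

print(default_bins)
print(monday_bins)
```

This matters when different keywords or exports start on different days and the weekly columns still need to line up.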