Difference between dates in Pandas dataframe

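This follows on from an earlier question, but now I need to find the difference between dates that are stored as 'YYYY-MM-DD'. Essentially, the difference between values in the count column is what we need, but normalized by the number of days between each row.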

My dataframe is:

date,site,country_code,kind,ID,rank,votes,sessions,avg_score,count
2017-03-20,website1,US,0,84,226,0.0,15.0,3.370812,53.0
2017-03-21,website1,US,0,84,214,0.0,15.0,3.370812,53.0
2017-03-22,website1,US,0,84,226,0.0,16.0,3.370812,53.0
2017-03-23,website1,US,0,84,234,0.0,16.0,3.369048,54.0
2017-03-24,website1,US,0,84,226,0.0,16.0,3.369048,54.0
2017-03-25,website1,US,0,84,212,0.0,16.0,3.369048,54.0
2017-03-27,website1,US,0,84,228,0.0,16.0,3.369048,58.0
2017-02-15,website2,AU,1,91,144,4.0,148.0,4.727272,521.0
2017-02-16,website2,AU,1,91,144,3.0,147.0,4.727272,524.0
2017-02-20,website2,AU,1,91,100,4.0,148.0,4.727272,531.0
2017-02-21,website2,AU,1,91,118,6.0,149.0,4.727272,533.0
2017-02-22,website2,AU,1,91,114,4.0,151.0,4.727272,534.0

I want to find the difference between consecutive dates after grouping by the date+site+country+kind+ID tuples.

[date,site,country_code,kind,ID,rank,votes,sessions,avg_score,count,day_diff
2017-03-20,website1,US,0,84,226,0.0,15.0,3.370812,0,0
2017-03-21,website1,US,0,84,214,0.0,15.0,3.370812,0,1
2017-03-22,website1,US,0,84,226,0.0,16.0,3.370812,0,1
2017-03-23,website1,US,0,84,234,0.0,16.0,3.369048,0,1
2017-03-24,website1,US,0,84,226,0.0,16.0,3.369048,0,1
2017-03-25,website1,US,0,84,212,0.0,16.0,3.369048,0,1
2017-03-27,website1,US,0,84,228,0.0,16.0,3.369048,4,2
2017-02-15,website2,AU,1,91,144,4.0,148.0,4.727272,0,0
2017-02-16,website2,AU,1,91,144,3.0,147.0,4.727272,3,1
2017-02-20,website2,AU,1,91,100,4.0,148.0,4.727272,7,4
2017-02-21,website2,AU,1,91,118,6.0,149.0,4.727272,3,1
2017-02-22,website2,AU,1,91,114,4.0,151.0,4.727272,1,1]

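One option would be to convert the date column to a Pandas datetime column using pd.to_datetime() and then use the diff function, but that produces values of the form "x days" with type timedelta64. I'd like to use this difference to find the daily average count, so if this can be accomplished in a single, less painful step, that would work well.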
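A minimal sketch of that behaviour, using only the last two website1 dates from the data above (the variable names `dates` and `delta` are just for illustration):

```python
import pandas as pd

# Two dates taken from the website1 rows above
dates = pd.to_datetime(pd.Series(["2017-03-25", "2017-03-27"]))

# diff() on a datetime column yields timedeltas, not plain numbers
delta = dates.diff()
print(delta)        # first value is NaT, second displays as "2 days"
print(delta.dtype)  # a timedelta64 dtype
```

Dividing the count difference by such a value is awkward, which is why a plain number of days would be preferable.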

You can use the .dt.days accessor:

In [72]: df['date'] = pd.to_datetime(df['date'])

In [73]: df['day_diff'] = df.groupby(['site','country_code','kind','ID'])['date'] \
                            .diff().dt.days.fillna(0)

In [74]: df
Out[74]:
         date      site country_code  kind  ID  rank  votes  sessions  avg_score  count  day_diff
0  2017-03-20  website1           US     0  84   226    0.0      15.0   3.370812   53.0       0.0
1  2017-03-21  website1           US     0  84   214    0.0      15.0   3.370812   53.0       1.0
2  2017-03-22  website1           US     0  84   226    0.0      16.0   3.370812   53.0       1.0
3  2017-03-23  website1           US     0  84   234    0.0      16.0   3.369048   54.0       1.0
4  2017-03-24  website1           US     0  84   226    0.0      16.0   3.369048   54.0       1.0
5  2017-03-25  website1           US     0  84   212    0.0      16.0   3.369048   54.0       1.0
6  2017-03-27  website1           US     0  84   228    0.0      16.0   3.369048   58.0       2.0
7  2017-02-15  website2           AU     1  91   144    4.0     148.0   4.727272  521.0       0.0
8  2017-02-16  website2           AU     1  91   144    3.0     147.0   4.727272  524.0       1.0
9  2017-02-20  website2           AU     1  91   100    4.0     148.0   4.727272  531.0       4.0
10 2017-02-21  website2           AU     1  91   118    6.0     149.0   4.727272  533.0       1.0
11 2017-02-22  website2           AU     1  91   114    4.0     151.0   4.727272  534.0       1.0
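Since the stated goal is a daily average count, one possible follow-up is to diff count within the same groups and divide by the day difference. This is only a sketch and not part of the answer above: the column names count_diff and daily_avg are made up for illustration, and only a few rows and columns from the question are reproduced.

```python
import io
import pandas as pd

# Subset of the question's data (four rows, only the columns needed here)
csv = """date,site,country_code,kind,ID,count
2017-03-25,website1,US,0,84,54.0
2017-03-27,website1,US,0,84,58.0
2017-02-16,website2,AU,1,91,524.0
2017-02-20,website2,AU,1,91,531.0
"""
df = pd.read_csv(io.StringIO(csv), parse_dates=["date"])

keys = ["site", "country_code", "kind", "ID"]
grouped = df.groupby(keys)

# Per-group differences; the first row of each group is NaN in both
day_diff = grouped["date"].diff().dt.days
count_diff = grouped["count"].diff()

df["day_diff"] = day_diff
df["count_diff"] = count_diff
# Leave day_diff as NaN for the first row so no 0/0 division occurs,
# then fill the resulting NaN averages with 0
df["daily_avg"] = (count_diff / day_diff).fillna(0)

print(df[["date", "site", "day_diff", "count_diff", "daily_avg"]])
# daily_avg: 0.0, 2.0, 0.0, 1.75
```

Note that day_diff is deliberately not filled with 0 before the division here; filling first would turn the first row of each group into 0/0.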