Counting cumulative occurrences over time with a two-level groupby
I have a dataset that looks like this:
country date_added
0 United States 01/2013
1 United Kingdom 03/2014
2 Egypt 03/2014
3 United States 03/2014
4 United States 03/2014
5 United Kingdom 06/2015
6 United States 06/2015
I want a running (cumulative) total per country by date, i.e.:
date_added country cumulative_count
0 01/2013 United States 1
1 03/2014 United Kingdom 1
2 03/2014 Egypt 1
3 03/2014 United States 3
4 06/2015 United Kingdom 2
5 06/2015 United States 4
I tried grouping by two levels, but .count() doesn't work (the counts don't show up at all), whereas .size() does:
cumulative_by_date = new_df.groupby(['date_added','country']).size()
I don't know how to apply this question's solution together with .size() to get the cumulative sum.
Following the approach in the second linked question, here is a double groupby with cumsum and reset_index:
# wrap the chain in parentheses so it can span two lines
(df.groupby(['date_added', 'country']).size()
   .groupby(['country']).cumsum().reset_index(name='cumulative_count'))
Output:
date_added country cumulative_count
0 01/2013 United States 1
1 03/2014 Egypt 1
2 03/2014 United Kingdom 1
3 03/2014 United States 3
4 06/2015 United Kingdom 2
5 06/2015 United States 4
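For anyone who wants to reproduce this, here is a minimal self-contained sketch. The DataFrame is rebuilt by hand from the sample rows in the question. The variable name `result` and the use of `level="country"` (instead of passing the name in a list) are my own choices and should behave the same as the chain above.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "country": ["United States", "United Kingdom", "Egypt", "United States",
                "United States", "United Kingdom", "United States"],
    "date_added": ["01/2013", "03/2014", "03/2014", "03/2014",
                   "03/2014", "06/2015", "06/2015"],
})

result = (
    df.groupby(["date_added", "country"]).size()   # rows per (date, country) pair
      .groupby(level="country").cumsum()           # running total within each country
      .reset_index(name="cumulative_count")        # back to a flat DataFrame
)
print(result)
```

The second groupby works on the index level `country` of the Series returned by `.size()`, so the running total restarts for each country while the rows stay in date order.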
Steps:
# size by date and country
print(df.groupby(['date_added', 'country']).size())
# output
date_added country
01/2013 United States 1
03/2014 Egypt 1
United Kingdom 1
United States 2
06/2015 United Kingdom 1
United States 1
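A side note on why .count() seemed to show nothing. This is a guess, assuming the frame really has only the two columns shown in the question: `.count()` reports non-null counts for each column that is not a grouping key, and here both columns are keys, so there is nothing left to count. `.size()` counts rows per group and does not need any other column. A small sketch with a shortened, made-up subset of the data:

```python
import pandas as pd

# Shortened illustrative subset; only the two key columns exist.
df = pd.DataFrame({
    "country": ["United States", "Egypt", "United States"],
    "date_added": ["01/2013", "03/2014", "03/2014"],
})

grouped = df.groupby(["date_added", "country"])

counts = grouped.count()   # DataFrame with one column per non-key column: none here
sizes = grouped.size()     # Series with the number of rows in each group

print(len(counts.columns))  # no columns to display
print(sizes)
```

If the real frame has additional columns, `.count()` would return per-column counts for those instead, so this explanation may not apply.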
# cumulative sum by country
print(df.groupby(['date_added', 'country']).size()
.groupby(['country']).cumsum())
# output
date_added country
01/2013 United States 1
03/2014 Egypt 1
United Kingdom 1
United States 3
06/2015 United Kingdom 2
United States 4
# reset index
print(df.groupby(['date_added', 'country']).size()
.groupby(['country']).cumsum().reset_index(name='cumulative_count'))
# output
date_added country cumulative_count
0 01/2013 United States 1
1 03/2014 Egypt 1
2 03/2014 United Kingdom 1
3 03/2014 United States 3
4 06/2015 United Kingdom 2
5 06/2015 United States 4
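One caveat: groupby sorts the keys, and `date_added` is a string in MM/YYYY form, so the sort is alphabetical rather than chronological. The sample data happens to come out in the right order either way, but something like 12/2012 would sort after 01/2013 and the running total would then accumulate in the wrong order. A possible workaround, sketched below with made-up rows, is to parse the dates first. The helper column name `date` is my own invention.

```python
import pandas as pd

# Made-up rows where string order and date order disagree.
df = pd.DataFrame({
    "country": ["United States", "United States", "United States"],
    "date_added": ["12/2012", "01/2013", "01/2013"],
})

# Hypothetical helper column: parse MM/YYYY so groups sort chronologically.
df["date"] = pd.to_datetime(df["date_added"], format="%m/%Y")

result = (
    df.groupby(["date", "country"]).size()
      .groupby(level="country").cumsum()
      .reset_index(name="cumulative_count")
)

# Optionally restore the original text format for display.
result["date_added"] = result["date"].dt.strftime("%m/%Y")
print(result)
```

Grouping on the parsed column keeps December 2012 ahead of January 2013, so the cumulative count grows in date order.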