R dplyr: tally to create Panel Data
I have the following data frame:
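# stringsAsFactors = TRUE keeps Group and Time as factors on R >= 4.0,
# matching the <fct> columns shown in the output below
D <- data.frame("Id" = c("a","b","c","d","e","f","g"), "Group" = c("1","1","1","2","2","2","2"), "Time" = c("1","1","2","1","2","3","3"), stringsAsFactors = TRUE)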
Id Group Time
1 a 1 1
2 b 1 1
3 c 1 2
4 d 2 1
5 e 2 2
6 f 2 3
7 g 2 3
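I would like to count the number of individuals by group and time while keeping a balanced panel structure. The classic approach is to use dplyr: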
D %>% group_by(Group,Time) %>% tally()
Group Time n
<fct> <fct> <int>
1 1 1 2
2 1 2 1
3 2 1 1
4 2 2 1
5 2 3 2
But the structure is unbalanced: Time 3 does not appear in Group 1, and I would like to see it listed with a count of 0, like this:
Group Time n
<fct> <fct> <int>
1 1 1 2
2 1 2 1
3 1 3 0
4 2 1 1
5 2 2 1
6 2 3 2
Is there a way to balance the result after group_by? Has anyone run into something similar?
Thanks in advance.
Since Time is a factor variable, we can use count with .drop = FALSE; by default, count drops combinations of factor levels that have no observations.
library(dplyr)
D %>% count(Group, Time, .drop = FALSE)
# Group Time n
# <fct> <fct> <int>
#1 1 1 2
#2 1 2 1
#3 1 3 0
#4 2 1 1
#5 2 2 1
#6 2 3 2
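Note that .drop = FALSE only has an effect on factor columns. On R >= 4.0, data.frame() no longer converts strings to factors by default, so Group and Time may be character columns and the zero row would then be missing. A minimal sketch, assuming that situation, converts both columns with factor() before counting (the name balanced is only illustrative):

```r
library(dplyr)

# Same data as in the question, with character columns (the default on R >= 4.0)
D <- data.frame(
  Id    = c("a", "b", "c", "d", "e", "f", "g"),
  Group = c("1", "1", "1", "2", "2", "2", "2"),
  Time  = c("1", "1", "2", "1", "2", "3", "3"),
  stringsAsFactors = FALSE
)

# Convert the grouping columns to factors so that .drop = FALSE
# can keep level combinations that have no observations
balanced <- D %>%
  mutate(Group = factor(Group), Time = factor(Time)) %>%
  count(Group, Time, .drop = FALSE)

balanced
```

With factor columns, the Group 1 / Time 3 combination should appear with n = 0.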
We can also use the same approach with tally.
D %>% group_by(Group,Time, .drop = FALSE) %>% tally()
Or use complete from tidyr:
D %>% count(Group, Time) %>% tidyr::complete(Group, Time, fill = list(n = 0))
A small alternative to Ronak Shah's answer:
library(tidyr)
library(dplyr)
D <- data.frame("Id" = c("a","b","c","d","e","f","g"), "Group" = c("1","1","1","2","2","2","2"),"Time" = c("1","1","2","1","2","3","3"))
D %>%
  group_by(Group, Time) %>%
  tally() %>%
  ungroup() %>%
  # missing combinations get n = NA here; pass fill = list(n = 0) to get zeros
  complete(Group, Time)
In base R, we can use table:
as.data.frame(table(D[-1]))
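The base R result names its count column Freq and lists rows with Group varying fastest. A small sketch, assuming you want the same column name and row order as the dplyr output (the name counts is only illustrative), uses the responseName argument of as.data.frame and then sorts:

```r
# Same data as in the question
D <- data.frame(
  Id    = c("a", "b", "c", "d", "e", "f", "g"),
  Group = c("1", "1", "1", "2", "2", "2", "2"),
  Time  = c("1", "1", "2", "1", "2", "3", "3")
)

# Cross-tabulate Group by Time; table() keeps empty cells as 0
counts <- as.data.frame(table(D[-1]), responseName = "n")

# Sort by Group, then Time, and reset the row names
counts <- counts[order(counts$Group, counts$Time), ]
rownames(counts) <- NULL

counts
```

Because table() cross-tabulates every Group and Time value it sees, this works whether the columns are factors or character vectors.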