Add rows by groups
I observe the frequency (eff) of different characters (code) in a population on different dates (date2).
datas <- data.frame(date2 = rep(seq(Sys.Date() - 2, Sys.Date(), by = "day"), each = 2),
                    date1 = Sys.Date(),
                    code = rep(LETTERS[1:2], 3),
                    eff = c(50, 30, 20, 10, 20, 20),
                    total = 100)
> datas
date2 date1 code eff total
1 2015-07-25 2015-07-27 A 50 100
2 2015-07-25 2015-07-27 B 30 100
3 2015-07-26 2015-07-27 A 20 100
4 2015-07-26 2015-07-27 B 10 100
5 2015-07-27 2015-07-27 A 20 100
6 2015-07-27 2015-07-27 B 20 100
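For each date2, I want to add another code whose eff "fills" the gap between the sum of that day's frequencies and the total population observed that day (total).
For example, this is my expected output: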
date2 eff date1 total code
1 2015-07-25 50 2015-07-27 100 A
2 2015-07-25 30 2015-07-27 100 B
3 2015-07-25 20 2015-07-27 100 KO
4 2015-07-26 20 2015-07-27 100 A
5 2015-07-26 10 2015-07-27 100 B
6 2015-07-26 70 2015-07-27 100 KO
7 2015-07-27 20 2015-07-27 100 A
8 2015-07-27 20 2015-07-27 100 B
9 2015-07-27 60 2015-07-27 100 KO
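As a quick check of the rule, each KO value is total minus that day's sum of eff: 100 - (50 + 30) = 20, 100 - (20 + 10) = 70 and 100 - (20 + 20) = 60. The same arithmetic in base R with `tapply`, writing the dates as plain strings (an assumption made only so the snippet does not depend on `Sys.Date()`):

```r
# eff and date2 copied from the question; dates as plain strings for a fixed example
eff   <- c(50, 30, 20, 10, 20, 20)
date2 <- rep(c("2015-07-25", "2015-07-26", "2015-07-27"), each = 2)
total <- 100

# KO value per day = total population minus the observed frequencies
ko <- total - tapply(eff, date2, sum)
ko
```

This yields 20, 70 and 60, the KO values in the expected output above.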
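This is how I produce that output:
library(dplyr)
datas %>%
  group_by(date2) %>%
  summarise(eff = (sum(total) / n()) - sum(eff)) %>%  # total minus observed sum, per date2
  inner_join(datas, by = "date2") %>%                  # bring date1 and total back
  select(-c(eff.y, code), eff = eff.x) %>%
  distinct %>%
  mutate(code = "KO") %>%
  bind_rows(datas) %>%
  arrange(date2, code)                                 # put each KO row after A and B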
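But I don't like this solution, and I'd like to know if anyone has a better one!
Also, how would you handle 2 grouping variables (date1 and date2 in the example below)?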
datas2 <- data.frame(date2 = c(rep(seq(Sys.Date() - 2, Sys.Date(), by = "day"), each = 2),
                               rep(seq(Sys.Date() - 1, Sys.Date(), by = "day"), each = 2)),
                     date1 = c(rep(Sys.Date() - 3, 6), rep(Sys.Date() - 2, 4)),
                     code = c(rep(LETTERS[1:2], 3), rep(LETTERS[1:2], 2)),
                     eff = c(50, 30, 20, 10, 20, 20, 10, 20, 30, 40),
                     total = 100)
> datas2
date2 date1 code eff total
1 2015-07-25 2015-07-24 A 50 100
2 2015-07-25 2015-07-24 B 30 100
3 2015-07-26 2015-07-24 A 20 100
4 2015-07-26 2015-07-24 B 10 100
5 2015-07-27 2015-07-24 A 20 100
6 2015-07-27 2015-07-24 B 20 100
7 2015-07-26 2015-07-25 A 10 100
8 2015-07-26 2015-07-25 B 20 100
9 2015-07-27 2015-07-25 A 30 100
10 2015-07-27 2015-07-25 B 40 100
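One way to handle any number of grouping variables without extra packages is to pass the grouping columns as a character vector. The sketch below is base R (`aggregate`, `rbind`, `order`) rather than dplyr; `add_fill_rows` is a hypothetical helper. It assumes `total` is constant within each group, that every column other than `code` and `eff` is either `total` or listed in `groups`, and it uses fixed dates in place of `Sys.Date()` so the result is reproducible.

```r
# Hypothetical helper: append one fill row per group so that eff sums to total.
# Assumes total is constant within each group (as total[1L] does in the answers).
add_fill_rows <- function(df, groups, fill_code = "KO") {
  # Observed frequencies summed within each group (total kept as a key column)
  sums <- aggregate(df["eff"], by = df[c(groups, "total")], FUN = sum)
  sums$eff  <- sums$total - sums$eff   # what is missing to reach the total
  sums$code <- fill_code
  out <- rbind(df, sums[names(df)])    # align columns with the input
  # Sort by the grouping columns, then by code
  out <- out[do.call(order, unname(as.list(out[c(groups, "code")]))), ]
  rownames(out) <- NULL
  out
}

# datas2 from the question, with fixed dates in place of Sys.Date()
today <- as.Date("2015-07-27")
datas2 <- data.frame(
  date2 = today - c(2, 2, 1, 1, 0, 0, 1, 1, 0, 0),
  date1 = today - c(rep(3, 6), rep(2, 4)),
  code  = rep(LETTERS[1:2], 5),
  eff   = c(50, 30, 20, 10, 20, 20, 10, 20, 30, 40),
  total = 100,
  stringsAsFactors = FALSE
)
res <- add_fill_rows(datas2, c("date1", "date2"))
res
```

Because date1 is constant in the first example's `datas`, the same call with `c("date1", "date2")` gives one KO row per date2 there as well.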
Thanks for any ideas!
You could try:
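library(dplyr)
datas %>%
  group_by(date2, date1) %>%  # every (date2, date1) pair is a group, so this also works for datas2
  summarise(eff = total[1L] - sum(eff), code = 'KO', total = total[1L]) %>%  # one KO row per group
  bind_rows(., datas) %>%
  arrange(date1, date2, code)  # keep each group's A, B and KO rows together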
Or a similar data.table approach:
library(data.table)
setDT(datas)  # convert to data.table by reference before using it in rbind
rbind(datas,
      datas[, list(eff = total[1L] - sum(eff), code = 'KO', total = total[1L]),
            by = .(date2, date1)])[order(date1, date2, code)]
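Both answers read the group total with `total[1L]`, which assumes `total` is constant within each group; a KO row is also only meaningful when the observed frequencies do not exceed that total. A small base-R check of those two assumptions could look like the following. `check_fill_groups` is a hypothetical helper (not part of dplyr or data.table), and the fixed dates stand in for `Sys.Date()`.

```r
# Hypothetical helper: verify the assumptions behind total[1L] - sum(eff)
check_fill_groups <- function(df, groups) {
  key <- interaction(df[groups], drop = TRUE)  # one level per observed group
  # total must take a single value inside each group
  const_total  <- tapply(df$total, key, function(x) length(unique(x)) == 1L)
  # observed frequencies must not exceed the group's total (else KO < 0)
  within_total <- tapply(df$eff, key, sum) <= tapply(df$total, key, `[`, 1L)
  c(total_constant = all(const_total), eff_within_total = all(within_total))
}

# datas2 from the question, with fixed dates in place of Sys.Date()
datas2 <- data.frame(
  date2 = as.Date("2015-07-27") - c(2, 2, 1, 1, 0, 0, 1, 1, 0, 0),
  date1 = as.Date("2015-07-27") - c(rep(3, 6), rep(2, 4)),
  code  = rep(LETTERS[1:2], 5),
  eff   = c(50, 30, 20, 10, 20, 20, 10, 20, 30, 40),
  total = 100
)
check_fill_groups(datas2, c("date1", "date2"))
```

Running such a check before adding the KO rows makes a negative fill value or an inconsistent total visible instead of silently using the first row's total.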