Summing across columns with match on dates
I have a data frame df1 with the columns ID and date (the date is stored as a string of the form "XYYYYMMDD"):
library(data.table)
ID <- c(101, 101, 101, 102, 102)
date <- c("X20170101", "X20170103", "X20170106", "X20170102", "X20170104")
df1 <- data.table(ID, date)
ID date
101 X20170101
101 X20170103
101 X20170106
102 X20170102
102 X20170104
and a data frame df2 with the column ID and many dates as column headers:
ID <- c(100, 101, 102, 103, 104)
X20170101 <- c(1, NA, NA, 2, 1)
X20170102 <- c(NA, 1, 1, 1, NA)
X20170103 <- c(NA, 1, NA, 2, 1)
X20170104 <- c(2, 3, NA, 2, 1)
X20170105 <- c(1, 1, NA, 1, NA)
X20170106 <- c(NA, 1, NA, 2, 1)
df2 <- data.table(ID, X20170101, X20170102, X20170103, X20170104, X20170105, X20170106)
ID X20170101 X20170102 X20170103 X20170104 X20170105 X20170106
100 1 NA NA 2 1 NA
101 NA 1 1 3 1 1
102 NA 1 NA NA NA NA
103 2 1 2 2 1 2
104 1 NA 1 1 NA 1
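I want to add a column count to df1 with the following content: for each ID (e.g. 101) and date (e.g. X20170101) in df1, the sum of the corresponding cells in df2 from that date (e.g. X20170101) up to the next date listed in df1 for that ID minus one day (e.g. X20170102, not X20170103). That is, the new df1 should look like this: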
ID date count
101 X20170101 1
101 X20170103 5
101 X20170106 1
102 X20170102 1
102 X20170104 NA
Thanks for your help.
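To make the window rule concrete: each df1 date opens a window that closes one day before the next df1 date for the same ID, and the last date of an ID has no successor. A minimal data.table sketch that computes those bounds, assuming every date string is an "X" followed by YYYYMMDD (the columns start and end are illustrative names, not part of the question):

```r
library(data.table)

# df1 exactly as in the question
df1 <- data.table(ID   = c(101, 101, 101, 102, 102),
                  date = c("X20170101", "X20170103", "X20170106", "X20170102", "X20170104"))

# Parse "XYYYYMMDD" into a Date (assumes each string is "X" followed by YYYYMMDD)
df1[, start := as.Date(sub("^X", "", date), format = "%Y%m%d")]

# Within each ID, a window ends one day before the next df1 date;
# the last date of an ID has no successor, so its end is left as NA
setorder(df1, ID, start)
df1[, end := c(start[-1] - 1, as.Date(NA)), by = ID]

print(df1[, .(ID, date, start, end)])
```

For ID 101 this gives the windows 2017-01-01 to 2017-01-02 and 2017-01-03 to 2017-01-05, which matches the sums 1 and 5 in the expected output.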
Using the dplyr package:
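library(dplyr)  # aggregate() and merge() below are base R functions
# Number of rows per date in df1 (a count of occurrences, not a sum of df2 values)
count <- aggregate(df1$date, by = list(date = df1$date), FUN = length)
names(count)[2] <- "count"
df1 <- merge(df1, count, by = "date", all.x = TRUE)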
Let me know if it works!
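The code above tallies how often each date occurs in df1 rather than summing df2. A minimal sketch that computes the requested sums directly, assuming df2's date columns are in chronological order and every df1 date appears among them (the helper sum_or_na and the matrix m are illustrative names, not part of either answer):

```r
library(data.table)

df1 <- data.table(ID   = c(101, 101, 101, 102, 102),
                  date = c("X20170101", "X20170103", "X20170106", "X20170102", "X20170104"))
df2 <- data.table(ID        = c(100, 101, 102, 103, 104),
                  X20170101 = c(1, NA, NA, 2, 1),
                  X20170102 = c(NA, 1, 1, 1, NA),
                  X20170103 = c(NA, 1, NA, 2, 1),
                  X20170104 = c(2, 3, NA, 2, 1),
                  X20170105 = c(1, 1, NA, 1, NA),
                  X20170106 = c(NA, 1, NA, 2, 1))

# Hypothetical helper: sum of the non-NA cells, or NA when every cell is NA
sum_or_na <- function(x) if (all(is.na(x))) NA_real_ else sum(x, na.rm = TRUE)

# Date columns of df2 as a matrix with one row per ID (assumes chronological column order)
date_cols <- setdiff(names(df2), "ID")
m <- as.matrix(df2[, date_cols, with = FALSE])
rownames(m) <- df2$ID

# "XYYYYMMDD" strings sort chronologically, so ordering by the string is safe
setorder(df1, ID, date)

df1[, count := {
  vals <- m[as.character(.BY$ID), ]            # this ID's row of df2
  from <- match(date, date_cols)               # first column of each window
  to   <- c(from[-1] - 1, length(date_cols))   # column before the next df1 date, or the last column
  mapply(function(a, b) sum_or_na(vals[a:b]), from, to)
}, by = ID]

print(df1)
```

Because the windows are located by column position, missing days in df2 are not a problem as long as its date columns stay sorted.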
Here you go!
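library(data.table)
library(dplyr)
library(tidyr)
df2 %>%
  gather(date, val, -ID) %>%                                    # long format: one row per ID/date (gather() is superseded by pivot_longer())
  full_join(df1 %>% mutate(tag = 1), by = c("ID", "date")) %>%  # tag = 1 marks the dates listed in df1
  arrange(ID, date) %>%                                         # "XYYYYMMDD" strings sort chronologically
  replace_na(list(val = 0, tag = 0)) %>%
  group_by(ID) %>%
  mutate(grp = cumsum(tag)) %>%                                 # each df1 date starts a new group
  group_by(ID, grp) %>%
  summarize(count = sum(val), date = head(date, 1)) %>%
  ungroup %>%
  mutate(count = ifelse(count == 0, NA, count)) %>%             # windows summing to 0 (e.g. all NA) become NA
  select(ID, date, count) %>%
  right_join(df1, by = c("ID", "date"))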
# # A tibble: 5 x 3
# ID date count
# <dbl> <chr> <dbl>
# 1 101 X20170101 1
# 2 101 X20170103 5
# 3 101 X20170106 1
# 4 102 X20170102 1
# 5 102 X20170104 NA
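The same running-tag idea (mark each df1 date, take a cumulative sum of the marks per ID, then sum within each group) can also be written in data.table alone, which the question already uses to build the data. A minimal sketch, assuming the df1 and df2 definitions from the question (long and res are illustrative names):

```r
library(data.table)

df1 <- data.table(ID   = c(101, 101, 101, 102, 102),
                  date = c("X20170101", "X20170103", "X20170106", "X20170102", "X20170104"))
df2 <- data.table(ID        = c(100, 101, 102, 103, 104),
                  X20170101 = c(1, NA, NA, 2, 1),
                  X20170102 = c(NA, 1, 1, 1, NA),
                  X20170103 = c(NA, 1, NA, 2, 1),
                  X20170104 = c(2, 3, NA, 2, 1),
                  X20170105 = c(1, 1, NA, 1, NA),
                  X20170106 = c(NA, 1, NA, 2, 1))

# Reshape df2 to long format: one row per (ID, date), with date kept as character
long <- melt(df2, id.vars = "ID", variable.name = "date", value.name = "val",
             variable.factor = FALSE)

# Flag the rows that start a window (the dates listed in df1) via an update join
long[, tag := 0L]
long[df1, on = .(ID, date), tag := 1L]

# Within each ID, the running count of flags labels consecutive windows
setorder(long, ID, date)
long[, grp := cumsum(tag), by = ID]

# Sum each window (NA only when every cell is NA); grp 0 holds dates before an ID's first df1 date
res <- long[grp > 0,
            .(date  = date[1],
              count = if (all(is.na(val))) NA_real_ else sum(val, na.rm = TRUE)),
            by = .(ID, grp)]
res[, grp := NULL]

print(res)
```

One difference from the dplyr pipeline: here a window becomes NA only when all of its cells are NA, so a window whose values genuinely sum to 0 keeps a count of 0.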