Extract grouped values from a dataframe based on a range of grouped values from another dataframe using tidyverse
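I am trying to extract grouped index values from a data frame (df1) whose rows represent grouped time ranges (start to end). For each grouped time given in another data frame (df2), I want the index of the range in the same group that contains that time. The output I need is df3.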
df1<-data.frame(group = c("A","A","A","A","B","B","B","B","C","C","C","C"),index=c(1,2,3,4,5,6,7,8,9,10,11,12),start=c(5,10,15,20,5,10,15,20,5,10,15,20),end=c(10,15,20,25,10,15,20,25,10,15,20,25))
df2<-data.frame(group = c("A","B","B","C","A","C"),time=c(11,17,24,5,5,22))
df3<-data.frame(time=c(11,17,24,5,5,22),index=c(2,7,8,9,1,12))
A related question I posted earlier was answered with a neat piped solution for ungrouped data:
library(tidyverse)
df1 %>%
  select(from = start, to = end) %>%
  pmap(seq) %>%
  do.call(cbind, .) %>%
  list(.) %>%
  mutate(df2, new = .,
         ind = map2(time, new, ~ which(.x == .y, arr.ind = TRUE)[,2])) %>%
  select(-new)
Can this be modified to group by the 'group' column in both df1 and df2 so that it produces the output df3?
Using group_by, we can nest and then do the join:
library(tidyverse)
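df1 %>%
  group_by(group) %>%
  # nest(-group) is deprecated in tidyr >= 1.0; name the list column explicitly
  nest(data = -group) %>%
  # per group: one matrix whose columns are the start:end sequences
  mutate(new = map(data, ~ .x %>%
                     select(from = start, to = end) %>%
                     pmap(seq) %>%
                     do.call(cbind, .) %>%
                     list(.))) %>%
  right_join(df2, by = "group") %>%
  # column of the matrix that contains the time, then the matching index value
  mutate(ind = map2_int(time, new, ~ which(.x == .y[[1]], arr.ind = TRUE)[,2]),
         ind = map2_dbl(ind, data, ~ .y$index[.x])) %>%
  # newer tidyr keeps the grouping after nest(); drop it so select() does not re-add group
  ungroup() %>%
  select(time, ind)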
# A tibble: 6 x 2
# time ind
# <dbl> <dbl>
#1 11.0 2.00
#2 17.0 7.00
#3 24.0 8.00
#4 5.00 9.00
#5 5.00 1.00
#6 22.0 12.0
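As a possible shortcut, and not part of the approach above: if dplyr 1.1.0 or later is available, the same lookup can probably be written as a single range join with join_by() and between(). This is only a sketch under that version assumption; the name res is just illustrative, and the data are the df1 and df2 from the question.

```r
library(dplyr)

# Data from the question
df1 <- data.frame(
  group = rep(c("A", "B", "C"), each = 4),
  index = as.numeric(1:12),
  start = rep(c(5, 10, 15, 20), times = 3),
  end   = rep(c(10, 15, 20, 25), times = 3)
)
df2 <- data.frame(
  group = c("A", "B", "B", "C", "A", "C"),
  time  = c(11, 17, 24, 5, 5, 22)
)

# Equality join on group plus an inclusive range condition:
# keep the df1 rows where start <= time <= end
res <- df2 %>%
  left_join(df1, by = join_by(group, between(time, start, end))) %>%
  select(time, index)

res
```

With the example data this should reproduce the time and index columns of df3. Because the bounds are inclusive, a time that sits exactly on a boundary shared by two ranges (for example 10 in group A) would match both ranges and return two rows.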
Here is an option with data.table, using a non-equi join:
library(data.table)
df1 <- data.table(group = c("A","A","A","A","B","B","B","B","C","C","C","C"),
                  index = c(1,2,3,4,5,6,7,8,9,10,11,12),
                  start = c(5,10,15,20,5,10,15,20,5,10,15,20),
                  end = c(10,15,20,25,10,15,20,25,10,15,20,25))
df2 <- data.table(group = c("A","B","B","C","A","C"), time = c(11,17,24,5,5,22))
# non-equi join: match on group and on start <= time <= end;
# in the result, the start column holds the values of time
df1[df2, on = .(group, start <= time, end >= time)][, c("start", "index")]
start index
1: 11 2
2: 17 7
3: 24 8
4: 5 9
5: 5 1
6: 22 12
You can then rename the start column to time, and I think you have your answer.
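One way the rename could look, as a minimal sketch: it stores the join result in a variable (res is a name introduced here for illustration) and renames the column by reference with setnames().

```r
library(data.table)

# Data from the question
df1 <- data.table(group = c("A","A","A","A","B","B","B","B","C","C","C","C"),
                  index = c(1,2,3,4,5,6,7,8,9,10,11,12),
                  start = c(5,10,15,20,5,10,15,20,5,10,15,20),
                  end = c(10,15,20,25,10,15,20,25,10,15,20,25))
df2 <- data.table(group = c("A","B","B","C","A","C"), time = c(11,17,24,5,5,22))

# Same non-equi join as above, kept in a variable
res <- df1[df2, on = .(group, start <= time, end >= time)][, c("start", "index")]

# Rename in place: the start column actually carries the time values from df2
setnames(res, "start", "time")

res
```

setnames() modifies res by reference, so no reassignment is needed.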