How can I select the k rows in each direction around every row that meets a given condition in an R data frame?
A `dplyr` solution is preferred.
Suppose I have the following data:
```r
library(tibble)

# tribble() replaces the deprecated frame_data(); the result is stored as `dat`
dat <- tribble(
  ~a, ~b, ~c, ~d, ~e,
  1, 2, 3, 4, FALSE,
  5, 6, 7, 8, TRUE,
  9, 10, 11, 12, TRUE,
  13, 14, 15, 16, FALSE,
  17, 18, 19, 20, FALSE,
  21, 22, 23, 24, FALSE,
  25, 26, 27, 28, TRUE,
  29, 30, 31, 32, FALSE,
  33, 34, 35, 36, FALSE,
  37, 38, 39, 40, FALSE
)
```
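I want to extract the rows where `e` is `TRUE`, and also a window of `k` rows in each direction around every such row, regardless of the value of `e` in those surrounding rows. For example, if `k = 1`, I want: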
```r
tribble(
  ~a, ~b, ~c, ~d, ~e,
  1, 2, 3, 4, FALSE,
  5, 6, 7, 8, TRUE,
  9, 10, 11, 12, TRUE,
  13, 14, 15, 16, FALSE,
  21, 22, 23, 24, FALSE,
  25, 26, 27, 28, TRUE,
  29, 30, 31, 32, FALSE
)
```
If `k = 2`, I want:
```r
tribble(
  ~a, ~b, ~c, ~d, ~e,
  1, 2, 3, 4, FALSE,
  5, 6, 7, 8, TRUE,
  9, 10, 11, 12, TRUE,
  13, 14, 15, 16, FALSE,
  17, 18, 19, 20, FALSE,
  21, 22, 23, 24, FALSE,
  25, 26, 27, 28, TRUE,
  29, 30, 31, 32, FALSE,
  33, 34, 35, 36, FALSE
)
```
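Put another way, row `i` is kept whenever `abs(i - j) <= k` for at least one row `j` where `e` is `TRUE`. The following minimal base R sketch checks that rule against the two expected outputs, using only the `e` column from the example data. The helper name `window_rows` is hypothetical and not from the question.

```r
# Hypothetical helper: indices of rows lying within k positions of any TRUE in `flag`
window_rows <- function(flag, k) {
  hits <- which(flag)   # rows where the condition holds (NA is treated as not matching)
  idx <- seq_along(flag)
  # keep row i if at least one hit is no more than k rows away
  keep <- vapply(idx, function(i) any(abs(i - hits) <= k), logical(1))
  idx[keep]
}

# Column `e` from the example data
e <- c(FALSE, TRUE, TRUE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE)

window_rows(e, k = 1)  # rows 1 2 3 4 6 7 8, matching the k = 1 output
window_rows(e, k = 2)  # rows 1 to 9, matching the k = 2 output
```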
Here is one possible solution:
```r
# assumes `dat` is the data frame from the question
# selection window size
k <- 1
# row numbers where e is TRUE
foundrows <- which(dat$e)
# for each found row z, build the row indices z - k through z + k
selectedRows <- unlist(lapply(foundrows, function(z) seq(z - k, z + k)))
# remove overlaps and out-of-bounds subscripts
selectedRows <- sort(unique(selectedRows))
selectedRows <- selectedRows[selectedRows > 0 & selectedRows <= nrow(dat)]
dat[selectedRows, ]
```
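It is not as direct as using the `lag`/`lead` functions, but it makes the window size easy to adjust. It uses base R and keeps the row indices within the bounds of the data frame.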
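Since the question prefers `dplyr`, here is a sketch of the `lag`/`lead` idea mentioned above, assuming the dplyr package is installed. `near_true()` is a hypothetical helper, not part of dplyr. It ORs `e` with `dplyr::lag(e, j)` and `dplyr::lead(e, j)` for every offset `j` from 1 to `k`, so a row survives `filter()` when `e` is `TRUE` at that row or within `k` rows of it.

```r
library(dplyr)

# Example data from the question, built column-wise instead of with tribble()
dat <- tibble(
  a = seq(1, 37, by = 4),
  b = a + 1,
  c = a + 2,
  d = a + 3,
  e = c(FALSE, TRUE, TRUE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE)
)

# Hypothetical helper: TRUE where `e` is TRUE at this row or within k rows
# before/after it. default = FALSE means positions shifted in from beyond the
# first/last row count as "condition not met".
near_true <- function(e, k) {
  shifted <- lapply(seq_len(k), function(j) {
    dplyr::lag(e, j, default = FALSE) | dplyr::lead(e, j, default = FALSE)
  })
  # OR the original flags with every shifted copy (for k = 0 this is just e)
  Reduce(`|`, shifted, e)
}

k <- 1
dat %>% filter(near_true(e, k))
```

Setting `k <- 2` before the last line gives the second expected output. On a grouped data frame, `lag()` and `lead()` are evaluated per group inside `filter()`, so the windows would not cross group boundaries.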