Create subsets based on a certain sequence of values

Question

I have a data frame like this:

df <- data.frame(x = 0:20, y = 50:70,
                 m = c(0, 0, 0, 0, -1, 0, 0, 1, 0, 0, -1, 0, 0, -1, 0, 0, 1, 0, 0, -1, 0))

I want to create subsets that are defined by a sequence of values in column 'm':

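A sequence should start and end with m == -1, and there must be a 1 between the starting and the ending -1. Each subset then includes all rows from start to end, inclusive.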

For example, one of the subsets of the data above would look like this:

Subset1 <- data.frame(x = c(4:10), y = c(54:60), m = c(-1, 0, 0, 1, 0, 0, -1))
#    x  y  m
# 1  4 54 -1 # starts with -1
# 2  5 55  0
# 3  6 56  0
# 4  7 57  1 # contains a 1
# 5  8 58  0
# 6  9 59  0
# 7 10 60 -1 # ends with -1

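I have tried a lot, but I cannot figure out how to do it. I tried mapply and for loops, but I always get stuck when setting up the pattern, because both ends of the pattern are the same value.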

For example, with mapply, I tried:

List_subsets <- mapply(function(i, j, z) df[i:j:z, , drop = FALSE], -1, 1, -1,
                       SIMPLIFY = FALSE)

Of course, I always get

# error: In i:j:z : numerical expression has 3 elements: only the first used

Do you know whether this is feasible? Could you help me? I would really appreciate your input, as I am new to R and this is very challenging for me.

Thanks a lot!

Answer 1

You can try this; let me know if it is the result you want:

library(stringr)

## Add 1 to m so that -1, 0, 1 become the single characters 0, 1, 2 (no minus sign),
## then collapse the column into one string. Character i corresponds to row i of df.
## The pattern -1, 1, -1 (with only zeros in between) is now 0, 2, 0 (with only 1s in between).
m_string <- paste0(df$m + 1, collapse = '')

## str_locate_all returns the start and end position of every match of that pattern
pattrn <- data.frame(str_locate_all(m_string, '0[1]*?2[1]*?0')[[1]])

## build the row indices from start to end for each match
pattrn_rows <- Map(seq, from = pattrn$start, to = pattrn$end)

## finally, subset df once per match; the result is a list of two data frames
lapply(pattrn_rows, function(x) df[x, ])

Output:

[[1]]
    x  y  m
5   4 54 -1
6   5 55  0
7   6 56  0
8   7 57  1
9   8 58  0
10  9 59  0
11 10 60 -1

[[2]]
    x  y  m
14 13 63 -1
15 14 64  0
16 15 65  0
17 16 66  1
18 17 67  0
19 18 68  0
20 19 69 -1
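
If you would rather not depend on stringr, the same idea can probably be expressed with plain indexing. The following is only a minimal base R sketch, not the method from the answer above, and it rests on two assumptions of mine: a window qualifies when it holds at least one 1 (the regex above requires exactly one), and a -1 may close one window and open the next. The names `ends`, `starts`, `stops`, `has_one` and `subsets` are made up for this illustration.

```r
df <- data.frame(x = 0:20, y = 50:70,
                 m = c(0, 0, 0, 0, -1, 0, 0, 1, 0, 0, -1, 0, 0, -1, 0, 0, 1, 0, 0, -1, 0))

# Row positions of every -1 marker
ends <- which(df$m == -1)

# Candidate windows run from each -1 to the next -1
starts <- head(ends, -1)
stops  <- tail(ends, -1)

# Assumption: keep a window if it contains at least one 1
has_one <- vapply(seq_along(starts),
                  function(k) any(df$m[starts[k]:stops[k]] == 1),
                  logical(1))

# One data frame per qualifying window, collected in a list
subsets <- Map(function(i, j) df[i:j, , drop = FALSE],
               starts[has_one], stops[has_one])
```

For the example data, `subsets` should hold the same two data frames as the output above (rows 5 to 11 and rows 14 to 20). The window from row 11 to row 14 is dropped because it contains no 1.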
