Recognize a given pattern in a vector and add the lacking elements to get the repetition of the given pattern

Question

This question is related to "Wide a dataframe and insert missing columns".

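Suppose we have a given pattern of 5 elements in this order: "A", "B", "C", "D", "E".

This pattern is repeated 10 times, but some elements are occasionally missing (see "My vector" in the picture, in orange).

Is it possible in R to identify the repeated pattern and fill in the missing elements (see "My desired output" in the picture)?

My vector: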

my.vector <- c("A", "B", "C", "D", "E", "A", "B", "C", "D", "E", "B", "C", 
               "D", "E", "B", "C", "D", "E", "B", "C", "D", "E", "B", "C", "D", 
               "E", "B", "C", "D", "E", "B", "C", "D", "E", "A", "B", "C", "D", 
               "E", "B")

my.vector
 [1] "A" "B" "C" "D" "E" "A" "B" "C" "D" "E" "B" "C" "D" "E" "B" "C" "D" "E" "B" "C" "D" "E" "B" "C" "D" "E" "B" "C" "D" "E" "B" "C" "D" "E" "A" "B" "C" "D" "E" "B"

Illustration:

Given pattern:

My vector:

My desired output (the elements to be added are marked in red):

Answer 1

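Create a grouping vector from the diff of the index obtained by matching against LETTERS[1:5], split the vector by it (or use any other grouping function, such as tapply), take the union of LETTERS[1:5] with each group, then unlist the list and unname the result.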

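# Start a new group wherever the pattern index does not advance by exactly 1
grp <- cumsum(c(TRUE, diff(match(my.vector, LETTERS[1:5])) != 1))
# Complete each group with union(), then flatten and drop the list names
unname(unlist(lapply(split(my.vector, grp),
                     function(x) union(LETTERS[1:5], x))))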

-output

[1] "A" "B" "C" "D" "E" "A" "B" "C" "D" "E" "A" "B" "C" "D" "E" "A" "B" "C" "D" "E" "A" "B" "C" "D" "E" "A" "B" "C" "D" "E" "A" "B" "C" "D" "E" "A"
[37] "B" "C" "D" "E" "A" "B" "C" "D" "E" "A" "B" "C" "D" "E"
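To see how the grouping step works, here is a small sketch on a shortened input. The names `pattern`, `x`, `idx` and `grp` are illustrative and not part of the original answer.

```r
pattern <- LETTERS[1:5]
# Hypothetical short input: one full cycle, one cycle without "A", then a lone "B"
x <- c("A", "B", "C", "D", "E", "B", "C", "D", "E", "B")

idx <- match(x, pattern)                # position of each element within the pattern
grp <- cumsum(c(TRUE, diff(idx) != 1))  # group id increases wherever the step is not +1

idx
# [1] 1 2 3 4 5 2 3 4 5 2
grp
# [1] 1 1 1 1 1 2 2 2 2 3
split(x, grp)                           # one list element per detected cycle
```

Each list element returned by `split()` is then completed with `union(pattern, x)`, which always puts the pattern elements first and therefore yields a full "A" to "E" run per group.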

Another option is complete from tidyr, grouping by the occurrence number of each value (rowid from data.table):

library(dplyr)
library(tidyr)
library(data.table)  # for rowid()
tibble(col1 = my.vector) %>%
    group_by(rn = rowid(col1)) %>%     # rn = occurrence number of each value
    complete(col1 = LETTERS[1:5]) %>%  # add the letters missing from each group
    ungroup() %>%
    pull(col1)

-output

[1] "A" "B" "C" "D" "E" "A" "B" "C" "D" "E" "A" "B" "C" "D" "E" "A" "B" "C" "D" "E" "A" "B" "C" "D" "E" "A" "B" "C" "D" "E" "A" "B" "C" "D" "E" "A"
[37] "B" "C" "D" "E" "A" "B" "C" "D" "E" "A" "B" "C" "D" "E"
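One caveat with the diff(...) != 1 rule: if an element is missing from the middle of a cycle (for example "A", "B", "D", "E"), the jump from "B" to "D" also starts a new group. This does not happen in the question's vector, but a possible variation is to start a new cycle only when the pattern position fails to increase. The sketch below is not from the original answer. `fill_pattern` is a hypothetical helper, and it assumes that every element belongs to the pattern, that elements within a cycle appear in pattern order, and that no cycle is missing entirely.

```r
# Hypothetical helper, not part of the original answer.
fill_pattern <- function(x, pattern) {
  if (length(x) == 0) return(character(0))
  idx <- match(x, pattern)                # position of each element in the pattern
  stopifnot(!anyNA(idx))                  # assumption: x contains only pattern elements
  grp <- cumsum(c(TRUE, diff(idx) <= 0))  # new cycle whenever the position does not increase
  rep(pattern, times = max(grp))          # one full copy of the pattern per cycle
}

# "C" is missing in the middle of the first cycle, "A" at the start of the second
fill_pattern(c("A", "B", "D", "E", "B"), LETTERS[1:5])
# [1] "A" "B" "C" "D" "E" "A" "B" "C" "D" "E"
```

Under these assumptions, filling every cycle amounts to repeating the pattern once per detected cycle, so no split or union is needed. Two adjacent partial cycles whose positions keep increasing (say "A", "B" followed by "C", "D") cannot be told apart this way and would be merged into one.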


r

sequence

pattern-matching