R: Filter rows from a matrix depending on whether a certain value appears in a column to the left of another certain value?

I have a matrix and want to do the following:

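  1. Delete all rows that contain "Z" more than once
  2. Delete all rows in which "S" occurs at least twice in directly adjacent columns
  3. Delete all rows in which "2D" is present but "1D" is either absent or not located exclusively in columns further left (lower column numbers) than "2D"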

Here is an MWE with explanations:

x <- matrix(c(
            # Point 1:
            "Z", "1D", "Z", "S",  # Delete row because Z is present more than once.
            # Point 2:
            "S", "S", "Z", "1D", # Delete row because S is present at least twice and in columns following each other directly.
            "S", "Z", "S", "1D", # Ok because "S" is present multiple times but there is at least one column between the occurrences.
            # Point 3:
            "1D", "Z", "2D", "1D", # 1D is followed by a later "2D" which is correct, but another "1D" follows after "2D", so delete this row.
            "Z", "S", "2D", "S", # "2D" is present without a "1D" in a more left column, so delete this row.
            "2D", "1D", "Z", "S", # "2D" is present without a "1D" in a more left column, so delete this row.
            "1D", "Z", "S", "2D", # Valid row
            "1D", "2D", "S", "Z"), # Valid row 
             nrow = 8, byrow = TRUE)

# Possible solution for removing rows with multiple occurrences of "Z"
library(matrixStats)
x <- x[!rowCounts(x, value = "Z")>1, ]
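Rule 1 can also be done in base R without matrixStats. A minimal sketch, assuming every cell holds exactly one of the values "Z", "S", "1D" or "2D": comparing the whole matrix with "Z" gives a logical matrix, and `rowSums()` counts the matching cells in each row.

```r
# MWE matrix from the question
x <- matrix(c("Z",  "1D", "Z",  "S",
              "S",  "S",  "Z",  "1D",
              "S",  "Z",  "S",  "1D",
              "1D", "Z",  "2D", "1D",
              "Z",  "S",  "2D", "S",
              "2D", "1D", "Z",  "S",
              "1D", "Z",  "S",  "2D",
              "1D", "2D", "S",  "Z"),
            nrow = 8, byrow = TRUE)

# x == "Z" is a logical matrix; rowSums() counts the TRUE cells per row
z_count <- rowSums(x == "Z")

# keep rows with at most one "Z"; drop = FALSE keeps a matrix even if one row is left
x_no_zz <- x[z_count <= 1, , drop = FALSE]
nrow(x_no_zz)
```

On the MWE only the first row contains two "Z", so seven rows remain.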

How can I do points 2 and 3?

You can try this custom function:

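apply_rules <- function(y) {
  # Rule 1: "Z" appears more than once
  rule1 <- sum(y == 'Z') > 1
  # Rule 2: a run of two or more "S" in consecutive columns
  rule2 <- any(with(rle(y == 'S'), values & lengths > 1))
  # Rule 3: no "1D" at all, or some "1D" lies to the right of a "2D"
  d1 <- which(y == '1D')
  d2 <- which(y == '2D')
  rule3 <- length(d1) < 1 || (length(d2) > 0 && max(d1) > min(d2))
  # TRUE means the row should be deleted
  rule1 || rule2 || rule3
}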

apply(x, 1, apply_rules)
#[1]  TRUE  TRUE FALSE  TRUE  TRUE  TRUE FALSE FALSE

x[!apply(x, 1, apply_rules), ]

#    [,1] [,2] [,3] [,4]
#[1,] "S"  "Z"  "S"  "1D"
#[2,] "1D" "Z"  "S"  "2D"
#[3,] "1D" "2D" "S"  "Z" 
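As a vectorized alternative to calling `apply_rules()` row by row, the three rules can also be computed on the whole matrix at once. This is a sketch that follows the same interpretation of rule 3 as `apply_rules()` (a row without any "1D" is dropped), and it assumes cells contain exactly "Z", "S", "1D" or "2D". Rule 2 compares every column with its right-hand neighbour; rule 3 uses a running `cumsum()` to mark every column at or after the first "2D".

```r
# MWE matrix from the question
x <- matrix(c("Z",  "1D", "Z",  "S",
              "S",  "S",  "Z",  "1D",
              "S",  "Z",  "S",  "1D",
              "1D", "Z",  "2D", "1D",
              "Z",  "S",  "2D", "S",
              "2D", "1D", "Z",  "S",
              "1D", "Z",  "S",  "2D",
              "1D", "2D", "S",  "Z"),
            nrow = 8, byrow = TRUE)

# Rule 1: more than one "Z" in the row
rule1 <- rowSums(x == "Z") > 1

# Rule 2: "S" in two directly adjacent columns (column j and column j + 1)
is_s  <- x == "S"
rule2 <- rowSums(is_s[, -ncol(x), drop = FALSE] & is_s[, -1, drop = FALSE]) > 0

# Rule 3: no "1D" at all, or a "1D" at or after a column that already saw a "2D".
# seen_2d[i, j] is TRUE once a "2D" has occurred in row i at or before column j.
seen_2d <- t(apply(x == "2D", 1, cumsum)) > 0
rule3   <- rowSums(x == "1D") == 0 | rowSums(seen_2d & x == "1D") > 0

drop_row <- rule1 | rule2 | rule3
x[!drop_row, , drop = FALSE]
```

On the MWE this drops the same rows as `apply(x, 1, apply_rules)` and keeps rows 3, 7 and 8.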

Update: I hadn't noticed your comment that there can be more than one "1D", so I made some changes; the output is now exactly what you expect:

library(dplyr)

x %>%
  as.data.frame() %>%                            # unnamed matrix columns become V1..V4
  rowwise() %>%
  mutate(Sum_Z = sum(c_across(V1:V4) == "Z"),    # number of "Z" in the row
         col = paste0(V1, V2, V3, V4),           # the whole row as one string
         SS_exist = grepl("S{2,}", col),         # "S" in adjacent columns
         `1D after 2D` = grepl("2D.*1D", col),   # any "1D" to the right of a "2D"
         `1D` = grepl("1D", col)) %>%            # the row contains a "1D"
  ungroup() %>%
  filter(Sum_Z <= 1, !SS_exist, `1D`, !`1D after 2D`) %>%
  select(V1:V4) %>%
  as.matrix()


     V1   V2   V3  V4  
[1,] "S"  "Z"  "S" "1D"
[2,] "1D" "Z"  "S" "2D"
[3,] "1D" "2D" "S" "Z"
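The string-matching idea of the dplyr version can also be written in base R. A minimal sketch: each row is collapsed into one space-separated string and each rule becomes a regular expression. The space separator and the assumption that cells contain only "Z", "S", "1D" and "2D" belong to this sketch, not to the answer above; with other cell values the patterns would need word boundaries.

```r
# MWE matrix from the question
x <- matrix(c("Z",  "1D", "Z",  "S",
              "S",  "S",  "Z",  "1D",
              "S",  "Z",  "S",  "1D",
              "1D", "Z",  "2D", "1D",
              "Z",  "S",  "2D", "S",
              "2D", "1D", "Z",  "S",
              "1D", "Z",  "S",  "2D",
              "1D", "2D", "S",  "Z"),
            nrow = 8, byrow = TRUE)

# One string per row, e.g. "1D Z 2D 1D"
rows <- apply(x, 1, paste, collapse = " ")

keep <- !grepl("Z.*Z", rows) &    # rule 1: at most one "Z"
        !grepl("S S", rows) &     # rule 2: no "S" in adjacent columns
        grepl("1D", rows) &       # a "1D" must be present
        !grepl("2D.*1D", rows)    # rule 3: no "1D" to the right of a "2D"

x[keep, , drop = FALSE]
```

Like the dplyr pipeline, this keeps rows 3, 7 and 8 of the MWE.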