R: How can I filter rows from a matrix when, for example, a certain value must appear in a column further left than another certain value?
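I have a matrix and would like to do the following:
- Delete all rows that contain "Z" more than once
- Delete all rows in which "S" occurs at least twice in directly adjacent columns
- Delete all rows that contain "2D" unless "1D" occurs exactly once and in a column further left (lower column number)
Here is an MWE with explanations: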
x <- matrix(c(
# Point 1:
"Z", "1D", "Z", "S", # Delete row because Z is present more than once.
# Point 2:
"S", "S", "Z", "1D", # Delete row because S is present at least twice and in columns following each other directly.
"S", "Z", "S", "1D", # Ok because "S" is present multiple times but there is at least one column between the occurrences.
# Point 3:
"1D", "Z", "2D", "1D", # 1D is followed by a later "2D" which is correct, but another "1D" follows after "2D", so delete this row.
"Z", "S", "2D", "S", # "2D" is present without a "1D" in a more left column, so delete this row.
"2D", "1D", "Z", "S", # "2D" is present without a "1D" in a more left column, so delete this row.
"1D", "Z", "S", "2D", # Valid row
"1D", "2D", "S", "Z"), # Valid row
nrow = 8, byrow = TRUE)
# Possible solution for removing rows with multiple occurrences of "Z"
library(matrixStats)
x <- x[!(rowCounts(x, value = "Z") > 1), ]
How can I do points 2 and 3?
You can try this custom function:
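apply_rules <- function(y) {
  # Rule 1: "Z" occurs more than once.
  rule1 <- sum(grepl('Z', y)) > 1
  # Rule 2: a run of "S" longer than one, i.e. "S" in adjacent columns.
  rule2 <- any(with(rle(grepl('S', y)), values & lengths > 1))
  d1 <- which(y == '1D')
  d2 <- which(y == '2D')
  # Rule 3: no "1D" at all, or a "1D" to the right of "2D".
  rule3 <- length(d1) < 1 || any(d1 > d2)
  # TRUE means the row should be deleted.
  rule1 || rule2 || rule3
}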
apply(x, 1, apply_rules)
#[1] TRUE TRUE FALSE TRUE TRUE TRUE FALSE FALSE
x[!apply(x, 1, apply_rules), ]
# [,1] [,2] [,3] [,4]
#[1,] "S" "Z" "S" "1D"
#[2,] "1D" "Z" "S" "2D"
#[3,] "1D" "2D" "S" "Z"
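To see why the `rle()` call in rule 2 picks up adjacent "S" values, here is a minimal sketch on two rows taken from the example matrix. The variable names (`y`, `y2`, `runs`, `runs2`, `has_adjacent_S`) are only illustrative and do not appear in the answer above.

```r
# One row from the example matrix: two "S" values in adjacent columns.
y <- c("S", "S", "Z", "1D")

# rle() collapses consecutive equal values into runs.
runs <- rle(y == "S")
runs$values   # TRUE FALSE
runs$lengths  # 2 2

# A TRUE run longer than 1 means at least two adjacent "S".
has_adjacent_S <- any(runs$values & runs$lengths > 1)
has_adjacent_S  # TRUE

# With a column in between, each "S" forms its own run of length 1.
y2 <- c("S", "Z", "S", "1D")
runs2 <- rle(y2 == "S")
any(runs2$values & runs2$lengths > 1)  # FALSE
```

Because the check works on run lengths, it does not matter where in the row the adjacent pair sits or how many columns the matrix has.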
Updated
I hadn't noticed your comment that we can have consecutive "1D" values.
So I made some changes, and the output is exactly what you expected:
library(dplyr)

x %>%
  # as.data.frame() names the columns of an unnamed matrix V1..V4.
  as.data.frame(stringsAsFactors = FALSE) %>%
  as_tibble() %>%
  rowwise() %>%
  mutate(Sum_Z = sum(c_across(everything()) == "Z"),
         col = paste0(V1, V2, V3, V4),
         SS_exist = grepl("S{2,}", col),
         both_1D_2D = grepl("1D", col) & grepl("2D", col),
         # ".*" also catches a "1D" that follows "2D" with columns in between.
         `1D after 2D` = grepl("2D.*1D", col),
         `1D` = grepl("1D", col)) %>%
  filter(Sum_Z <= 1, !SS_exist, `1D`, !`1D after 2D`, both_1D_2D || `1D`) %>%
  select(V1:V4) %>%
  as.matrix()
V1 V2 V3 V4
[1,] "S" "Z" "S" "1D"
[2,] "1D" "Z" "S" "2D"
[3,] "1D" "2D" "S" "Z"
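If the order check should work on column positions instead of a pasted string, one possible base R sketch is shown below. `keep_row` is a hypothetical helper, not part of either answer. It assumes, as both answers do, that a row without any "1D" is dropped, and it reads rule 3 as "every 1D must sit to the left of every 2D".

```r
# Hypothetical helper (not from the original answers).
# Returns TRUE when the row should be kept.
keep_row <- function(y) {
  d1 <- which(y == "1D")
  d2 <- which(y == "2D")
  few_Z  <- sum(y == "Z") <= 1
  no_SS  <- !any(with(rle(y == "S"), values & lengths > 1))
  has_1D <- length(d1) > 0   # assumption: rows without "1D" are dropped
  # Every "1D" must be left of every "2D"; trivially TRUE without "2D".
  ordered <- length(d2) == 0 || (has_1D && max(d1) < min(d2))
  few_Z && no_SS && has_1D && ordered
}

# The example matrix from the question.
x <- matrix(c(
  "Z",  "1D", "Z",  "S",
  "S",  "S",  "Z",  "1D",
  "S",  "Z",  "S",  "1D",
  "1D", "Z",  "2D", "1D",
  "Z",  "S",  "2D", "S",
  "2D", "1D", "Z",  "S",
  "1D", "Z",  "S",  "2D",
  "1D", "2D", "S",  "Z"),
  nrow = 8, byrow = TRUE)

kept <- x[apply(x, 1, keep_row), , drop = FALSE]
kept
```

On the example matrix this keeps rows 3, 7 and 8, the same rows as the outputs above. Comparing `max(d1)` with `min(d2)` avoids vector recycling when a row holds a different number of "1D" and "2D" entries, and it also rejects a row such as `c("2D", "S", "1D", "Z")`, where the two values are not next to each other.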