Removing/keeping specific columns based on their contents in R
df (in reality there are thousands of variants and IDs):
    variant1 variant2 variant3 variant4
ID1      0/0      0/0      0/0        0
ID2      0/0      0/0      0/0        0
ID3      0/0      0/0      1/1        0
ID4      0/0      0/0      0/0        1
ID5      0/1      0/0      0/0        0
Desired result:
    variant1 variant2 variant3 variant4
ID3      0/0      0/0      1/1        0
ID4      0/0      0/0      0/0        1
ID5      0/1      0/0      0/0        0
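I only want to keep the rows that contain 0/1, 1/1, or 1.
I have tried dt[grepl("0/1", df),]
with every iteration, but it doesn't seem to work.
Is there a base R or data.table way to do this?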
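A plausible reason the `grepl()` attempt fails is that `grepl()` expects a character vector. When it is handed a whole data frame, it coerces the data frame with `as.character()`, which produces one string per column, so the result has one element per column rather than one per row. The sketch below rebuilds the example data (with `stringsAsFactors = FALSE` as an assumption so the columns stay character on older R versions) and shows the length mismatch.

```r
# Example data matching the table in the question.
dt <- data.frame(
  variant1 = c("0/0", "0/0", "0/0", "0/0", "0/1"),
  variant2 = c("0/0", "0/0", "0/0", "0/0", "0/0"),
  variant3 = c("0/0", "0/0", "1/1", "0/0", "0/0"),
  variant4 = c("0", "0", "0", "1", "0"),
  row.names = c("ID1", "ID2", "ID3", "ID4", "ID5"),
  stringsAsFactors = FALSE
)

# grepl() on the whole data frame: one result per COLUMN.
col_hits <- grepl("0/1", dt)
length(col_hits)   # equals ncol(dt), i.e. 4
nrow(dt)           # 5, so col_hits cannot serve as a row index

# grepl() on a single column: one result per ROW, as intended.
row_hits_v1 <- grepl("0/1", dt$variant1, fixed = TRUE)
length(row_hits_v1)  # equals nrow(dt), i.e. 5
```

A row filter therefore needs a per-column test whose results are combined across columns.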
We can use if_any from dplyr
library(dplyr)
dt %>%
  filter(if_any(everything(), ~ . %in% c("0/1", "1/1", "1")))
-output
    variant1 variant2 variant3 variant4
ID3      0/0      0/0      1/1        0
ID4      0/0      0/0      0/0        1
ID5      0/1      0/0      0/0        0
Or using base R
dt[ Reduce(`|`, lapply(dt, `%in%`, c("0/1", "1/1", "1"))),]
-output
    variant1 variant2 variant3 variant4
ID3      0/0      0/0      1/1        0
ID4      0/0      0/0      0/0        1
ID5      0/1      0/0      0/0        0
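To make the base R one-liner easier to follow, here is a minimal sketch that splits it into steps. The names `targets`, `per_column`, and `keep` are illustrative only, and `drop = FALSE` is an added safeguard (an assumption beyond the original answer) so the result stays a data frame even if only one column is present.

```r
# Example data matching the table in the question.
dt <- data.frame(
  variant1 = c("0/0", "0/0", "0/0", "0/0", "0/1"),
  variant2 = c("0/0", "0/0", "0/0", "0/0", "0/0"),
  variant3 = c("0/0", "0/0", "1/1", "0/0", "0/0"),
  variant4 = c("0", "0", "0", "1", "0"),
  row.names = c("ID1", "ID2", "ID3", "ID4", "ID5"),
  stringsAsFactors = FALSE
)

targets <- c("0/1", "1/1", "1")

# Step 1: for every column, test each value for exact membership in targets.
# The result is a list holding one logical vector (length nrow(dt)) per column.
per_column <- lapply(dt, `%in%`, targets)

# Step 2: combine the per-column vectors with element-wise OR, so a row is
# TRUE as soon as any of its columns holds a target value.
keep <- Reduce(`|`, per_column)

# Step 3: subset the rows; drop = FALSE keeps the data frame structure.
result <- dt[keep, , drop = FALSE]
print(result)
```

Because `%in%` matches whole values rather than substrings, a value such as "0/0" is never mistaken for "0" or "1".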
The same option can be used in data.table
library(data.table)
setDT(dt)[dt[, Reduce(`|`, lapply(.SD, `%in%`, c("0/1", "1/1", "1")))]]
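One thing to be aware of: `setDT()` does not keep row names by default, so the IDs would not appear in the filtered data.table. The sketch below is one possible variation, not the original answer's code: it uses `as.data.table()` with `keep.rownames = "ID"` to store the row names in a column (the column name `ID` and the helper variable names are assumptions), then applies the same `Reduce()` filter to the variant columns only.

```r
library(data.table)

# Example data matching the table in the question.
dt <- data.frame(
  variant1 = c("0/0", "0/0", "0/0", "0/0", "0/1"),
  variant2 = c("0/0", "0/0", "0/0", "0/0", "0/0"),
  variant3 = c("0/0", "0/0", "1/1", "0/0", "0/0"),
  variant4 = c("0", "0", "0", "1", "0"),
  row.names = c("ID1", "ID2", "ID3", "ID4", "ID5"),
  stringsAsFactors = FALSE
)

targets <- c("0/1", "1/1", "1")

# Copy to a data.table and keep the row names in a column called "ID".
DT <- as.data.table(dt, keep.rownames = "ID")

# Test only the variant columns, so the ID column is never matched.
variant_cols <- setdiff(names(DT), "ID")
keep <- DT[, Reduce(`|`, lapply(.SD, `%in%`, targets)), .SDcols = variant_cols]

# Logical row subset; the ID column travels with the kept rows.
result <- DT[keep]
print(result)
```

Restricting `.SDcols` matters here because an ID value could otherwise be compared against the targets as well.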
data
dt <- structure(list(variant1 = c("0/0", "0/0", "0/0", "0/0", "0/1"
), variant2 = c("0/0", "0/0", "0/0", "0/0", "0/0"), variant3 = c("0/0",
"0/0", "1/1", "0/0", "0/0"), variant4 = c("0", "0", "0", "1",
"0")), row.names = c("ID1", "ID2", "ID3", "ID4", "ID5"), class = "data.frame")