Select columns that contain a specific value in a data.table

Question

Minimal example:

library(data.table)
dt <- data.table(a = c(1, 2, 3), b = c(4, 5, 6))

It looks like this:

>  dt
   a b
1: 1 4
2: 2 5
3: 3 6

Suppose I want to select the column that contains the value 6. In this toy example that is easy, because we know which column it is:

> dt[,.(b)]
   b
1: 4
2: 5
3: 6

Now suppose dt has several thousand columns and we don't know which one contains 6.

I tried this:

> dt[,.SD==6]
         a     b
[1,] FALSE FALSE
[2,] FALSE FALSE
[3,] FALSE  TRUE

And this:

> dt[,lapply(.SD,`==`,6)]
       a     b
1: FALSE FALSE
2: FALSE FALSE
3: FALSE  TRUE

And also:

> dt[,lapply(.SD,function(x) any(x==6))]
       a    b
1: FALSE TRUE

But I can't work out how to get from there back to the original column:

   b
1: 4
2: 5
3: 6

Answer 1

Hopefully there is a more elegant solution, but in the meantime:

dt[, sapply(dt, function(x) any(x == 6)), with = FALSE]

   b
1: 4
2: 5
3: 6
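
The one-liner above works because sapply() returns one TRUE/FALSE per column, and with = FALSE tells data.table to treat that vector as a column selector rather than as an expression to evaluate. Below is a minimal sketch of the same idea split into steps, assuming the toy dt from the question. The names `target`, `has_target`, and `cols` are illustrative and not from the original post; `na.rm = TRUE` is an added assumption so that columns with missing values yield FALSE instead of NA.

```r
library(data.table)

dt <- data.table(a = c(1, 2, 3), b = c(4, 5, 6))
target <- 6  # hypothetical name for the value being searched for

# One TRUE/FALSE per column: does the column contain the target anywhere?
# vapply() guarantees a logical vector of length ncol(dt).
has_target <- vapply(dt, function(x) any(x == target, na.rm = TRUE), logical(1))

# Turn the logical vector into column names, then select by name.
cols <- names(dt)[has_target]
res <- dt[, ..cols]
print(res)
```

Keeping the column names in `cols` can be handy if the same set is needed again later, for example as `.SDcols = cols`. The data.table documentation also describes passing a function directly to `.SDcols`, which may be worth trying if your installed version supports it.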

Here is a quick benchmark, since data.table is usually chosen for speed:

n = 1000000
dt = data.table(
    V1 = round(runif(n) * 100), V2 = round(runif(n) * 100),
    V3 = round(runif(n) * 100), V4 = round(runif(n) * 100),
    V5 = round(runif(n) * 100), V6 = round(runif(n) * 100)
)

bench = microbenchmark::microbenchmark(
    user438383 = dt[, sapply(dt, function(x) any(x == 6)), with = FALSE],
    Wimpel = dt[, colSums(dt == 6) > 0, with = FALSE],
    times = 10000
)
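
The benchmark above builds `bench` but does not print it, and 10000 repetitions on a million rows can take a long time. The following is a lighter sketch, not the original author's setup: the seed, the smaller `n`, the reduced `times`, and the labels `sapply_any` and `colsums_gt0` are all assumptions chosen for illustration. It first checks that both approaches pick the same columns and only then times them, and it skips the timing if microbenchmark is not installed.

```r
library(data.table)

set.seed(1)  # assumption: fixed seed so the run is repeatable
n <- 1e5     # assumption: fewer rows than the original 1e6

# Six columns of rounded uniform values, named V1..V6 by as.data.table().
dt <- as.data.table(matrix(round(runif(n * 6) * 100), ncol = 6))

# Sanity check before timing: both approaches should select the same columns.
by_sapply  <- dt[, sapply(dt, function(x) any(x == 6)), with = FALSE]
by_colsums <- dt[, colSums(dt == 6) > 0, with = FALSE]
same <- identical(names(by_sapply), names(by_colsums))
print(same)

# Time both approaches only if microbenchmark is available.
if (requireNamespace("microbenchmark", quietly = TRUE)) {
  bench <- microbenchmark::microbenchmark(
    sapply_any  = dt[, sapply(dt, function(x) any(x == 6)), with = FALSE],
    colsums_gt0 = dt[, colSums(dt == 6) > 0, with = FALSE],
    times = 20  # assumption: far fewer repetitions than the original 10000
  )
  print(bench)
}
```

Timings depend on the machine, the data.table version, and the shape of the data, so it is worth rerunning this with a column count closer to your real table.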

Answer 2

dt[, colSums(dt == 6) > 0, with = FALSE]
#    b
# 1: 4
# 2: 5
# 3: 6
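
The colSums approach above works in two steps: `dt == 6` yields a logical matrix, and colSums() counts the TRUE values in each column. One caveat is that a missing value in a column makes its sum NA, so the comparison `> 0` produces NA rather than FALSE. A minimal sketch of one way to handle that, using a hypothetical table `dt_na` that is not part of the original post:

```r
library(data.table)

# Hypothetical data: column a has a missing value and no 6.
dt_na <- data.table(a = c(1, NA, 3), b = c(4, 5, 6))

hits <- dt_na == 6                       # logical matrix, NA where the cell is NA
keep <- colSums(hits, na.rm = TRUE) > 0  # count TRUEs per column, ignoring NAs
res_na <- dt_na[, keep, with = FALSE]
print(res_na)
```

Note that `dt_na == 6` materialises a full logical matrix with the same dimensions as the table, which may matter for memory on very wide or very long data.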

r

data.table