Create new column based on relative size of values in other columns, data.table

I have a table with two columns whose values lie between 0 and 1. For example:

library(data.table)
set.seed(123)
table <- data.table(value1 = runif(10),
                    value2 = runif(10))
table

   value1     value2
 0.2875775 0.95683335
 0.7883051 0.45333416
 0.4089769 0.67757064
 0.8830174 0.57263340
 0.9404673 0.10292468
 0.0455565 0.89982497
 0.5281055 0.24608773
 0.8924190 0.04205953
 0.5514350 0.32792072
 0.4566147 0.95450365

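I would like to use data.table to create a new binary column that assigns 1 to the x rows with the largest difference between value1 and value2. I can create a "difference" column like this: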

table[,difference:=value1-value2]

I can use order and tail to find the x largest differences, for example if x is 5:

x<-5
table[order(difference), tail(.SD, x)]

But I haven't figured out a way to combine this with something like ifelse or case_when to assign 1 to the rows with the x largest differences and 0 to the rest.

I hope this solves your problem:

library(data.table)
library(dtplyr)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:data.table':
#> 
#>     between, first, last
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

set.seed(123)
table <- data.table(value1 = runif(10),
                    value2 = runif(10))
table
#>        value1     value2
#>  1: 0.2875775 0.95683335
#>  2: 0.7883051 0.45333416
#>  3: 0.4089769 0.67757064
#>  4: 0.8830174 0.57263340
#>  5: 0.9404673 0.10292468
#>  6: 0.0455565 0.89982497
#>  7: 0.5281055 0.24608773
#>  8: 0.8924190 0.04205953
#>  9: 0.5514350 0.32792072
#> 10: 0.4566147 0.95450365

x <- 5

table <- table %>% 
  lazy_dt() %>% 
  mutate(difference = value1 - value2) %>% 
  arrange(desc(difference)) %>% 
  mutate(difference = ifelse(test = row_number() <= x, yes = 1, no = 0)) %>% 
  as.data.table()

table
#>        value1     value2 difference
#>  1: 0.8924190 0.04205953          1
#>  2: 0.9404673 0.10292468          1
#>  3: 0.7883051 0.45333416          1
#>  4: 0.8830174 0.57263340          1
#>  5: 0.5281055 0.24608773          1
#>  6: 0.5514350 0.32792072          0
#>  7: 0.4089769 0.67757064          0
#>  8: 0.4566147 0.95450365          0
#>  9: 0.2875775 0.95683335          0
#> 10: 0.0455565 0.89982497          0

Regards, M.

Created on 2021-10-11 by the reprex package (v2.0.1)

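# Sort by the existing "difference" column in descending order
# (setorderv reorders the table by reference)
setorderv(table, "difference", order = -1)

# Start with 0 everywhere, then flag the first x rows (the largest differences)
table[, large := 0]
x <- 5
table[1:x, large := 1]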
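
Both answers above reorder the rows before flagging them. If the original row order should be kept, a rank-based variant might work instead. The following is only a sketch under a few assumptions: the table is called dt here (a name chosen for illustration, to avoid masking base R's table()), the flag column is called large as in the previous answer, and ties are broken by first occurrence so that exactly x rows are flagged.

```r
library(data.table)

# Same example data as in the question
set.seed(123)
dt <- data.table(value1 = runif(10),
                 value2 = runif(10))
x <- 5

# Difference column, as in the question
dt[, difference := value1 - value2]

# frank() on the negated column gives rank 1 to the largest difference;
# ties.method = "first" (an assumption about how ties should be handled)
# ensures exactly x rows receive the flag. Row order is left unchanged.
dt[, large := as.integer(frank(-difference, ties.method = "first") <= x)]

dt
```

With the example data this should flag the same five rows that the answers above mark with 1, only without sorting the table.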