Create new column based on relative size of values in other columns, data.table
I have a table with two columns of values between 0 and 1. For example:
library(data.table)
set.seed(123)
table <- data.table(value1 = runif(10),
                    value2 = runif(10))
table
value1 value2
0.2875775 0.95683335
0.7883051 0.45333416
0.4089769 0.67757064
0.8830174 0.57263340
0.9404673 0.10292468
0.0455565 0.89982497
0.5281055 0.24608773
0.8924190 0.04205953
0.5514350 0.32792072
0.4566147 0.95450365
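I want to use data.table to create a new binary column that assigns 1 to the x rows with the largest difference between value1 and value2. I can create a "difference" column like this: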
table[,difference:=value1-value2]
I can use order and tail to find the x largest differences, for example when x is 5:
x<-5
table[order(difference), tail(.SD, x)]
But I haven't figured out a way to combine these with something like ifelse or case_when to assign 1 to the x largest differences and 0 to the rest.
Hopefully this solves your problem:
library(data.table)
library(dtplyr)
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:data.table':
#>
#> between, first, last
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
set.seed(123)
table <- data.table(value1 = runif(10),
value2 = runif(10))
table
#> value1 value2
#> 1: 0.2875775 0.95683335
#> 2: 0.7883051 0.45333416
#> 3: 0.4089769 0.67757064
#> 4: 0.8830174 0.57263340
#> 5: 0.9404673 0.10292468
#> 6: 0.0455565 0.89982497
#> 7: 0.5281055 0.24608773
#> 8: 0.8924190 0.04205953
#> 9: 0.5514350 0.32792072
#> 10: 0.4566147 0.95450365
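x <- 5
table <- table %>%
  lazy_dt() %>%                             # translate the dplyr verbs to data.table
  mutate(difference = value1 - value2) %>%
  arrange(desc(difference)) %>%             # largest difference first
  # first x rows -> 1, rest -> 0 (this overwrites `difference` with the flag)
  mutate(difference = ifelse(test = row_number() <= x, yes = 1, no = 0)) %>%
  as.data.table()                           # evaluate and return a data.table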
table
#> value1 value2 difference
#> 1: 0.8924190 0.04205953 1
#> 2: 0.9404673 0.10292468 1
#> 3: 0.7883051 0.45333416 1
#> 4: 0.8830174 0.57263340 1
#> 5: 0.5281055 0.24608773 1
#> 6: 0.5514350 0.32792072 0
#> 7: 0.4089769 0.67757064 0
#> 8: 0.4566147 0.95450365 0
#> 9: 0.2875775 0.95683335 0
#> 10: 0.0455565 0.89982497 0
Best regards,
M.
Created on 2021-10-11 by the reprex package (v2.0.1)
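The dtplyr pipeline above replaces the difference values with the 0/1 flag. If both should be kept, the same steps (compute the difference, sort descending, flag the first x rows) can be written in data.table directly. A minimal sketch; result and large are hypothetical names.

```r
library(data.table)

set.seed(123)
table <- data.table(value1 = runif(10), value2 = runif(10))
x <- 5

# Add the difference column by reference, then take a sorted copy
# (largest difference first).
result <- table[, difference := value1 - value2][order(-difference)]

# In the sorted copy, the first x rows are the x largest differences;
# the flag goes into a separate column so `difference` is kept.
result[, large := as.integer(seq_len(.N) <= x)]
result
```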
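x <- 5
setorderv(table, "difference", order = -1)  # sort by difference, largest first (in place)
table[, large := 0]                          # default: 0 for every row
table[1:x, large := 1]                       # first x rows = the x largest differences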
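One caveat with 1:x: when x is 0, 1:0 evaluates to c(1, 0), so row 1 can still end up flagged, and when x exceeds the number of rows the index runs past the end of the table. seq_len(min(x, nrow(table))) avoids both. A minimal sketch wrapping the steps above in a hypothetical helper flag_top(); like the original code, it sorts and modifies its argument by reference.

```r
library(data.table)

set.seed(123)
table <- data.table(value1 = runif(10), value2 = runif(10))
table[, difference := value1 - value2]

# Hypothetical helper: sorts `dt` by difference (descending) and sets
# large = 1 for the first x rows, 0 otherwise. Modifies `dt` by reference.
flag_top <- function(dt, x) {
  setorderv(dt, "difference", order = -1)
  dt[, large := 0]
  # seq_len(n) is empty when n is 0, and min() caps n at the row count
  dt[seq_len(min(x, nrow(dt))), large := 1]
  invisible(dt)
}

flag_top(table, 5)
table
```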