Creating Character Variables with data.table

Suppose we have the following data.table:

library(data.table)

x_dt <- data.table(sexn = c(1, 0, 0, 1, NA, 1, NA),
                   country = c("CHN", "JPN", "BGR", "AUT", " ", "TWN", " "),
                   age = c(35, NA, 40, NA, 70, 18, 36)
)

I am trying to create a variable asia_region that is 1 when country %chin% c("CHN", "JPN", "KOR", "SGP", "TWN"), 0 when country is non-missing, and NA when country is missing.

The following code fills in 0 when country is missing.

result <- x_dt[, asia_region := ifelse(country %chin% c("CHN", "JPN", "KOR",  "SGP", "TWN"),1 , 0)]
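The reason the blank rows come out as 0 rather than NA can be seen in a minimal sketch, assuming the data.table package is installed. A blank string " " is an ordinary non-missing value, so %chin% returns FALSE for it and ifelse takes the "no" branch. The names `asia`, `in_asia`, `flag` and `is_blank` are only illustrative.

```r
library(data.table)

country <- c("CHN", "JPN", "BGR", "AUT", " ", "TWN", " ")
asia <- c("CHN", "JPN", "KOR", "SGP", "TWN")  # illustrative name

# " " is a regular string, not NA, so the membership test is FALSE for it
in_asia <- country %chin% asia
print(in_asia)

# ifelse() therefore returns 0 for the blank entries
flag <- ifelse(in_asia, 1, 0)
print(flag)

# The blank entries have to be detected separately, e.g. after trimming whitespace
is_blank <- trimws(country) == ""
print(is_blank)
```

So the missing-country rule needs its own condition; the membership test alone cannot produce NA.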

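We can coerce the logical directly to binary with as.integer or +, then change the values to NA where 'country' is blank (" ") by specifying a logical condition in i and assigning (:=) NA to the corresponding elements of 'asia_region'.
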
x_dt[,  asia_region := +(country %chin% c("CHN", "JPN", "KOR", "SGP", "TWN"))]
x_dt[trimws(country) == "", asia_region := NA_integer_]

-output

> x_dt
   sexn country age asia_region
1:    1     CHN  35           1
2:    0     JPN  NA           1
3:    0     BGR  40           0
4:    1     AUT  NA           0
5:   NA          70          NA
6:    1     TWN  18           1
7:   NA          36          NA

Or, if we need an ifelse/fifelse (if/else won't work because it is not vectorized, i.e. it expects a condition of length 1 and no more):

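# fifelse() expects 'yes' and 'no' to have the same type, so use integer
# literals (1L, 0L) to match NA_integer_
x_dt[, asia_region := fifelse(trimws(country) == "", NA_integer_,
        fifelse(country %chin% c("CHN", "JPN", "KOR", "SGP", "TWN"), 1L, 0L))]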
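As a sketch of the same three-way rule without nesting, recent data.table versions also provide fcase, which evaluates condition/value pairs in order and falls back to `default`. This is an alternative I am swapping in for illustration, not part of the answer above; the names `dt` and `asia` are assumptions.

```r
library(data.table)

# Illustrative table with only the column needed here
dt <- data.table(country = c("CHN", "JPN", "BGR", "AUT", " ", "TWN", " "))
asia <- c("CHN", "JPN", "KOR", "SGP", "TWN")

# The first matching condition wins; all values share the integer type
dt[, asia_region := fcase(
  trimws(country) == "", NA_integer_,  # blank country -> NA
  country %chin% asia,   1L,           # listed country -> 1
  default = 0L                         # any other country -> 0
)]
print(dt)
```

The blank check comes first so that blank rows are set to NA before the membership test is considered.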

How about a dplyr solution? I would make a vector of the countries for easier reference:

asia_countries <-  c("CHN", "JPN", "KOR",  "SGP", "TWN")

x_dt |>
  dplyr::mutate(asia_region = ifelse(country %in% asia_countries, 1, 0)) |>
  dplyr::mutate(asia_region = ifelse(country == " ", NA, asia_region))