Removing part of a string with grepl and str_sub in data.table

I am trying to remove the last four characters, 2000, from a column of strings, as shown below:

library(data.table)
library(stringr)
DT <- structure(list(variable = structure(c(1L, 1L, 1L), .Label = c("Percent of adults with less than a high school diploma, 2000", 
"Percent of adults with a high school diploma only, 2000", "Percent of adults completing some college or associate's degree, 2000", 
"Percent of adults with a bachelor's degree or higher, 2000", 
"Percent of adults with less than a high school diploma, 2014-18", 
"Percent of adults with a high school diploma only, 2014-18", 
"Percent of adults completing some college or associate's degree, 2014-18", 
"Percent of adults with a bachelor's degree or higher, 2014-18"
), class = "factor")), row.names = c(NA, -3L), class = c("data.table", 
"data.frame"))

                                                       variable
1: Percent of adults with less than a high school diploma, 2000
2: Percent of adults with less than a high school diploma, 2000
3: Percent of adults with less than a high school diploma, 2000



# If string contains 2000, remove last four characters.
DT <- setDT(DT)[grepl("2000", variable, fixed = TRUE), str_sub(variable, end=-4)]

However, the syntax is clearly incorrect. What should the syntax be in this case?

If you want to strip the year, and an optional range, from the end of the variable column, use sub:

# Backslashes must be doubled inside an R string literal
DT$variable <- sub(", \\d{4}(?:-\\d{2})?$", "", DT$variable)
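
As a quick check of that pattern, here is a minimal base R sketch on two of the labels from the question. The names x, pattern and cleaned are illustrative, and perl = TRUE is my own addition so that the regex engine is explicit; it is not part of the answer above.

```r
# Two sample labels taken from the factor levels in the question
x <- c("Percent of adults with a high school diploma only, 2000",
       "Percent of adults with a high school diploma only, 2014-18")

# ", " followed by four digits, optionally "-" and two more digits,
# anchored at the end of the string
pattern <- ", \\d{4}(?:-\\d{2})?$"

# perl = TRUE is an assumption made here to pin the regex engine
cleaned <- sub(pattern, "", x, perl = TRUE)
print(cleaned)
```

Both elements should come back without the trailing year or year range, so the same call handles the 2000 and the 2014-18 labels.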

You need to assign the changed values back to variable:

library(data.table)
library(stringr)

DT[grepl("2000", variable, fixed = TRUE), variable := str_sub(variable, end=-4)]
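
To illustrate why the assignment matters, here is a small sketch on a made-up two-row table. DT2, res and the shortened labels are hypothetical, and the column is assumed to be character rather than the factor produced by the question's dput output. Without := the expression in j is only evaluated and returned; with := the matching rows are updated by reference.

```r
library(data.table)

# Hypothetical two-row table; the column is character, not factor
DT2 <- data.table(variable = c("High school diploma only, 2000",
                               "High school diploma only, 2014-18"))

# Without := the j expression is just evaluated and returned as a vector;
# DT2 itself is left unchanged
res <- DT2[grepl("2000", variable, fixed = TRUE),
           sub(", 2000$", "", variable)]

# With := the rows selected in i are modified in place, other rows are untouched
DT2[grepl("2000", variable, fixed = TRUE),
    variable := sub(", 2000$", "", variable)]

print(DT2)
```

If the real column is a factor, converting it first with DT[, variable := as.character(variable)] is one way to keep the sketch's assumption true.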

Are you sure end should be -4?
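
To check the offset, here is a small sketch using one of the labels from the question. In str_sub, -1 refers to the last character, so end = -k keeps everything up to and including the k-th character from the end and drops only k - 1 characters. The names x, drop3, drop4 and drop6 are illustrative.

```r
library(stringr)

x <- "Percent of adults with a high school diploma only, 2000"

# end = -4 keeps the first digit of the year: only 3 characters are dropped
drop3 <- str_sub(x, end = -4)

# end = -5 drops exactly the four characters "2000" (a trailing ", " remains)
drop4 <- str_sub(x, end = -5)

# end = -7 drops the six characters ", 2000"
drop6 <- str_sub(x, end = -7)

print(c(drop3, drop4, drop6))
```

So removing the four-digit year needs end = -5, and removing the comma and space as well needs end = -7.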