Removing part of a string with grepl and str_sub in data.table
I'm trying to remove the last four characters (2000) from a column of strings, as follows:
library(data.table)
library(stringr)
DT <- structure(list(variable = structure(c(1L, 1L, 1L), .Label = c("Percent of adults with less than a high school diploma, 2000",
"Percent of adults with a high school diploma only, 2000", "Percent of adults completing some college or associate's degree, 2000",
"Percent of adults with a bachelor's degree or higher, 2000",
"Percent of adults with less than a high school diploma, 2014-18",
"Percent of adults with a high school diploma only, 2014-18",
"Percent of adults completing some college or associate's degree, 2014-18",
"Percent of adults with a bachelor's degree or higher, 2014-18"
), class = "factor")), row.names = c(NA, -3L), class = c("data.table",
"data.frame"))
variable
1: Percent of adults with less than a high school diploma, 2000
2: Percent of adults with less than a high school diploma, 2000
3: Percent of adults with less than a high school diploma, 2000
# If string contains 2000, remove last four characters.
DT <- setDT(DT)[grepl("2000", variable, fixed = TRUE), str_sub(variable, end=-4)]
However, the syntax is obviously incorrect. What should the syntax be in this case?
If you want to strip the year, and a possible year range such as 2014-18, from the end of the variable column, use sub:
DT$variable <- sub(", \\d{4}(?:-\\d{2})?$", "", DT$variable, perl = TRUE)
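A quick check of that pattern on two labels taken from the question. In an R string literal the regex escape \d has to be written as "\\d", and perl = TRUE is set here on the assumption that the PCRE engine is wanted for the non-capturing (?:...) group:

```r
# Two labels copied from the question's data
labels <- c(
  "Percent of adults with less than a high school diploma, 2000",
  "Percent of adults with a bachelor's degree or higher, 2014-18"
)

# ", " + four-digit year + optional "-YY" range, anchored at the end of the string
cleaned <- sub(", \\d{4}(?:-\\d{2})?$", "", labels, perl = TRUE)
print(cleaned)
```

Both labels lose their trailing year, whether it is a single year or a range.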
You need to assign the modified values back to variable with :=.
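library(data.table)
library(stringr)
setDT(DT)  # make sure DT is a proper data.table before updating it by reference
# Assign the result back to the column; end = -5 drops the last four characters
DT[grepl("2000", variable, fixed = TRUE), variable := str_sub(variable, end = -5)]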
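To see why the original attempt did not change anything, here is a minimal sketch on a hypothetical toy table (not the question's data), using base substr instead of str_sub so that it depends only on data.table. Without :=, j is simply evaluated and its value returned; with :=, the selected rows of the column are updated in place:

```r
library(data.table)

# Hypothetical toy table for illustration only
dt <- data.table(variable = c("a, 2000", "b, 2014-18"))

# Without :=, data.table evaluates j on the matching rows and returns the value;
# dt itself is left unchanged
res <- dt[grepl("2000", variable, fixed = TRUE), substr(variable, 1, nchar(variable) - 4)]

# With :=, the matching rows of the variable column are overwritten by reference
dt[grepl("2000", variable, fixed = TRUE), variable := substr(variable, 1, nchar(variable) - 4)]
print(dt)
```

Assigning the first form's result to DT (as in the question) replaces the whole table with a character vector, which is why the question's line looked broken.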
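Note that end = -4 keeps everything up to the fourth-to-last character, so it removes only the last three characters; end = -5 removes all four.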
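The off-by-one is easiest to see on one of the question's labels. Negative positions in stringr's str_sub count from the end, with -1 being the last character; the base substr call is included only as an equivalent for comparison:

```r
library(stringr)

x <- "Percent of adults with less than a high school diploma, 2000"

# end = -4 stops at the 4th character from the end, i.e. the "2" of "2000"
keep_three_off <- str_sub(x, end = -4)
# end = -5 stops just before the final four characters
keep_four_off <- str_sub(x, end = -5)
# Base R equivalent of end = -5: keep the first nchar(x) - 4 characters
base_equiv <- substr(x, 1, nchar(x) - 4)

print(c(keep_three_off, keep_four_off))
```

Dropping exactly four characters still leaves the trailing ", ", which is one reason the sub approach with an anchored pattern is the cleaner choice here.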