Paste multiple data.table columns into single column based on unique values
I have a data.table that looks like this:
library(data.table)
dt1 <- data.table(
  VAR1 = c("Brick", "Sand", "Concrete", "Stone"),
  VAR2 = c(100, 23, 76, 43),
  VAR3 = c("Place", "Location", "Place", "Vista"),
  VAR4 = c("Place", "Tree", "Wood", "Vista"),
  VAR5 = c("Place", "Tree", "Wood", "Forest")
)
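I want to paste the named columns together (my real data has additional columns) in this order: VAR2, VAR1, VAR3, VAR4 and VAR5. However, I have two conditions:
- Values must not be repeated within a row (when a value is repeated, the entry from the last column should be kept, so in my example the 'Place' from VAR5 is kept)
- A comma should be used as the separator when pasting, except between VAR2 and VAR1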
My expected output looks like this:
dt2 <- data.table(VAR6 = c("100 Brick, Place","23 Sand, Location, Tree","76 Concrete, Place, Wood","43 Stone, Vista, Forest"))
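After selecting the columns in the required order with .SDcols, we can use do.call(paste, ...) and then remove the repeated words with a regular expression: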
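# Paste the columns in the order VAR2, VAR1, VAR3:VAR5, drop any word that
# recurs later in the row, then turn the first comma into a space.
dt1[, .(VAR6 = sub(",", " ", gsub("\\b(\\w+)\\b\\s*,\\s*(?=.*\\1)", "",
      do.call(paste, c(.SD, sep = ",")), perl = TRUE))),
    .SDcols = names(dt1)[c(2:1, 3:5)]]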
# VAR6
#1: 100 Brick,Place
#2: 23 Sand,Location,Tree
#3: 76 Concrete,Place,Wood
#4: 43 Stone,Vista,Forest
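To see what the regular expression does, it may help to run it on a single pasted row outside data.table. The sketch below uses only base R. The second input string, the variable names and the stricter pattern are my own illustrations and are not part of the answer above. It also shows one way to widen the remaining commas to ", " so the result matches the expected output in the question.

```r
# One pasted row before deduplication (row 1 of dt1)
x <- "100,Brick,Place,Place,Place"

# Pattern from the answer: drop "word," when the same text occurs later on
pat <- "\\b(\\w+)\\b\\s*,\\s*(?=.*\\1)"
dedup <- gsub(pat, "", x, perl = TRUE)          # "100,Brick,Place"

# Only the first comma (between VAR2 and VAR1) becomes a space
step2 <- sub(",", " ", dedup)                   # "100 Brick,Place"

# Optional: widen the remaining commas to ", " as in the expected output
final <- gsub(",", ", ", step2, fixed = TRUE)   # "100 Brick, Place"

# Caveat: the lookahead matches substrings, so "Place" is dropped
# when a later value merely contains it (hypothetical row, not in dt1)
y <- "1,Sand,Place,Placement"
loose <- gsub(pat, "", y, perl = TRUE)          # "1,Sand,Placement"

# Assumed tweak: require word boundaries around the back-reference
pat_strict <- "\\b(\\w+)\\b\\s*,\\s*(?=.*\\b\\1\\b)"
strict <- gsub(pat_strict, "", y, perl = TRUE)  # "1,Sand,Place,Placement"
```

Because \w+ only matches single words, values containing spaces or punctuation would need a different pattern; the grouped approach avoids that problem by comparing whole values.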
Alternatively, group by row number and paste:
# Group by row number; unique() drops repeated values across VAR3:VAR5.
V6 <- dt1[, sprintf("%s %s, %s", VAR2, VAR1,
     toString(unique(unlist(.SD)))), by = seq_len(nrow(dt1)), .SDcols = VAR3:VAR5]$V1
data.table(V6)
# V6
#1: 100 Brick, Place
#2: 23 Sand, Location, Tree
#3: 76 Concrete, Place, Wood
#4: 43 Stone, Vista, Forest
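One difference from the question's wording: unique() keeps the first occurrence of a repeated value, whereas the question asks for the last one to be kept. For the sample data the two give the same strings, but they can differ when a value reappears after a different value in between. A minimal sketch of a keep-last variant follows, assuming a helper I have named keep_last (hypothetical, not from the answer) built on base R's duplicated(fromLast = TRUE).

```r
library(data.table)

dt1 <- data.table(
  VAR1 = c("Brick", "Sand", "Concrete", "Stone"),
  VAR2 = c(100, 23, 76, 43),
  VAR3 = c("Place", "Location", "Place", "Vista"),
  VAR4 = c("Place", "Tree", "Wood", "Vista"),
  VAR5 = c("Place", "Tree", "Wood", "Forest")
)

# Hypothetical helper: keep the LAST occurrence of each value, in order
keep_last <- function(x) x[!duplicated(x, fromLast = TRUE)]

# unique() and keep_last() differ only when a value recurs non-adjacently
first_kept <- unique(c("A", "B", "A"))     # "A" "B"
last_kept  <- keep_last(c("A", "B", "A"))  # "B" "A"

# Same grouped paste as above, with keep_last() in place of unique()
res <- dt1[, .(VAR6 = sprintf("%s %s, %s", VAR2, VAR1,
                              toString(keep_last(unlist(.SD))))),
           by = seq_len(nrow(dt1)),
           .SDcols = c("VAR3", "VAR4", "VAR5")]
res$VAR6
```

Like the grouped answer, this only deduplicates within VAR3 to VAR5; it does not compare those values against VAR1 or VAR2.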