What is the "data table" way of doing this join/merge?
I have a "dictionary" table like this:
dict <- data.table(
Nickname = c("Abby", "Ben", "Chris", "Dan", "Ed"),
Name = c("Abigail", "Benjamin", "Christopher", "Daniel", "Edward")
)
dict
# Nickname Name
# 1: Abby Abigail
# 2: Ben Benjamin
# 3: Chris Christopher
# 4: Dan Daniel
# 5: Ed Edward
And a "data" table like this:
dat <- data.table(
Friend1 = c("Abby", "Ben", "Ben", "Chris"),
Friend2 = c("Ben", "Ed", NA, "Ed"),
Friend3 = c("Ed", NA, NA, "Dan"),
Friend4 = c("Dan", NA, NA, NA)
)
dat
# Friend1 Friend2 Friend3 Friend4
# 1: Abby Ben Ed Dan
# 2: Ben Ed NA NA
# 3: Ben NA NA NA
# 4: Chris Ed Dan NA
I want to produce a data.table that looks like this:
result <- data.table(
Friend1.Nickname = c("Abby", "Ben", "Ben", "Chris"),
Friend1.Name = c("Abigail", "Benjamin", "Benjamin", "Christopher"),
Friend2.Nickname = c("Ben", "Ed", NA, "Ed"),
Friend2.Name = c("Benjamin", "Edward", NA, "Edward"),
Friend3.Nickname = c("Ed", NA, NA, "Dan"),
Friend3.Name = c("Edward", NA, NA, "Daniel"),
Friend4.Nickname = c("Dan", NA, NA, NA),
Friend4.Name = c("Daniel", NA, NA, NA)
)
result
# sorry, word wrapping makes this too annoying to copy
This is the solution I came up with:
friend_vars <- paste0("Friend", 1:4)
friend_nicks <- paste0(friend_vars, ".Nickname")
friend_names <- paste0(friend_vars, ".Name")
setnames(dat, friend_vars, friend_nicks)
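for (i in 1:4) {
  # Wrapping the LHS in parentheses makes := use the value of friend_names[i]
  # as the column name; this replaces the deprecated `with = FALSE` form.
  dat[, (friend_names[i]) := dict$Name[match(dat[[friend_nicks[i]]], dict$Nickname)]]
}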
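Is there a more "data-table-esque" way to do this? I'm sure it is fine and efficient, but it is ugly to read, and aside from data.table's in-place assignment I feel I'm not making full use of what the package offers.
I'm also not a very strong SQL user, and I'm not very comfortable with table join terminology. I have a feeling something along those lines would be useful here, but I'm not sure how to apply it to my situation.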
I haven't come up with a solution that exactly matches your result, but you may be able to use something like this:
dat[, id := .I]
dat.m <- melt(dat, id.vars='id', variable.name='Friend', value.name='Nickname')
setkey(dict, Nickname)
dat.m[, Name := dict[Nickname, Name]]
> dat.m
id Friend Nickname Name
1: 1 Friend1 Abby Abigail
2: 2 Friend1 Ben Benjamin
3: 3 Friend1 Ben Benjamin
4: 4 Friend1 Chris Christopher
5: 1 Friend2 Ben Benjamin
6: 2 Friend2 Ed Edward
7: 3 Friend2 NA NA
8: 4 Friend2 Ed Edward
9: 1 Friend3 Ed Edward
10: 2 Friend3 NA NA
11: 3 Friend3 NA NA
12: 4 Friend3 Dan Daniel
13: 1 Friend4 Dan Daniel
14: 2 Friend4 NA NA
15: 3 Friend4 NA NA
16: 4 Friend4 NA NA
The variable id is just a placeholder so that I can melt the DT.
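If the wide layout from the question is needed, the long table can be cast back afterwards. The following is only a sketch under some assumptions: it assumes data.table 1.9.6 or later, where dcast accepts several value.var columns, and it swaps in an update join with on= for the keyed lookup used above. The name `wide` is introduced here for illustration, and the resulting columns carry dcast's generated names (for example Name_Friend1) rather than the Friend1.Name names from the question.

```r
library(data.table)

dict <- data.table(
  Nickname = c("Abby", "Ben", "Chris", "Dan", "Ed"),
  Name = c("Abigail", "Benjamin", "Christopher", "Daniel", "Edward")
)
dat <- data.table(
  Friend1 = c("Abby", "Ben", "Ben", "Chris"),
  Friend2 = c("Ben", "Ed", NA, "Ed"),
  Friend3 = c("Ed", NA, NA, "Dan"),
  Friend4 = c("Dan", NA, NA, NA)
)

# Row id so each original row can be rebuilt after melting
dat[, id := .I]
dat.m <- melt(dat, id.vars = "id", variable.name = "Friend", value.name = "Nickname")

# Update join: rows without a matching Nickname keep Name as NA
dat.m[dict, Name := i.Name, on = "Nickname"]

# Cast back to one row per id, spreading both Nickname and Name
# (`wide` is a name assumed for this sketch)
wide <- dcast(dat.m, id ~ Friend, value.var = c("Nickname", "Name"))
```

The columns could then be renamed with setnames() and reordered with setcolorder() if the exact layout of result matters.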
setkey(dict,Nickname)
dat[,paste(names(dat),"Name",sep="."):=lapply(.SD,function(x)dict[J(x)]$Name)]
setcolorder(dat,c(1,5,2,6,3,7,4,8))
dat
# Friend1 Friend1.Name Friend2 Friend2.Name Friend3 Friend3.Name Friend4 Friend4.Name
# 1: Abby Abigail Ben Benjamin Ed Edward Dan Daniel
# 2: Ben Benjamin Ed Edward NA NA NA NA
# 3: Ben Benjamin NA NA NA NA NA NA
# 4: Chris Christopher Ed Edward Dan Daniel NA NA
In base R, super ugly:
cbind(dat, lapply(dat, function(x){dict$Name[match(x, dict$Nickname)]}))
Friend1 Friend2 Friend3 Friend4 V2 NA NA NA
1: Abby Ben Ed Dan Abigail Benjamin Edward Daniel
2: Ben Ed NA NA Benjamin Edward NA NA
3: Ben NA NA NA Benjamin NA NA NA
4: Chris Ed Dan NA Christopher Edward Daniel NA
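The uninformative column names in that output can be avoided by naming the looked-up columns before binding. A minimal sketch, assuming the same dict and dat as in the question; `full_names` and `res` are names made up for this illustration:

```r
library(data.table)

dict <- data.table(
  Nickname = c("Abby", "Ben", "Chris", "Dan", "Ed"),
  Name = c("Abigail", "Benjamin", "Christopher", "Daniel", "Edward")
)
dat <- data.table(
  Friend1 = c("Abby", "Ben", "Ben", "Chris"),
  Friend2 = c("Ben", "Ed", NA, "Ed"),
  Friend3 = c("Ed", NA, NA, "Dan"),
  Friend4 = c("Dan", NA, NA, NA)
)

# Look up the full name for every nickname column; NA stays NA
full_names <- lapply(dat, function(x) dict$Name[match(x, dict$Nickname)])

# Give the new columns explicit names before binding
names(full_names) <- paste0(names(dat), ".Name")

res <- cbind(dat, as.data.table(full_names))
```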
Using data.table 1.9.5:
for (nm in names(dat)) {
on = setattr("Nickname", 'names', nm)
dat[dict, paste0(nm, ".Name") := i.Name, on=on]
}
We can join using on= instead of setting a key. You can now use setcolorder() to reorder the columns.
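Put together with the reordering step, the whole thing might look like the sketch below. It assumes the dict and dat from the question and a data.table version that supports on=. Two details differ from the loop above and are choices made for this sketch only: setNames() from base R is swapped in for setattr() to build the named on= vector, and `nick_cols` is a helper name introduced here.

```r
library(data.table)

dict <- data.table(
  Nickname = c("Abby", "Ben", "Chris", "Dan", "Ed"),
  Name = c("Abigail", "Benjamin", "Christopher", "Daniel", "Edward")
)
dat <- data.table(
  Friend1 = c("Abby", "Ben", "Ben", "Chris"),
  Friend2 = c("Ben", "Ed", NA, "Ed"),
  Friend3 = c("Ed", NA, NA, "Dan"),
  Friend4 = c("Dan", NA, NA, NA)
)

# Copy the names first: := adds columns by reference, which also grows names(dat)
nick_cols <- copy(names(dat))

for (nm in nick_cols) {
  # on = c(Friend1 = "Nickname") joins dat$Friend1 against dict$Nickname;
  # matching rows of dat get i.Name, the rest stay NA
  dat[dict, (paste0(nm, ".Name")) := i.Name, on = setNames("Nickname", nm)]
}

# Interleave each nickname column with its looked-up name column
setcolorder(dat, as.vector(rbind(nick_cols, paste0(nick_cols, ".Name"))))
```

Building the column order from the names, rather than from positions such as c(1,5,2,6,3,7,4,8), keeps the step valid if the number of Friend columns changes.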
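I would avoid reshaping the data unless absolutely necessary. This is where an update-while-join comes in handy. Now that there is the on= argument, I couldn't resist posting an answer :-).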