Merge two different data frames (like VLOOKUP)
I have two data frames:
> head(df_Edges)
Source Target Type Weight
@kuabt @_chuad Directed 1
@kuabt @arifsetia2013 Directed 1
@kuabt @kuabt Directed 1
@kuabt @chongbeng Directed 1
@kuabt @billtay25 Directed 1
@kuabt @gst183 Directed 1
and
> head(df_Nodes)
Id Label
73 @kuabt
148 @billtay25
168 @chongbeng
187 @nonvitaltooth
216 @gst183
244 @arifsetia2013
I want to replace the labels in df_Edges with their "Id number", so the result would look like this:
Source Target Type Weight
73 298 Directed 1
73 244 Directed 1
73 73 Directed 1
73 168 Directed 1
73 148 Directed 1
73 216 Directed 1
This is what I tried:
df<-merge(df_Nodes, df_Edges, by.x = "Label", by.y = "Source")
But the result is still the same as before.
So, how can I do this?
Thanks.
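We can use match and Map to overwrite the 'Source' and 'Target' columns in 'df_Edges' with the corresponding Ids.
# For each of Source and Target, find the label in df_Nodes$Label and return the matching Id
df_Edges[c("Source", "Target")] <- Map(function(x, y) df_Nodes$Id[match(x, y)],
                                       df_Edges[c("Source", "Target")],
                                       list(df_Nodes$Label))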
df_Edges
## Source Target Type Weight
## 1 73 NA Directed 1
## 2 73 244 Directed 1
## 3 73 73 Directed 1
## 4 73 168 Directed 1
## 5 73 148 Directed 1
## 6 73 216 Directed 1
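To make the lookup step easier to follow, here is a minimal sketch of what match does on its own. The two data frames are rebuilt by hand from the head() output in the question, so they are an assumption about the real data (in particular, the Id column is assumed to be numeric and the label columns are assumed to be character). The names pos and target_ids are introduced only for illustration.

```r
# Data reconstructed from the head() output in the question (an assumption)
df_Edges <- data.frame(
  Source = rep("@kuabt", 6),
  Target = c("@_chuad", "@arifsetia2013", "@kuabt",
             "@chongbeng", "@billtay25", "@gst183"),
  Type = "Directed",
  Weight = 1,
  stringsAsFactors = FALSE
)
df_Nodes <- data.frame(
  Id = c(73, 148, 168, 187, 216, 244),
  Label = c("@kuabt", "@billtay25", "@chongbeng",
            "@nonvitaltooth", "@gst183", "@arifsetia2013"),
  stringsAsFactors = FALSE
)

# match() returns, for each Target label, its row position in df_Nodes$Label
# (NA when the label is not present)
pos <- match(df_Edges$Target, df_Nodes$Label)
pos
# [1] NA  6  1  3  2  5

# Indexing Id by those positions performs the VLOOKUP-style lookup
target_ids <- df_Nodes$Id[pos]
target_ids
# [1]  NA 244  73 168 148 216
```

Map simply applies this same two-step lookup to each of the Source and Target columns in turn.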
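Or we can use dplyr
library(dplyr)
left_join(df_Edges, df_Nodes, by = c(Target = "Label")) %>%
  mutate(Target = Id) %>%                               # replace Target labels with the joined Id
  left_join(., df_Nodes, by = c(Source = "Label")) %>%  # second join yields Id.x and Id.y
  mutate(Source = Id.y) %>%                             # replace Source labels with the second Id
  select(-matches("Id"))                                # drop the helper Id columns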
# Source Target Type Weight
#1 73 NA Directed 1
#2 73 244 Directed 1
#3 73 73 Directed 1
#4 73 168 Directed 1
#5 73 148 Directed 1
#6 73 216 Directed 1
There is no need for merge here, because you can do this directly with two applications of match:
df_Edges$Source <- df_Nodes$Id[match(df_Edges$Source, df_Nodes$Label)]
df_Edges$Target <- df_Nodes$Id[match(df_Edges$Target, df_Nodes$Label)]
df_Edges
## Source Target Type Weight
## 1 73 NA Directed 1
## 2 73 244 Directed 1
## 3 73 73 Directed 1
## 4 73 168 Directed 1
## 5 73 148 Directed 1
## 6 73 216 Directed 1
The NA value appears because, in your example, df_Nodes has no row for that label (@_chuad).
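Before replacing the columns, it can help to check which labels have no counterpart in the node table. The following is a small sketch using base R's setdiff. The two vectors are copied by hand from the head() output in the question, so they only cover the six rows shown there, and the names edge_labels, node_labels and missing_labels are hypothetical.

```r
# Labels copied from the head() output in the question (only the rows shown there)
edge_labels <- c("@_chuad", "@arifsetia2013", "@kuabt",
                 "@chongbeng", "@billtay25", "@gst183")
node_labels <- c("@kuabt", "@billtay25", "@chongbeng",
                 "@nonvitaltooth", "@gst183", "@arifsetia2013")

# Labels used in the edges that do not appear among the node labels;
# these are the ones that match() would turn into NA
missing_labels <- setdiff(edge_labels, node_labels)
missing_labels
# [1] "@_chuad"
```

On the full data the same call would be setdiff(df_Edges$Target, df_Nodes$Label). Since head() prints only the first six rows, the complete df_Nodes may well contain @_chuad, in which case the NA would not occur.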