R - Match values from 2 data frames based on multiple conditions (when the order of the lookup IDs is random)
Hi, I have two data frames:
df1 = data.frame(PersonId1=c(1,2,3,4,5,6,7,8,9,10,1),PersonId2=c(11,12,13,14,15,16,17,18,19,20,11),
Played_together = c(1,0,0,1,1,0,0,0,1,0,1),
Event=c(1,1,1,1,2,2,2,2,2,2,2),
Utility=c(20,-2,-5,10,30,2,1,.5,50,-1,60))
df2 = data.frame(PersonId1=c(11,15,9,1),PersonId2=c(1,5,19,11),
Played_together = c(1,1,1,1),
Event=c(1,2,2,2))
df1 looks like this:
PersonId1 PersonId2 Played_together Event Utility
1 1 11 1 1 20.0
2 2 12 0 1 -2.0
3 3 13 0 1 -5.0
4 4 14 1 1 10.0
5 5 15 1 2 30.0
6 6 16 0 2 2.0
7 7 17 0 2 1.0
8 8 18 0 2 0.5
9 9 19 1 2 50.0
10 10 20 0 2 -1.0
11 1 11 1 2 60.0
df2 looks like this:
PersonId1 PersonId2 Played_together Event
1 11 1 1 1
2 15 5 1 2
3 9 19 1 2
4 1 11 1 2
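Note that df2 is not simply the rows of df1 where Played_together == 1 (for example, the pair PersonId1 = 4, PersonId2 = 14 is not present in df2).
Also note that although df2 is a subset of df1, the order in which the individuals appear in df2 is random. For example, row 1 of df1 has PersonId1 = 1 and PersonId2 = 11 for Event 1, whereas row 1 of df2 has PersonId1 = 11 and PersonId2 = 1 for Event 1. These two rows describe exactly the same pair, and I want to look up Utility from df1 into df2. The match has to be done per event. The final output should look like this: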
PersonId1 PersonId2 Played_together Event Utility
1 11 1 1 1 20
2 15 5 1 2 30
3 9 19 1 2 50
4 1 11 1 2 60
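I know R has a merge function, but I don't know what to do when the lookup IDs appear in random order. I would appreciate any help. Thanks in advance.
Here is what I have for you: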
library(dplyr)
# Join twice: once with the IDs swapped, once with the IDs in the same order,
# then stack both results and drop the rows that found no match.
rbind(left_join(df2, df1,
                by = c("PersonId2" = "PersonId1", "PersonId1" = "PersonId2",
                       "Played_together" = "Played_together", "Event" = "Event")),
      left_join(df2, df1,
                by = c("PersonId1" = "PersonId1", "PersonId2" = "PersonId2",
                       "Played_together" = "Played_together", "Event" = "Event"))) %>%
  filter(!is.na(Utility))
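Basically, your data sometimes has the person IDs flipped. We can bind the two joins together and then filter out the rows where Utility is NA.
Your output looks like this: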
PersonId1 PersonId2 Played_together Event Utility
1 11 1 1 1 20
2 15 5 1 2 30
3 9 19 1 2 50
4 1 11 1 2 60
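One solution is to combine PersonId1 and PersonId2 into a "Team" column of the form min(PersonId):max(PersonId), so that both orderings of a pair produce the same team. Then join df1 and df2 on Team and Event to get the desired data.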
library(dplyr)
# Build the same order-independent Team key in both data frames, then join on it.
df2 %>% rowwise() %>%
  mutate(Team = paste0(min(PersonId1, PersonId2), ":", max(PersonId1, PersonId2))) %>%
  inner_join(df1 %>% rowwise() %>%
               mutate(Team =
                        paste0(min(PersonId1, PersonId2), ":", max(PersonId1, PersonId2))),
             by = c("Team", "Event")) %>%
  select(PersonId1 = PersonId1.x, PersonId2 = PersonId2.x,
         Played_together = Played_together.x, Event, Utility) %>%
  as.data.frame()
# PersonId1 PersonId2 Played_together Event Utility
# 1 11 1 1 1 20
# 2 15 5 1 2 30
# 3 9 19 1 2 50
# 4 1 11 1 2 60
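The order-independent key idea above can also be sketched without rowwise(), using the vectorized pmin()/pmax() and base merge(). This is only a sketch under some assumptions: the helper add_pair_key and the columns lo, hi and row are hypothetical names introduced for illustration, each (pair, Event) combination is assumed to occur at most once in df1, and, like the Team join, it matches on the pair and Event only, not on Played_together.

```r
# Rebuild the example data from the question.
df1 <- data.frame(
  PersonId1 = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1),
  PersonId2 = c(11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 11),
  Played_together = c(1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 1),
  Event = c(1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2),
  Utility = c(20, -2, -5, 10, 30, 2, 1, 0.5, 50, -1, 60)
)
df2 <- data.frame(
  PersonId1 = c(11, 15, 9, 1),
  PersonId2 = c(1, 5, 19, 11),
  Played_together = c(1, 1, 1, 1),
  Event = c(1, 2, 2, 2)
)

# Hypothetical helper: add key columns that do not depend on ID order.
# lo is the smaller ID of the pair, hi the larger one.
add_pair_key <- function(d) {
  d$lo <- pmin(d$PersonId1, d$PersonId2)
  d$hi <- pmax(d$PersonId1, d$PersonId2)
  d
}

key <- c("lo", "hi", "Event")

# Keep only the key and the value to look up from df1.
lookup <- add_pair_key(df1)[, c(key, "Utility")]

# merge() may reorder rows, so remember df2's original order.
df2k <- add_pair_key(df2)
df2k$row <- seq_len(nrow(df2k))

# Left join: every df2 row is kept; unmatched rows would get Utility = NA.
res <- merge(df2k, lookup, by = key, all.x = TRUE)
res <- res[order(res$row),
           c("PersonId1", "PersonId2", "Played_together", "Event", "Utility")]
rownames(res) <- NULL
res
```

On the example data this keeps df2's row order and IDs as written and attaches Utility values 20, 30, 50 and 60. If Played_together must also agree, it could be added to key and to the columns kept in lookup.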