R: merge two dataframes based on substring
I have two data frames. df1 looks like:
         Day Element                         Incident
1 2020-04-06    3101        Check incident by SOILING
2 2020-04-02    3102                 Check alarm 5662
3 2020-05-21    3101 Check energy loss by METEO ERROR
4 2020-04-02    3202                  Check ACDC grid
The other, df2, looks like this:
         Day Element    Incident Energy_loss
1 2020-04-06    3101     SOILING        0.05
2 2020-04-14    3101     SOILING        0.01
3 2020-05-21    3101 METEO ERROR        0.11
4 2020-06-15    3102 METEO ERROR        0.03
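I want to merge them based on the Day, Element, and Incident columns, so I need to find the rows where the Incident column of df1 contains the Incident column of df2. Rows of df1 that have no match in df2 can be left as NaN in the Energy loss column.
I tried the usual merge, but since one of the merge conditions is a substring match, it does not work properly.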
My expected output is:
         Day Element                         Incident Energy loss
1 2020-04-06    3101        Check incident by SOILING        0.05
2 2020-04-02    3102                 Check alarm 5662         Nan
3 2020-05-21    3101 Check energy loss by METEO ERROR        0.11
4 2020-04-02    3202                  Check ACDC grid         Nan
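We can use regex_left_join from the fuzzyjoin package:
library(dplyr)
library(fuzzyjoin)
# Left join in which each df2 value is used as a regular expression
# that is matched against the corresponding df1 value
regex_left_join(df1, df2, by = c('Day', 'Element', 'Incident')) %>%
  # keep the df1 versions of the key columns and drop the .x/.y suffixes
  select(Day = Day.x, Element = Element.x, Incident = Incident.x, Energy_loss)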
-output
#         Day Element                         Incident Energy_loss
#1 2020-04-06    3101        Check incident by SOILING        0.05
#2 2020-04-02    3102                 Check alarm 5662          NA
#3 2020-05-21    3101 Check energy loss by METEO ERROR        0.11
#4 2020-04-02    3202                  Check ACDC grid          NA
data
df1 <- structure(list(Day = c("2020-04-06", "2020-04-02", "2020-05-21",
"2020-04-02"), Element = c(3101L, 3102L, 3101L, 3202L),
Incident = c("Check incident by SOILING",
"Check alarm 5662", "Check energy loss by METEO ERROR", "Check ACDC grid"
)), class = "data.frame", row.names = c("1", "2", "3", "4"))
df2 <- structure(list(Day = c("2020-04-06", "2020-04-14", "2020-05-21",
"2020-06-15"), Element = c(3101L, 3101L, 3101L, 3102L), Incident = c("SOILING",
"SOILING", "METEO ERROR", "METEO ERROR"), Energy_loss = c(0.05,
0.01, 0.11, 0.03)), class = "data.frame", row.names = c("1",
"2", "3", "4"))
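If installing fuzzyjoin is not an option, the same result can probably be reached in base R. The following is only a sketch, not the method from the answer above: it joins on Day and Element exactly, then keeps the candidate pairs where the df2 Incident occurs as a literal substring of the df1 Incident (fixed = TRUE, so regex metacharacters in df2 are not interpreted). The names left, cand, hit, result and the helper column row_id are introduced here for illustration.

```r
# Same sample data as in the question
df1 <- data.frame(
  Day = c("2020-04-06", "2020-04-02", "2020-05-21", "2020-04-02"),
  Element = c(3101L, 3102L, 3101L, 3202L),
  Incident = c("Check incident by SOILING", "Check alarm 5662",
               "Check energy loss by METEO ERROR", "Check ACDC grid"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  Day = c("2020-04-06", "2020-04-14", "2020-05-21", "2020-06-15"),
  Element = c(3101L, 3101L, 3101L, 3102L),
  Incident = c("SOILING", "SOILING", "METEO ERROR", "METEO ERROR"),
  Energy_loss = c(0.05, 0.01, 0.11, 0.03),
  stringsAsFactors = FALSE
)

# Tag each df1 row so it can be found again after the merges (row_id is hypothetical)
left <- df1
left$row_id <- seq_len(nrow(left))

# Step 1: exact inner join on Day and Element gives the candidate pairs;
# the shared Incident column becomes Incident.x (df1) and Incident.y (df2)
cand <- merge(left, df2, by = c("Day", "Element"))

# Step 2: keep pairs where the df2 Incident is a literal substring of the df1 Incident
hit <- vapply(
  seq_len(nrow(cand)),
  function(i) grepl(cand$Incident.y[i], cand$Incident.x[i], fixed = TRUE),
  logical(1)
)
cand <- cand[hit, c("row_id", "Energy_loss")]

# Step 3: left join back onto df1 so unmatched rows get NA in Energy_loss
result <- merge(left, cand, by = "row_id", all.x = TRUE)
result <- result[order(result$row_id), c("Day", "Element", "Incident", "Energy_loss")]
rownames(result) <- NULL
print(result)
```

Note that if several df2 rows match the same df1 row, that df1 row appears once per match in result, which is also how a left join behaves.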