Replace specific values based on another dataframe
First, let's start with DataFrame 1 (DF1):
DF1 <- data.frame(c("06/19/2016", "06/20/2016", "06/21/2016", "06/22/2016",
"06/23/2016", "06/19/2016", "06/20/2016", "06/21/2016",
"06/22/2016", "06/23/2016"),
c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2),
c(149, 150, 151, 152, 155, 84, 83, 80, 81, 97),
c(101, 102, 104, 107, 99, 55, 55, 56, 57, 58),
c("MTL", "MTL", "MTL", "MTL", "MTL", "NY", "NY",
"NY", "NY", "NY"))
colnames(DF1) <- c("date", "id", "sales", "cost", "city")
I also have DataFrame 2 (DF2):
DF2 <- data.frame(c("06/19/2016", "06/27/2016", "06/22/2016", "06/23/2016"),
c(1, 1, 2, 2),
c(9999, 8888, 777, 555),
c("LON", "LON", "QC", "QC"))
colnames(DF2) <- c("date", "id", "sales", "city")
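For each row in DF1, I need to check whether DF2 has a row with the same date and id. If it does, I need to replace the values in DF1 with the corresponding values from DF2.
DF2 always has fewer columns than DF1. For any column that is not in DF2, I need to keep DF1's original value.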
The final output should look like this:
results <- data.frame(c("06/19/2016", "06/20/2016", "06/21/2016", "06/22/2016",
"06/23/2016", "06/19/2016", "06/20/2016", "06/21/2016",
"06/22/2016", "06/23/2016"),
c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2),
c(9999, 150, 151, 152, 155, 84, 83, 80, 777, 555),
c(101, 102, 104, 107, 99, 55, 55, 56, 57, 58),
c("LON", "MTL", "MTL", "MTL", "MTL", "NY", "NY",
"NY", "QC", "QC"))
colnames(results) <- c("date", "id", "sales", "cost", "city")
Do you have any suggestions?
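df <- merge(DF1, DF2, by = c("date", "id"), all.x = TRUE)
df$newcolumn <- ifelse(is.na(df$column.y), df$column.x, df$column.y)
Replace column with your variable (for example sales or city). all.x = TRUE belongs in merge(), not ifelse(): it keeps the DF1 rows that have no match in DF2.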
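The merge() + ifelse() approach above handles one column at a time. A minimal sketch of repeating it for every non-key column of DF2 in a loop, assuming the key columns are date and id, and building the data with stringsAsFactors = FALSE so that ifelse() returns the city strings rather than factor codes on R versions before 4.0:

```r
# Sample data from the question; stringsAsFactors = FALSE is an added
# assumption so character columns stay character on older R versions
DF1 <- data.frame(
  date  = c("06/19/2016", "06/20/2016", "06/21/2016", "06/22/2016", "06/23/2016",
            "06/19/2016", "06/20/2016", "06/21/2016", "06/22/2016", "06/23/2016"),
  id    = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2),
  sales = c(149, 150, 151, 152, 155, 84, 83, 80, 81, 97),
  cost  = c(101, 102, 104, 107, 99, 55, 55, 56, 57, 58),
  city  = c("MTL", "MTL", "MTL", "MTL", "MTL", "NY", "NY", "NY", "NY", "NY"),
  stringsAsFactors = FALSE
)
DF2 <- data.frame(
  date  = c("06/19/2016", "06/27/2016", "06/22/2016", "06/23/2016"),
  id    = c(1, 1, 2, 2),
  sales = c(9999, 8888, 777, 555),
  city  = c("LON", "LON", "QC", "QC"),
  stringsAsFactors = FALSE
)

keys <- c("date", "id")

# Left join: every DF1 row is kept; columns present in both get .x/.y suffixes
df <- merge(DF1, DF2, by = keys, all.x = TRUE)

# Apply the ifelse() step to each non-key column that DF2 provides
for (col in setdiff(names(DF2), keys)) {
  x <- paste0(col, ".x")  # original value from DF1
  y <- paste0(col, ".y")  # replacement value from DF2 (NA if no match)
  df[[col]] <- ifelse(is.na(df[[y]]), df[[x]], df[[y]])
  df[[x]] <- NULL
  df[[y]] <- NULL
}

# Put the columns back in DF1's order
df <- df[, names(DF1)]
```

Note that merge() sorts the rows by the key columns, so the result is not necessarily in DF1's original row order.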
You can use the join functionality of the data.table package for this:
library(data.table)
setDT(DF1)
setDT(DF2)
DF1[DF2, on = .(date, id), `:=` (city = i.city, sales = i.sales)]
This gives:
> DF1
date id sales cost city
1: 06/19/2016 1 9999 101 LON
2: 06/20/2016 1 150 102 MTL
3: 06/21/2016 1 151 104 MTL
4: 06/22/2016 1 152 107 MTL
5: 06/23/2016 1 155 99 MTL
6: 06/19/2016 2 84 55 NY
7: 06/20/2016 2 83 55 NY
8: 06/21/2016 2 80 56 NY
9: 06/22/2016 2 777 57 QC
10: 06/23/2016 2 555 58 QC
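When you have many columns in both datasets, it is easier to use mget instead of typing out all the column names. For the data used in the question, that looks like: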
DF1[DF2, on = .(date, id), names(DF2)[3:4] := mget(paste0("i.", names(DF2)[3:4]))]
When you want to build the vector of column names to update beforehand, you can do:
cols <- names(DF2)[3:4]
DF1[DF2, on = .(date, id), (cols) := mget(paste0("i.", cols))]
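For comparison, a sketch of the same update join in base R without data.table: each DF1 row is looked up in DF2 with match() on a combined date/id key, and only the columns DF2 has are overwritten. The helper name update_join and the "|" key separator are illustrative choices, not part of either answer; the sketch assumes each (date, id) pair occurs at most once in DF2 (otherwise match() silently uses the first occurrence) and that "|" never appears in the key values.

```r
# Sample data from the question; stringsAsFactors = FALSE is an added
# assumption so new city values can be assigned on older R versions
DF1 <- data.frame(
  date  = c("06/19/2016", "06/20/2016", "06/21/2016", "06/22/2016", "06/23/2016",
            "06/19/2016", "06/20/2016", "06/21/2016", "06/22/2016", "06/23/2016"),
  id    = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2),
  sales = c(149, 150, 151, 152, 155, 84, 83, 80, 81, 97),
  cost  = c(101, 102, 104, 107, 99, 55, 55, 56, 57, 58),
  city  = c("MTL", "MTL", "MTL", "MTL", "MTL", "NY", "NY", "NY", "NY", "NY"),
  stringsAsFactors = FALSE
)
DF2 <- data.frame(
  date  = c("06/19/2016", "06/27/2016", "06/22/2016", "06/23/2016"),
  id    = c(1, 1, 2, 2),
  sales = c(9999, 8888, 777, 555),
  city  = c("LON", "LON", "QC", "QC"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: overwrite target's columns with source's values wherever
# the key columns match; unmatched rows and columns absent from source are kept
update_join <- function(target, source, keys) {
  # One composite key string per row, e.g. "06/19/2016|1"
  tkey <- do.call(paste, c(target[keys], sep = "|"))
  skey <- do.call(paste, c(source[keys], sep = "|"))
  idx <- match(tkey, skey)  # matching source row for each target row, or NA
  hit <- !is.na(idx)
  for (col in setdiff(names(source), keys)) {
    target[[col]][hit] <- source[[col]][idx[hit]]
  }
  target
}

results <- update_join(DF1, DF2, keys = c("date", "id"))
```

Unlike merge(), this keeps DF1's row order, and because R copies on modification, DF1 itself is left unchanged (whereas the data.table `:=` join updates DF1 by reference).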
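library(data.table)  # for rbindlist()
# Left join: keep every DF1 row; sales and city get .x (DF1) / .y (DF2) suffixes
df <- merge(DF1, DF2, by = c("date", "id"), all.x=TRUE)
# Rows with no match in DF2: keep the original .x values
tmp1 <- df[is.na(df$sales.y) & is.na(df$city.y),]
tmp1$sales.y <- NULL
tmp1$city.y <- NULL
names(tmp1)[names(tmp1) == "sales.x"] <- "sales"
names(tmp1)[names(tmp1) == "city.x"] <- "city"
# Rows with a match in DF2: take the replacement .y values
tmp2 <- df[!is.na(df$sales.y) & !is.na(df$city.y),]
tmp2$sales.x <- NULL
tmp2$city.x <- NULL
names(tmp2)[names(tmp2) == "sales.y"] <- "sales"
names(tmp2)[names(tmp2) == "city.y"] <- "city"
# Stack both parts, aligning columns by name (unmatched rows come first)
results <- rbindlist(list(tmp1,tmp2), use.names= TRUE, fill = TRUE)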