How to add key-value pairs like a dictionary?
My data (8532 obs in total) looks like this:
Prd_Id Weight
DRA24 19.35
DRA24 NA
DRA24 NA
DRA24 19.35
DRA24 19.35
DRA59 8.27
DRA59 8.27
DRA59 8.27
DRA59 8.27
DRA59 NA
DRA59 NA
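Basically, the problem is that there are many Prd_Id/Weight pairs, and some Prd_Id rows have no Weight recorded. For example, in the data shown above the first DRA24 row has a weight but the second and third do not. Since I already know the weight, I just need to replace the NA with it: all rows with the same Prd_Id have the same Weight.
But R has nothing like a dictionary, so I'm finding this hard to solve. I tried a for loop, but it takes a very long time. My code looks like this: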
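# Nested loop: for every row of bms, scan every row of spl
# (R is case-sensitive, so the column is Weight, not weight)
for (i in seq_len(nrow(bms))) {
  for (j in seq_len(nrow(spl))) {
    if (spl$Prd_Id[j] == bms$Prd_Id[i]) {
      bms$Weight[i] <- spl$Weight[j]
    }
  }
}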
bms is the whole data set (8532 obs), and spl (1555 obs) is a subset of bms with one row per unique Prd_Id.
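R has no dedicated dict type, but a named vector already behaves like a simple key-value lookup, and indexing it with a whole vector of keys does all the lookups at once without a loop. A minimal sketch, with the two weights hard-coded for illustration (the names `weights` and `keys` are hypothetical):

```r
# A named vector used as a key -> value lookup table
weights <- c(DRA24 = 19.35, DRA59 = 8.27)
# Look up a single key
weights[["DRA24"]]          # 19.35
# Look up many keys at once (vectorised, no loop)
keys <- c("DRA59", "DRA24", "DRA59")
unname(weights[keys])       # 8.27 19.35 8.27
# Add a new key-value pair, as with a dictionary
weights["DRA99"] <- 12.5
# A key that is not present returns NA
weights["NOPE"]
```

Each answer below builds on this idea in some form: either a named vector, a join, or a per-group computation.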
As @r2evans suggested, you can use an SQL-like join strategy combined with dplyr's coalesce(). It looks like this:
library(dplyr)
# create 'bms' (tibble() replaces the deprecated data_frame())
bms <- tibble(
  Prd_Id = c("DRA24", "DRA24", "DRA24", "DRA24", "DRA24", "DRA59", "DRA59", "DRA59", "DRA59", "DRA59", "DRA59"),
  Weight = c(19.35, NA, NA, 19.35, 19.35, 8.27, 8.27, 8.27, 8.27, NA, NA)
)
# create 'spl': one non-missing Weight per Prd_Id
spl <- bms %>% filter(!is.na(Weight)) %>% filter(!duplicated(Prd_Id))
# SQL-like join and coalesce strategy:
# attach spl's Weight to every row, then keep bms's Weight unless it is NA
res <- bms %>%
  left_join(spl, by = "Prd_Id", suffix = c("_bms", "_spl")) %>%
  mutate(Weight = coalesce(Weight_bms, Weight_spl)) %>%
  select(-Weight_bms, -Weight_spl)
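The key step in the join answer is coalesce(), which works element by element and returns the first non-missing value across its arguments. A small sketch on two hypothetical vectors standing in for the Weight_bms and Weight_spl columns after the join:

```r
library(dplyr)
# Weights as recorded in bms (some missing)
weight_bms <- c(19.35, NA, NA, 8.27, NA)
# Weights attached from spl by the join (one per row)
weight_spl <- c(19.35, 19.35, 19.35, 8.27, 8.27)
# Keep weight_bms where present, otherwise fall back to weight_spl
filled <- coalesce(weight_bms, weight_spl)
filled   # returns c(19.35, 19.35, 19.35, 8.27, 8.27)
```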
Here is a base R solution:
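# example data (stringsAsFactors = FALSE keeps Prd_Id as character in R < 4.0;
# indexing by a factor would use its integer codes instead of its labels)
bms <- data.frame(
  Prd_Id = c("DRA24", "DRA24", "DRA24", "DRA24", "DRA24", "DRA59", "DRA59", "DRA59", "DRA59", "DRA59", "DRA59"),
  Weight = c(19.35, NA, NA, 19.35, 19.35, 8.27, 8.27, 8.27, 8.27, NA, NA),
  stringsAsFactors = FALSE
)
# create key-value pairs: a named vector mapping Prd_Id -> Weight
spl <- unique(bms[!is.na(bms[, "Weight"]), ])
spl <- setNames(spl[, "Weight"], spl[, "Prd_Id"])
# fill NAs by looking up each missing row's Prd_Id by name
idx <- which(is.na(bms[, "Weight"]))
bms[idx, "Weight"] <- spl[bms[idx, "Prd_Id"]]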
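The same lookup can also be written with base R's match(), which returns, for each Prd_Id in bms, the row position of that Prd_Id in a lookup table. This is a sketch of a vectorised replacement for the nested loop in the question; the table name `known` is hypothetical and plays the role of the question's spl:

```r
# example data, as in the question
bms <- data.frame(
  Prd_Id = c("DRA24", "DRA24", "DRA24", "DRA24", "DRA24",
             "DRA59", "DRA59", "DRA59", "DRA59", "DRA59", "DRA59"),
  Weight = c(19.35, NA, NA, 19.35, 19.35, 8.27, 8.27, 8.27, 8.27, NA, NA),
  stringsAsFactors = FALSE
)
# Lookup table: the first non-missing Weight for each Prd_Id
known <- bms[!is.na(bms$Weight), ]
known <- known[!duplicated(known$Prd_Id), ]
# For every row of bms, the position of its Prd_Id in 'known' (NA if absent)
pos <- match(bms$Prd_Id, known$Prd_Id)
# Overwrite only the rows whose Weight is missing
missing_wt <- is.na(bms$Weight)
bms$Weight[missing_wt] <- known$Weight[pos[missing_wt]]
```

With the question's spl (already one row per Prd_Id), the last three lines would use spl in place of known. match() resolves every key in one vectorised call, instead of the nested loop comparing every row of bms against every row of spl.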
You don't need left_join:
bms %>%
  group_by(Prd_Id) %>%
  mutate(Weight = Weight[!is.na(Weight)][1])
Another way, using first():
bms %>%
  group_by(Prd_Id) %>%
  mutate(Weight = first(Weight[!is.na(Weight)]))
Result:
# A tibble: 11 x 2
# Groups: Prd_Id [2]
Prd_Id Weight
<chr> <dbl>
1 DRA24 19.35
2 DRA24 19.35
3 DRA24 19.35
4 DRA24 19.35
5 DRA24 19.35
6 DRA59 8.27
7 DRA59 8.27
8 DRA59 8.27
9 DRA59 8.27
10 DRA59 8.27
11 DRA59 8.27
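A related alternative, not from the answers above, is tidyr's fill(), which copies the nearest non-missing value into NA rows within each group. It gives the same result here only because every Prd_Id has a single weight; if a group held several different weights, fill() would copy neighbours rather than pick one value per group. A sketch, assuming a tidyr version that supports `.direction = "downup"`:

```r
library(dplyr)
library(tidyr)
bms <- tibble(
  Prd_Id = c("DRA24", "DRA24", "DRA24", "DRA24", "DRA24",
             "DRA59", "DRA59", "DRA59", "DRA59", "DRA59", "DRA59"),
  Weight = c(19.35, NA, NA, 19.35, 19.35, 8.27, 8.27, 8.27, 8.27, NA, NA)
)
# Within each Prd_Id, fill NAs downwards, then upwards for any leading NAs
filled <- bms %>%
  group_by(Prd_Id) %>%
  fill(Weight, .direction = "downup") %>%
  ungroup()
```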
Of course, you can also do this in base R:
transform(bms, Weight = ave(Weight, Prd_Id, FUN = function(x) x[!is.na(x)][1]))
The result is the same:
Prd_Id Weight
1 DRA24 19.35
2 DRA24 19.35
3 DRA24 19.35
4 DRA24 19.35
5 DRA24 19.35
6 DRA59 8.27
7 DRA59 8.27
8 DRA59 8.27
9 DRA59 8.27
10 DRA59 8.27
11 DRA59 8.27
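ave() splits Weight by Prd_Id, applies FUN to each group, and writes each group's result back into that group's positions, recycling a single value across the whole group. A small sketch on hypothetical vectors; note that a group whose values are all NA stays NA, because `x[!is.na(x)][1]` on an empty vector returns NA:

```r
# Helper used in the answers above: first non-missing value of a vector
first_non_na <- function(v) v[!is.na(v)][1]
x <- c(19.35, NA, NA, 8.27, NA, NA)
# DRA99 is a hypothetical product with no known weight
g <- c("DRA24", "DRA24", "DRA24", "DRA59", "DRA59", "DRA99")
filled <- ave(x, g, FUN = first_non_na)
filled   # returns c(19.35, 19.35, 19.35, 8.27, 8.27, NA)
```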