Apply value replacement using data dictionary
df1=c("blue,green,green", "blue", "green")
dictionary=data.frame(label=c("blue","green"),value=c(1,2))
want=c("1,2,2","1","2")
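I want to replace some data using a data dictionary. The data (df1) may contain multiple comma-separated entries in a single cell. I have looked at str_replace_all and was able to do it with str_replace_all(colors, c("blue" = "1", "green" = "2")), but I cannot create c("blue" = "1", "green" = "2") from the dictionary data frame. I have over a hundred items to code, so hard-coding is not an option.
Any guidance on how to make this work, or on another approach, is much appreciated!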
Create a named vector from the dictionary and use str_replace_all:
library(dplyr)
library(stringr)
library(tibble)
dictionary %>%
mutate(value = as.character(value)) %>%
deframe %>%
str_replace_all(df1, .)
#[1] "1,2,2" "1" "2"
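The step the question was missing is turning the two dictionary columns into a named character vector. As a minimal sketch in base R only (the name `lookup` is an assumption made for illustration, not part of the original answer), the same vector that `deframe` produces can be built with `setNames`:

```r
dictionary <- data.frame(label = c("blue", "green"), value = c(1, 2))

# Names are the labels to search for; values are the replacement codes.
# as.character() is needed because the replacements must be strings.
lookup <- setNames(as.character(dictionary$value), dictionary$label)

# Same object as the hand-written c("blue" = "1", "green" = "2")
identical(lookup, c(blue = "1", green = "2"))
```

A vector built this way can be passed as the pattern argument, as in str_replace_all(df1, lookup). Note that str_replace_all treats each name as a regular expression, so a label that occurs inside a longer label would also be replaced there; with a large dictionary it may be worth checking for such overlaps.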
Here is a base R option:
rename <- with(dictionary, setNames(value, label))
lapply(strsplit(df1, ","), \(x) unname(rename[x])) |>
lapply(\(x) paste(x, collapse = ",")) |>
unlist()
#[1] "1,2,2" "1" "2"
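Because this approach splits each cell on commas and looks up whole tokens, it can be wrapped in a small reusable function. The sketch below is one possible way to do that; the function name `recode_tokens`, its `sep` argument, and the choice to keep labels that are missing from the dictionary unchanged are all assumptions for illustration, not part of the original answer.

```r
# Hypothetical helper: recode comma-separated labels using a dictionary
# data frame with columns `label` and `value`.
recode_tokens <- function(x, dict, sep = ",") {
  lookup <- setNames(as.character(dict$value), dict$label)
  vapply(strsplit(x, sep, fixed = TRUE), function(tokens) {
    codes <- unname(lookup[tokens])
    # Assumption: labels not found in the dictionary are left as they are.
    missing <- is.na(codes)
    codes[missing] <- tokens[missing]
    paste(codes, collapse = sep)
  }, character(1))
}

df1 <- c("blue,green,green", "blue", "green")
dictionary <- data.frame(label = c("blue", "green"), value = c(1, 2))

recode_tokens(df1, dictionary)
#[1] "1,2,2" "1" "2"
recode_tokens("blue,red", dictionary)
#[1] "1,red"
```

Matching whole tokens rather than patterns means a label contained in another label cannot be replaced by accident.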
Base R option 1, using nested vapply():
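# dict => named numeric vector (names are the labels)
dict <- setNames(dictionary$value, dictionary$label)
# Loop through and replace values: df1_replaced => character vector
df1_replaced <- vapply(
  df1,
  function(x){
    # Split one cell into its comma-separated labels (a list of length 1)
    y <- strsplit(x, ",")
    vapply(
      y,
      function(z){
        # paste(collapse = ",") keeps the "1,2,2" format; toString() would insert ", "
        paste(dict[match(z, names(dict))], collapse = ",")
      },
      character(1)
    )
  },
  character(1),
  USE.NAMES = FALSE
)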
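The lookup in this answer goes through match() on the names of dict. As a small sketch (the variable names `z`, `by_position` and `by_name` are illustrative assumptions), when every label is present and the names are unique, this gives the same result as indexing the named vector directly by name:

```r
dict <- setNames(c(1, 2), c("blue", "green"))
z <- c("blue", "green", "green")

# match() returns the positions 1, 2, 2, which are then used as an index
by_position <- dict[match(z, names(dict))]
# A character subscript selects the first element with each name
by_name <- dict[z]

identical(by_position, by_name)
```

Either form can be used inside the inner function; indexing by name is shorter, while match() makes the lookup step explicit.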