Joining each data frame column to a separate lookup table in R

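My survey results span about 90 columns and more than 5000 rows. The raw data were entered as codes (e.g., 1 for "yes", 2 for "no"). Each column has a different number of factor levels: for example, language spoken at home, income level, etc. How can I replace the raw codes with the actual answers across the whole table?

The raw data are structured as follows: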

rawsurveydf <- data.frame(Q1_tenant = sample(c(1,2,9), 20, replace=TRUE),
    Q2_income = sample(c(1:9), 20, replace=TRUE), 
    Q3_satisfaction = sample(c(1:4,9), 20, replace=TRUE) )

And the translations of the individual codes:

Tenantcodes <- data.frame(code=c(1,2,9), Q1_tenant=c("Yes", "No", "Refusal"))
incomecodes <- data.frame(code=c(1:9), Q2_income=as.numeric(c(seq(30000, 110000, by=10^4))))
houssatiscodes <- data.frame(code=c(1:4,9), Q3_satisfaction=c("Strongly disagree", "Disagree", "Agree", "Strongly agree", "Refusal"))

Using the data and code from your (original) example:

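# The lookup's answer column has the same name as the survey column, so work on
# a renamed copy: the key becomes Q1_tenant, the answer Q1_tenant_answer.
tenant_lookup <- setNames(Tenantcodes, c("Q1_tenant", "Q1_tenant_answer"))

# merge() needs the whole data frame, not a single column vector.
# all.x = TRUE keeps every survey row, even if a code has no match.
# Note that merge() sorts the result by the key column.
merged <- merge(rawsurveydf, tenant_lookup, by = "Q1_tenant", all.x = TRUE)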

When you need to map values to more than two levels, you can use other approaches, such as indexed assignment or gsub.
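
As a minimal sketch of the indexed-assignment idea for more than two levels, a named character vector can act as the lookup: the names are the codes and the values are the answers. The names satisfaction_labels, codes and answers are made up for this illustration and are not part of the question's data.

```r
# Hypothetical lookup: names are the raw codes, values are the answers.
satisfaction_labels <- c("1" = "Strongly disagree", "2" = "Disagree",
                         "3" = "Agree", "4" = "Strongly agree",
                         "9" = "Refusal")

# A few raw codes, as they might appear in a survey column.
codes <- c(1, 4, 9, 2, 3)

# Index the lookup by name; as.character() makes R match on the names
# rather than on positions. unname() drops the leftover names.
answers <- unname(satisfaction_labels[as.character(codes)])
answers
```

A code that is missing from the lookup comes back as NA, which makes unexpected values easy to spot.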

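If all your columns share the same transformation, you should turn it into a function and then apply/sapply that function. If each column has a different custom mapping, then obviously you need to supply that logic for each column.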
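
For the shared-transformation case, a sketch along these lines may help. It assumes three hypothetical yes/no questions (Q10, Q11, Q12) that all use the same coding, and it uses lapply rather than sapply so that the result stays a list of columns instead of being simplified to a matrix. The names yesno_df and recode_yesno are invented for the example.

```r
# Hypothetical example: three yes/no questions that share one coding scheme.
yesno_df <- data.frame(Q10 = c(1, 2, 9), Q11 = c(2, 2, 1), Q12 = c(9, 1, 1))

# One recoding function for the shared mapping (1 = Yes, 2 = No, 9 = Refusal).
recode_yesno <- function(x) {
  labels <- c("1" = "Yes", "2" = "No", "9" = "Refusal")
  unname(labels[as.character(x)])
}

# lapply() runs the function on every column; assigning into yesno_df[]
# keeps the data frame structure and the column names.
yesno_df[] <- lapply(yesno_df, recode_yesno)
yesno_df
```

To recode only some columns, index both sides with the same column names, for example yesno_df[c("Q10", "Q11")].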

Another option you can try is:

library(tidyverse)
rawsurveydf <- rawsurveydf %>%
    left_join(y = Tenantcodes, by = c("Q1_tenant" = "code"), suffix = c("", ".answer")) %>%
    left_join(y = incomecodes, by = c("Q2_income" = "code"), suffix = c("", ".answer")) %>%
    left_join(y = houssatiscodes, by = c("Q3_satisfaction" = "code"), suffix = c("", ".answer"))

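This gives you the desired result while still keeping the raw data.

If you want to run this for all 90 columns, I suggest setting up a "key" so that all the joins can be done at once. You can do this by giving the column you called code the same name as the column you want to merge on, and giving the answer column a different name. Perhaps like: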

Tenantcodes<-data.frame(Q1_tenant=c(1,2,9), Q1_tenant_answer=c("Yes", "No", "Refusal"))
incomecodes<-data.frame(Q2_income=c(1:9), Q2_income_answer=as.numeric(c(seq(30000, 110000, by=10^4))))
houssatiscodes<-data.frame(Q3_satisfaction=c(1:4,9), Q3_satisfaction_answer=c("Strongly disagree", "Disagree", "Agree", "Strongly agree", "Refusal"))

Like this, we can do all the joins in one go:

result <- list(rawsurveydf, Tenantcodes, incomecodes, houssatiscodes)  %>% reduce(left_join)

You can use factors:

out <- rawsurveydf
out[] <- Map(function(x,y) factor(x,y$code,y[[2]]),
             rawsurveydf,
             list(Tenantcodes,incomecodes,houssatiscodes))

# out
#    Q1_tenant Q2_income   Q3_satisfaction
# 1        Yes     60000 Strongly disagree
# 2         No     90000    Strongly agree
# 3         No     50000             Agree
# 4        Yes     80000           Refusal
# 5    Refusal     70000           Refusal
# 6         No    110000             Agree
# 7        Yes     60000    Strongly agree
# 8         No     40000          Disagree
# 9        Yes    110000 Strongly disagree
# 10       Yes    110000 Strongly disagree
# 11   Refusal     1e+05          Disagree
# 12       Yes     70000    Strongly agree
# 13   Refusal     60000 Strongly disagree
# 14       Yes     40000             Agree
# 15        No     1e+05           Refusal
# 16       Yes     90000           Refusal
# 17        No    110000    Strongly agree
# 18       Yes    110000 Strongly disagree
# 19        No     1e+05           Refusal
# 20        No     90000           Refusal

If you need character columns rather than factor columns, use as.character(factor(x,y$code,y[[2]])) instead of factor(x,y$code,y[[2]]).
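
With around 90 columns, pairing columns and lookup tables by position can be fragile. One possible variation, sketched here under the assumption that the lookup tables are kept in a list named after the survey columns, matches them by name instead. The objects survey_small, lookups and out_chr are hypothetical and only mirror the structure of the question's data.

```r
# Hypothetical miniature of the survey data.
survey_small <- data.frame(Q1_tenant = c(1, 2, 9, 1),
                           Q3_satisfaction = c(4, 9, 1, 2))

# Lookup tables stored in a list whose names match the survey columns.
lookups <- list(
  Q1_tenant = data.frame(code = c(1, 2, 9),
                         answer = c("Yes", "No", "Refusal")),
  Q3_satisfaction = data.frame(code = c(1:4, 9),
                               answer = c("Strongly disagree", "Disagree",
                                          "Agree", "Strongly agree",
                                          "Refusal"))
)

# Recode each named column with its own lookup: the codes become the factor
# levels, the answers become the labels, and as.character() returns text.
out_chr <- survey_small
out_chr[names(lookups)] <- Map(
  function(x, y) as.character(factor(x, y$code, y[[2]])),
  survey_small[names(lookups)],
  lookups
)
out_chr
```

Columns without an entry in lookups are left untouched, so the list can be filled in gradually.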