Joining each data frame column on a separate table in R

Question

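My survey results span about 90 columns and more than 5000 rows. The raw data were entered as codes (for example, 1 for "yes", 2 for "no"). Each column has a different number of factor levels: for example, language spoken at home, income level, and so on. How can I replace the raw codes with the actual answers across the whole table?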

The raw data are structured like this:

rawsurveydf <- data.frame(Q1_tenant = sample(c(1,2,9), 20, replace=TRUE),
    Q2_income = sample(c(1:9), 20, replace=TRUE), 
    Q3_satisfaction = sample(c(1:4,9), 20, replace=TRUE) )

And these are the lookup tables that translate each set of codes:

Tenantcodes <- data.frame(code=c(1,2,9), Q1_tenant=c("Yes", "No", "Refusal"))
incomecodes <- data.frame(code=c(1:9), Q2_income=as.numeric(c(seq(30000, 110000, by=10^4))))
houssatiscodes <- data.frame(code=c(1:4,9), Q3_satisfaction=c("Strongly disagree", "Disagree", "Agree", "Strongly agree", "Refusal"))

Answer 1

Using the data and codes from your (original) example:

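# Two-level shortcut with ifelse(): 1 -> "Yes", anything else -> "No"
# (this would mislabel the refusal code 9, so it only suits true yes/no items)
rawsurveydf$Q1_tenant_yn <- ifelse(rawsurveydf$Q1_tenant == 1, "Yes", "No")
# General lookup: match Q1_tenant against the code column of Tenantcodes
names(Tenantcodes)[2] <- "Q1_tenant_answer"   # avoid a name clash with the key
rawsurveydf <- merge(rawsurveydf, Tenantcodes,
                     by.x = "Q1_tenant", by.y = "code",
                     all.x = TRUE)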

When you need to map values to more than two levels, you can use other approaches, such as indexed assignment or gsub.
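As a rough sketch of the indexed-assignment idea, a named character vector can act as the lookup: the names are the codes and the values are the labels. The names `q1`, `tenant_labels` and `q1_answer`, and the sample codes, are made up for illustration.

```r
# Hypothetical sample of raw codes (same coding as Q1_tenant in the question)
q1 <- c(1, 2, 9, 2, 1)

# Named vector: names are the codes, values are the labels
tenant_labels <- c("1" = "Yes", "2" = "No", "9" = "Refusal")

# Index by the code converted to character; unname() drops the leftover names.
# A code that is missing from tenant_labels would come back as NA.
q1_answer <- unname(tenant_labels[as.character(q1)])
q1_answer
# [1] "Yes"     "No"      "Refusal" "No"      "Yes"
```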

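If all of your columns share the same conversion, you should wrap it in a function and then apply or sapply that function. If each column has its own custom mapping, then you will obviously need to supply that logic for each column.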
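A minimal sketch of the shared-conversion case, assuming several items use one coding scheme. It uses lapply rather than sapply so that the result stays a list of columns; the data frame `df`, the item names and the function `recode_yes_no` are hypothetical.

```r
# Hypothetical data: three items that all share one coding scheme
df <- data.frame(Q10 = c(1, 2, 9), Q11 = c(2, 2, 1), Q12 = c(9, 1, 1))

# One recoding function for the shared scheme (unmatched codes become NA)
recode_yes_no <- function(x) {
  labels <- c("1" = "Yes", "2" = "No", "9" = "Refusal")
  unname(labels[as.character(x)])
}

# Assigning into df[] keeps the data frame structure while replacing every column
df[] <- lapply(df, recode_yes_no)
df
```

To recode only some columns, the same assignment can be restricted to a subset, for example `df[c("Q10", "Q11")] <- lapply(df[c("Q10", "Q11")], recode_yes_no)`.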

Answer 2

Another option you could try:

library(tidyverse)
rawsurveydf <- rawsurveydf %>%
    left_join(y = Tenantcodes, by = c("Q1_tenant" = "code"), suffix = c("", ".answer")) %>%
    left_join(y = incomecodes, by = c("Q2_income" = "code"), suffix = c("", ".answer")) %>%
    left_join(y = houssatiscodes, by = c("Q3_satisfaction" = "code"), suffix = c("", ".answer"))

This gives you the desired result while still keeping the original data.

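If you want to run this across all 90 columns, I suggest setting up the keys so that all the joins can be done at once. To do that, rename the column you call code in each lookup table to match the raw-data column it joins on, and give the answer column a different name. Perhaps like this: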

Tenantcodes<-data.frame(Q1_tenant=c(1,2,9), Q1_tenant_answer=c("Yes", "No", "Refusal"))
incomecodes<-data.frame(Q2_income=c(1:9), Q2_income_answer=as.numeric(c(seq(30000, 110000, by=10^4))))
houssatiscodes<-data.frame(Q3_satisfaction=c(1:4,9), Q3_satisfaction_answer=c("Strongly disagree", "Disagree", "Agree", "Strongly agree", "Refusal"))

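Because each lookup table now shares exactly one column name with the raw data, left_join picks the join key automatically, and all the joins can be done in one go: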

result <- list(rawsurveydf, Tenantcodes, incomecodes, houssatiscodes)  %>% reduce(left_join)
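If you would rather not rely on column names lining up implicitly, the same idea can be sketched in base R with `match()`, which keeps the row order and row count of the raw data. This is only an illustration under stated assumptions: the names `raw`, `lookups` and `answer`, and the `_answer` suffix, are made up, and each lookup table keeps the `code` column layout from the question.

```r
# Small hypothetical raw data using the question's coding
raw <- data.frame(Q1_tenant = c(1, 9, 2), Q3_satisfaction = c(4, 1, 9))

# One lookup table per column, stored in a list named after the raw columns
lookups <- list(
  Q1_tenant = data.frame(code = c(1, 2, 9),
                         answer = c("Yes", "No", "Refusal")),
  Q3_satisfaction = data.frame(code = c(1:4, 9),
                               answer = c("Strongly disagree", "Disagree",
                                          "Agree", "Strongly agree", "Refusal"))
)

result <- raw
for (col in names(lookups)) {
  lk <- lookups[[col]]
  # match() gives the lookup row for each raw code (NA if the code is absent)
  idx <- match(raw[[col]], lk$code)
  result[[paste0(col, "_answer")]] <- as.character(lk$answer[idx])
}
result
```

With 90 columns, the list of lookup tables is the only thing that grows; the loop stays the same.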

Answer 3

You can use factors:

out <- rawsurveydf
out[] <- Map(function(x,y) factor(x,y$code,y[[2]]),
             rawsurveydf,
             list(Tenantcodes,incomecodes,houssatiscodes))

# out
#    Q1_tenant Q2_income   Q3_satisfaction
# 1        Yes     60000 Strongly disagree
# 2         No     90000    Strongly agree
# 3         No     50000             Agree
# 4        Yes     80000           Refusal
# 5    Refusal     70000           Refusal
# 6         No    110000             Agree
# 7        Yes     60000    Strongly agree
# 8         No     40000          Disagree
# 9        Yes    110000 Strongly disagree
# 10       Yes    110000 Strongly disagree
# 11   Refusal     1e+05          Disagree
# 12       Yes     70000    Strongly agree
# 13   Refusal     60000 Strongly disagree
# 14       Yes     40000             Agree
# 15        No     1e+05           Refusal
# 16       Yes     90000           Refusal
# 17        No    110000    Strongly agree
# 18       Yes    110000 Strongly disagree
# 19        No     1e+05           Refusal
# 20        No     90000           Refusal

If you need character columns rather than factor columns, use as.character(factor(x,y$code,y[[2]])) instead of factor(x,y$code,y[[2]]).
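One thing to watch with the factor approach: a numeric answer such as income ends up as factor labels, which is why 100000 prints as 1e+05 in the output above. A small sketch of getting numbers back, assuming you want that column to be numeric; the variable names are illustrative.

```r
# Income codes and their values, as in the question's incomecodes table
codes  <- 1:9
values <- seq(30000, 110000, by = 10^4)

# Hypothetical raw responses
x <- c(3, 8, 1)

f <- factor(x, levels = codes, labels = values)

# as.numeric(f) alone would return the internal level positions (3, 8, 1),
# so convert to character first to recover the labelled values
income <- as.numeric(as.character(f))
income
# [1] 5e+04 1e+05 3e+04
```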

r

survey

categorical-data