Joining each data frame column to a separate table in R
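My survey results span about 90 columns and more than 5,000 rows. The raw data is keyed in using codes (for example, 1 for "yes" and 2 for "no"). Each column has a different number of factor levels: for example, language spoken at home, income level, etc. How can I replace the raw codes with the actual answers across the whole table?
The raw data is structured as follows: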
rawsurveydf <- data.frame(Q1_tenant = sample(c(1,2,9), 20, replace=TRUE),
Q2_income = sample(c(1:9), 20, replace=TRUE),
Q3_satisfaction = sample(c(1:4,9), 20, replace=TRUE) )
And the translations of the individual codes:
Tenantcodes <- data.frame(code=c(1,2,9), Q1_tenant=c("Yes", "No", "Refusal"))
incomecodes <- data.frame(code=c(1:9), Q2_income=as.numeric(c(seq(30000, 110000, by=10^4))))
houssatiscodes <- data.frame(code=c(1:4,9), Q3_satisfaction=c("Strongly disagree", "Disagree", "Agree", "Strongly agree", "Refusal"))
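Using the data from your example, you can merge the raw data with the lookup table:
# merge() needs the whole data frame; the key is Q1_tenant on the left and code on the right.
# The label column is renamed so it does not clash with the raw Q1_tenant column.
# Note that merge() sorts the result by the key column by default.
rawsurveydf <- merge(rawsurveydf,
                     setNames(Tenantcodes, c("code", "Q1_tenant_answer")),
                     by.x = "Q1_tenant", by.y = "code",
                     all.x = TRUE)
For a plain two-level mapping, ifelse is enough. When you need to map values to more than 2 levels, you can use other methods such as indexed assignment, gsub, etc.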
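As a minimal sketch of the indexed-assignment idea, a named vector can act as the lookup: the names are the codes and the values are the labels. The names `tenant_labels`, `codes` and `answers` are made up for this illustration and do not come from the question.

```r
# Hypothetical lookup: names are the raw codes, values are the labels
tenant_labels <- c("1" = "Yes", "2" = "No", "9" = "Refusal")

# A few raw codes, as they might appear in Q1_tenant
codes <- c(1, 9, 2, 2, 1)

# Index the lookup by the codes converted to character;
# unname() drops the code names from the result
answers <- unname(tenant_labels[as.character(codes)])
answers
# [1] "Yes"     "Refusal" "No"      "No"      "Yes"
```

A code that is missing from the lookup comes back as NA, which makes unexpected codes easy to spot.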
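If all of your columns share the same transformation, you should turn it into a function and then apply or sapply that function. If every column has a different custom mapping, then obviously you need to supply that logic for each column.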
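For the case where every column shares one coding, the idea could look roughly like this. The data frame `yesno_df` and the function `recode_yesno` are hypothetical, and lapply stands in for sapply here because it returns a list of columns that can be assigned straight back into the data frame.

```r
# Hypothetical data where every column uses the same coding
# (1 = Yes, 2 = No, 9 = Refusal)
yesno_df <- data.frame(Q10 = c(1, 2, 9), Q11 = c(2, 2, 1))

# One recoding function shared by all columns
recode_yesno <- function(x) {
  labels <- c("1" = "Yes", "2" = "No", "9" = "Refusal")
  unname(labels[as.character(x)])
}

# Assigning into yesno_df[] replaces the columns but keeps the data frame
yesno_df[] <- lapply(yesno_df, recode_yesno)
yesno_df
```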
Another option you could try:
library(tidyverse)
rawsurveydf <- rawsurveydf %>%
left_join(y = Tenantcodes, by = c("Q1_tenant" = "code"), suffix = c("", ".answer")) %>%
left_join(y = incomecodes, by = c("Q2_income" = "code"), suffix = c("", ".answer")) %>%
left_join(y = houssatiscodes, by = c("Q3_satisfaction" = "code"), suffix = c("", ".answer"))
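This gives you the desired result while still keeping the original data.
If you want to run this for all 90 columns, I suggest setting up a "key" so that all of the joins can be done at once. You can do this by giving the column you called code the same name as the column it will be merged on, and giving the answer column a different name. Perhaps like this: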
Tenantcodes<-data.frame(Q1_tenant=c(1,2,9), Q1_tenant_answer=c("Yes", "No", "Refusal"))
incomecodes<-data.frame(Q2_income=c(1:9), Q2_income_answer=as.numeric(c(seq(30000, 110000, by=10^4))))
houssatiscodes<-data.frame(Q3_satisfaction=c(1:4,9), Q3_satisfaction_answer=c("Strongly disagree", "Disagree", "Agree", "Strongly agree", "Refusal"))
Like this, we can do all of the joins in one go:
result <- list(rawsurveydf, Tenantcodes, incomecodes, houssatiscodes) %>% reduce(left_join)
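With 90 lookup tables, renaming each one by hand gets tedious. The sketch below assumes the lookup tables are kept in a list named after the question columns (the names `raw`, `lookups` and `keyed` are made up here) and that each table has the code in its first column and the answer in its second. It uses base Reduce with dplyr::left_join and an explicit `by`, which is a variation on the reduce(left_join) call above rather than the same code.

```r
# Small stand-ins for the raw data and two lookup tables from the question
raw <- data.frame(Q1_tenant = c(1, 9, 2), Q2_income = c(3L, 1L, 9L))
lookups <- list(
  Q1_tenant = data.frame(code = c(1, 2, 9),
                         Q1_tenant = c("Yes", "No", "Refusal")),
  Q2_income = data.frame(code = 1:9,
                         Q2_income = seq(30000, 110000, by = 10^4))
)

# Rename each lookup so its first column is the question name (the join key)
# and its second column is "<question>_answer"
keyed <- Map(function(tbl, q) setNames(tbl, c(q, paste0(q, "_answer"))),
             lookups, names(lookups))

# Join everything in one pass, starting from the raw data;
# each step joins on the columns the two tables share
result <- Reduce(
  function(x, y) dplyr::left_join(x, y, by = intersect(names(x), names(y))),
  keyed, raw
)
result
```

Because left_join keeps the rows of the left table in their original order, the answers line up with the raw survey rows.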
You can use factors:
out <- rawsurveydf
out[] <- Map(function(x,y) factor(x,y$code,y[[2]]),
rawsurveydf,
list(Tenantcodes,incomecodes,houssatiscodes))
# out
# Q1_tenant Q2_income Q3_satisfaction
# 1 Yes 60000 Strongly disagree
# 2 No 90000 Strongly agree
# 3 No 50000 Agree
# 4 Yes 80000 Refusal
# 5 Refusal 70000 Refusal
# 6 No 110000 Agree
# 7 Yes 60000 Strongly agree
# 8 No 40000 Disagree
# 9 Yes 110000 Strongly disagree
# 10 Yes 110000 Strongly disagree
# 11 Refusal 1e+05 Disagree
# 12 Yes 70000 Strongly agree
# 13 Refusal 60000 Strongly disagree
# 14 Yes 40000 Agree
# 15 No 1e+05 Refusal
# 16 Yes 90000 Refusal
# 17 No 110000 Strongly agree
# 18 Yes 110000 Strongly disagree
# 19 No 1e+05 Refusal
# 20 No 90000 Refusal
If you need character columns rather than factor columns, use as.character(factor(x,y$code,y[[2]])) instead of factor(x,y$code,y[[2]]).
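The Map call pairs columns and lookup tables by position, so the list has to be in the same order as the columns. A possible variation, assuming the lookup tables are stored in a list named after the columns (the names `raw` and `lookups` are hypothetical), selects each table by column name and returns character columns:

```r
# Small stand-ins for the raw data and lookup tables from the question;
# the lookups are deliberately listed in a different order than the columns
raw <- data.frame(Q1_tenant = c(1, 9, 2), Q3_satisfaction = c(4, 9, 1))
lookups <- list(
  Q3_satisfaction = data.frame(
    code = c(1:4, 9),
    Q3_satisfaction = c("Strongly disagree", "Disagree", "Agree",
                        "Strongly agree", "Refusal")
  ),
  Q1_tenant = data.frame(code = c(1, 2, 9),
                         Q1_tenant = c("Yes", "No", "Refusal"))
)

out <- raw
# lookups[names(raw)] reorders the lookups to match the columns of raw
out[] <- Map(function(x, y) as.character(factor(x, y$code, y[[2]])),
             raw, lookups[names(raw)])
out
```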