How do I compare two datasets element by element in R?
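I need to check the test results of 50 different students on a multiple-choice test with options A, B, C, and D.
I have a one-dimensional dataset holding the answer key, "answers", which I read in as
answers <- read.table("A1_Ans_only.txt", header = FALSE, sep = ",")
View(answers)
My dataset "results" contains all the answers from all 50 students. I read it in as
results <- read.csv("Form A1_only.csv", header = FALSE)
View(results)
So when I try results==answers
or something like evaluate(results, answers), where evaluate is a function I wrote, defined as 'evaluate <- function(x,y){x==y}', I get various errors, such as "not equal-length data frames" or, when I subset each one down to a single dimension, a complaint that the factor level sets are different.
Can anyone help me evaluate each element of the results data frame to determine which questions each student got right?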
This is a small sample of results:
structure(list(V1 = c(1L, 3L, 5L), V2 = c(NA, NA, NA), V3 = structure(c(2L,
1L, 4L), .Label = c("A", "B", "C", "D"), class = "factor"), V4 = structure(c(1L,
1L, 1L), .Label = c("A", "B", "C", "D"), class = "factor"), V5 = structure(c(2L,
2L, 3L), .Label = c("A", "B", "C", "D"), class = "factor"), V6 = structure(c(1L,
1L, 1L), .Label = c("A", "B", "C"), class = "factor"), V7 = structure(c(1L,
1L, 1L), .Label = c("A", "C", "D"), class = "factor"), V8 = structure(c(2L,
1L, 2L), .Label = c("A", "B", "D"), class = "factor"), V9 = structure(c(1L,
1L, 1L), .Label = c("A", "C", "D"), class = "factor"), V10 = structure(c(2L,
2L, 1L), .Label = c("A", "B", "C"), class = "factor")), .Names = c("V1",
"V2", "V3", "V4", "V5", "V6", "V7", "V8", "V9", "V10"), row.names = c(NA,
3L), class = "data.frame")
This is the sample from answers:
structure(list(V1 = structure(1L, .Label = "AAAAKEY", class = "factor"),
V2 = NA, V3 = structure(1L, .Label = "C", class = "factor"),
V4 = structure(1L, .Label = "A", class = "factor"), V5 = structure(1L, .Label = "C", class = "factor"),
V6 = structure(1L, .Label = "A", class = "factor"), V7 = structure(1L, .Label = "A", class = "factor"),
V8 = structure(1L, .Label = "B", class = "factor"), V9 = structure(1L, .Label = "A", class = "factor"),
V10 = structure(1L, .Label = "B", class = "factor")), .Names = c("V1",
"V2", "V3", "V4", "V5", "V6", "V7", "V8", "V9", "V10"), class = "data.frame", row.names = c(NA,
-1L))
We can do the comparison after replicating 'answers' so that the lengths match
results==answers[col(results)]
# V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
#1 FALSE NA FALSE TRUE FALSE TRUE TRUE TRUE TRUE TRUE
#2 FALSE NA FALSE TRUE FALSE TRUE TRUE FALSE TRUE TRUE
#3 FALSE NA FALSE TRUE TRUE TRUE TRUE TRUE TRUE FALSE
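The NA in column V2 of 'answers' produces NA in the output, because any equality comparison with NA yields NA. If we need FALSE there instead, either change the NA values to FALSE afterwards, or combine the comparison with !is.na(answers)[col(results)] using &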
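To make the approach concrete, here is a minimal sketch that rebuilds the two samples from the question as plain character columns (an assumption on my part, to sidestep the mismatched factor levels), applies the same col() trick, turns the NA results into FALSE, and then counts correct answers per student. The names key, correct, question_cols, and score are only illustrative, and treating V3 to V10 as the question columns is a guess based on the sample.

```r
# Miniature copies of the samples from the question, stored as character
# columns rather than factors (assumption: stringsAsFactors = FALSE on read).
results <- data.frame(
  V1 = c(1L, 3L, 5L), V2 = NA,
  V3 = c("B", "A", "D"), V4 = c("A", "A", "A"),
  V5 = c("B", "B", "C"), V6 = c("A", "A", "A"),
  V7 = c("A", "A", "A"), V8 = c("B", "A", "B"),
  V9 = c("A", "A", "A"), V10 = c("B", "B", "A"),
  stringsAsFactors = FALSE
)
answers <- data.frame(
  V1 = "AAAAKEY", V2 = NA, V3 = "C", V4 = "A", V5 = "C", V6 = "A",
  V7 = "A", V8 = "B", V9 = "A", V10 = "B",
  stringsAsFactors = FALSE
)

# Repeat each key entry once per student, in column-major order,
# so the key has one value for every cell of 'results'.
key <- as.matrix(answers)[col(results)]

# Element-wise comparison; the result is a logical matrix.
correct <- results == key

# Comparisons involving NA give NA; count those as not correct.
correct[is.na(correct)] <- FALSE

# Assumed layout: V1 is an ID and V2 is empty, so questions are V3..V10.
question_cols <- 3:10
score <- rowSums(correct[, question_cols])
```

Here score holds one total per student row, and correct[, question_cols] shows which individual questions each student got right.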