How do I randomly assign stimuli into 4 treatment groups, and ensure that each group contains an even number of true/false statements?
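I have a set of stimuli (statements), half of which are true and half of which are false. I want to randomly assign them to 4 sets that each contain the same number of statements, with half true and half false statements in every set.
This is what I have so far, but I still need the randomization into the 4 groups to take into account the contents of a specific binary column (i.e., whether the statement is true or false):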
statements <- data.frame(item_ID = c("1", "3", "4", "5", "6", "7"),
item = c("The first windmills were built in Persia.",
"Blackberries, raspberries, and strawberries belong to the Rose family.",
"The painting “Bal du moulin de la Galette” was created by Renoir.",
"The name of the Russian space platform Mir means ‘peace’.",
"The Congo has the largest water flow rate of any river in Africa.",
"Alberto Fujimori served as president of Peru from 1990 - 2000."
), actual_truth = c("TRUE", "TRUE", "TRUE", "TRUE", "FALSE", "FALSE"
), source = c("DK", "DK", "DK", "DK", "DK", "DK"))
ns <- nrow(statements) * c(0.25, 0.25, 0.25, 0.25)
sum(ns)
rep(1:4, times = ns)
set.seed(4)
head(samp <- sample(rep(1:4, times = ns)))
set1 <- statements[samp == 1,]
set2 <- statements[samp == 2,]
set3 <- statements[samp == 3,]
set4 <- statements[samp == 4,]
Assign even bins
A few options, starting with dplyr:
library(dplyr)
set.seed(42)
statements <- statements %>%
group_by(actual_truth) %>%
# within each truth value, recycle the labels 1:4 to the group size, then shuffle them
mutate(samp = sample(rep(1:4, length.out = n()))) %>%
ungroup()
statements
# # A tibble: 6 x 5
# item_ID item actual_truth source samp
# <fct> <fct> <fct> <fct> <int>
# 1 1 The first windmills were built in Persia. TRUE DK 2
# 2 3 Blackberries, raspberries, and strawberries belong to the Rose family. TRUE DK 3
# 3 4 The painting Bal du moulin de la Galette was created by Renoir. TRUE DK 4
# 4 5 The name of the Russian space platform Mir means peace . TRUE DK 1
# 5 6 The Congo has the largest water flow rate of any river in Africa. FALSE DK 2
# 6 7 Alberto Fujimori served as president of Peru from 1990 - 2000. FALSE DK 1
Verify how many statements of each kind ended up in each group:
xtabs(~ actual_truth + samp, data = statements)
# samp
# actual_truth 1 2 3 4
# FALSE 1 1 0 0
# TRUE 1 1 1 1
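The toy data has only two FALSE statements, so groups 3 and 4 cannot receive one. As a minimal sketch of what the balanced case looks like, here is the same idea in base R on made-up data with 8 TRUE and 8 FALSE statements. The data frame demo and its contents are assumptions for illustration, not part of the question, and ave() is used here only as one convenient way to apply the shuffle within each truth value.

```r
# Hypothetical data: 8 TRUE and 8 FALSE statements (not from the question).
set.seed(1)
demo <- data.frame(
  item_ID = seq_len(16),
  actual_truth = rep(c("TRUE", "FALSE"), each = 8),
  stringsAsFactors = FALSE
)

# Within each truth value, shuffle the labels 1,2,3,4,1,2,3,4.
demo$samp <- ave(
  seq_len(nrow(demo)), demo$actual_truth,
  FUN = function(i) sample(rep(1:4, length.out = length(i)))
)

# Every cell of this table should be 2: two TRUE and two FALSE per group.
counts <- xtabs(~ actual_truth + samp, data = demo)
counts
```

When the number of statements per truth value is not a multiple of 4, rep(..., length.out = ) hands the leftover labels to the lowest-numbered groups, which is why both FALSE statements above land in groups 1 and 2.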
Base R:
# split by truth value, shuffle the recycled labels 1:4 within each piece, then re-combine
statements <- do.call(rbind,
by(statements, statements$actual_truth,
function(x) transform(x, samp = sample(rep(1:4, length.out = nrow(x))))))
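(Note: because by() and dplyr order the groups somewhat differently, their results differ even with the set.seed() call above. This is entirely due to the order of processing, not to the correctness of either implementation.)
data.table: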
library(data.table)
statementsDT <- copy(statements)
setDT(statementsDT)
statementsDT[, samp := sample(rep(1:4, length.out = .N)), by = actual_truth]
(Note: the same caveat applies.)
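Split into different groups
For this step, while you can do what you did in your question (assign to set1 through set4), I suspect that whatever you do to one set you will do to each of them. It is therefore usually better to either (1) keep them in the same frame and process them with natural grouping operations (e.g., dplyr::group_by or data.table's by= argument); or (2) split them into a list and process them with lapply().
For example: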
sets <- split(statements, statements$samp)
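This produces a list, in this case of length 4, whose elements are ordered by the sorted unique values of the key (here, $samp).
Assuming you have written a function myfunc that processes one of your sets, you would then do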
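The ordering matters mostly when the key has more than nine values or is stored as text. A small sketch, using made-up keys, of how split() names and orders its pieces for numeric versus character keys:

```r
# Numeric keys: pieces are ordered by numeric value.
num_names <- names(split(1:3, c(10, 2, 1)))
num_names

# Character keys: pieces are ordered as strings, so "10" sorts before "2".
chr_names <- names(split(1:3, c("10", "2", "1")))
chr_names
```

With samp stored as integers 1 to 4, both orderings coincide, so this only becomes relevant if the group labels change.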
out <- lapply(sets, myfunc)
to process each of your sets with that function. (There is no need to handle each samp == 1 case individually.)
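To make the pattern concrete, here is a minimal sketch with a placeholder myfunc. The data frame statements_demo and the body of myfunc are hypothetical; substitute whatever processing your sets actually need.

```r
# Hypothetical stand-in for the real data, already carrying a samp column.
statements_demo <- data.frame(
  item_ID = 1:8,
  actual_truth = rep(c("TRUE", "FALSE"), times = 4),
  samp = rep(1:4, each = 2),
  stringsAsFactors = FALSE
)

# Hypothetical per-set function: summarise one set as a one-row data frame.
myfunc <- function(set) {
  data.frame(
    n_items = nrow(set),
    n_true = sum(set$actual_truth == "TRUE")
  )
}

sets <- split(statements_demo, statements_demo$samp)
out <- lapply(sets, myfunc)

# Bind the per-set results into one frame, keeping the set label.
summary_df <- cbind(samp = names(out), do.call(rbind, out))
summary_df
```

Returning a one-row data frame from myfunc makes it easy to stack the results with do.call(rbind, ...); if your function returns something else (a model, a plot), keeping out as a list is usually fine.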