Looping a dataframe deleting process in R
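I would like to know how to loop my code below so that it is more functional and generalizes to other data (the current data is just a toy example):

FIRST, I select a study from data using sample() and then filter() out its rows where outcome == outcome_to_remove. This gives the datat output.

SECOND, I select a study from datat using sample() and then filter() out its rows where outcome == outcome_to_remove2. This gives the final output.

Can we loop this process?

EDIT: The only condition I want to add to my code is that length(unique(data$study)) must be the same before and after the loop. That is, a study must not lose its outcome == "A" rows in the FIRST step and then its outcome == "B" rows in the SECOND step, which would delete the whole study.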
library(dplyr)
library(tidyr)

(data <- expand_grid(study = 1:5, group = 1:2, outcome = c("A", "B")))

n = 1

#====-------------------- FIRST:

# Pick n studies at random and drop their rows for outcome "A"
studies_to_remove = sample(unique(data$study), size = n)
outcome_to_remove = c("A")

datat <- data %>%
  filter(
    !( study %in% studies_to_remove &
       outcome %in% outcome_to_remove
    ))

#====------------------- SECOND:

# Pick n studies again (possibly the same ones) and drop their rows for outcome "B"
studies_to_remove2 = sample(unique(datat$study), size = n)
outcome_to_remove2 = c("B")

datat %>%
  filter(
    !( study %in% studies_to_remove2 &
       outcome %in% outcome_to_remove2
    ))
With the help of a for loop:
library(dplyr)

data <- tidyr::expand_grid(study = 1:5, group = 1:2, outcome = c("A", "B"))
n = 1
set.seed(9873)
outcomes_to_remove <- unique(data$outcome)
# Studies that have not yet lost an outcome
unique_study <- unique(data$study)

for(i in outcomes_to_remove) {
  # Pick n studies from those still untouched
  studies_to_remove = sample(unique_study, size = n)
  outcome_to_remove = i
  # Exclude them from later draws so that no study loses every outcome
  unique_study <- setdiff(unique_study, studies_to_remove)
  cat('\nDropping study ', studies_to_remove, 'and outcome ', outcome_to_remove)
  data <- data %>%
    filter(
      !( study %in% studies_to_remove &
         outcome %in% outcome_to_remove
      ))
}
#Dropping study 3 and outcome A
#Dropping study 1 and outcome B
data
#   study group outcome
#   <int> <int> <chr>
# 1     1     1 A
# 2     1     2 A
# 3     2     1 A
# 4     2     1 B
# 5     2     2 A
# 6     2     2 B
# 7     3     1 B
# 8     3     2 B
# 9     4     1 A
#10     4     1 B
#11     4     2 A
#12     4     2 B
#13     5     1 A
#14     5     1 B
#15     5     2 A
#16     5     2 B
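
To reuse this on other data, the loop could be wrapped in a function. The following is only a sketch, not part of the original answer: the name drop_outcomes and its arguments are made up for illustration, and it assumes the columns are called study and outcome and that every study has at least two outcomes. It also checks the condition from the question, namely that the number of unique studies is the same before and after.

```r
library(dplyr)
library(tidyr)

# Hypothetical helper (illustrative name and arguments): drop each outcome
# from `n` randomly chosen studies, never picking the same study twice.
drop_outcomes <- function(data, outcomes = unique(data$outcome), n = 1) {
  available <- unique(data$study)
  n_before <- length(available)
  # Each outcome needs n fresh studies, so there must be enough of them.
  stopifnot(length(outcomes) * n <= n_before)

  for (out in outcomes) {
    # Index into `available` instead of calling sample(available, n):
    # sample(x, n) treats a length-one numeric x as 1:x.
    picked <- available[sample.int(length(available), n)]
    available <- setdiff(available, picked)
    data <- filter(data, !(study %in% picked & outcome %in% out))
  }

  # Condition from the question: no study may disappear entirely.
  stopifnot(length(unique(data$study)) == n_before)
  data
}

set.seed(9873)
toy <- expand_grid(study = 1:5, group = 1:2, outcome = c("A", "B"))
result <- drop_outcomes(toy, n = 1)
nrow(result)
```

On the toy data this removes two rows for each of the two outcomes, so 16 of the 20 rows remain. If a study had only a single outcome and that outcome were drawn, the final stopifnot() would stop with an error rather than silently return a data frame with a missing study.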