How to sample a discrete distribution conditional on a factor within a for/if loop

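I'm trying to generate dummy data by sampling from specific discrete distributions conditional on the level of a factor (so each factor level has a different distribution), and then insert each random outcome into a new column of the data frame, in the row with the corresponding factor level. If you run the code below, you will see that 'data$last' is empty. I'm not sure what I'm doing wrong. I have also tried doing it without a loop, setting the number of replicates to 100 for each level, but then the distributions were not correct.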

#Create data frame with factor 
set.seed(1)
ID<-(1:200)
gender<-sample(x = c("Male","Female"), 200, replace = T, prob = c(0.5, 0.5))
data<-data.frame(ID,gender)

#Generate random response based on discrete distribution conditional on gender
data$last <- for (i in 1:nrow(data)) {if(data$gender=="Male") {
sample(x = c("Today","Yesterday"), 1, replace = T, prob = c(0.8, 0.2))
} else {
sample(x = c("Today","Yesterday"), 1, replace = T, prob = c(0.3, 0.7))
}
}
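A minimal check of why 'data$last' ends up empty, using a hypothetical toy data frame df (not part of the original question): in R a for loop evaluates to NULL whatever its body computes, and assigning NULL to a data frame column with $<- does not create that column.

```r
# Hypothetical toy data frame, only for this illustration
df <- data.frame(ID = 1:3)

# The value of a for loop is NULL (returned invisibly), not the values computed inside it
res <- for (i in 1:3) i * 2
print(is.null(res))

# Assigning NULL with $<- creates no column (and would delete an existing one)
df$last <- res
print("last" %in% names(df))
```

This is why the value of each sample() call has to be stored explicitly, either by indexing inside the loop or by collecting the results with a function such as sapply().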

You should rewrite your for loop so that each data$last value is assigned inside the loop:

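data$last <- NA_character_   # create the column before filling it row by row
for (i in 1:nrow(data)) {
  # data$gender[i] is a single value, so if() gets a length-1 condition
  if (data$gender[i] == "Male") {
    data$last[i] <- sample(x = c("Today","Yesterday"), 1, replace = TRUE, prob = c(0.8, 0.2))
  } else {
    data$last[i] <- sample(x = c("Today","Yesterday"), 1, replace = TRUE, prob = c(0.3, 0.7))
  }
}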
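Since the question mentions that an earlier attempt gave the wrong distributions, it can help to check the result empirically. The sketch below assumes a larger sample of 5000 rows (chosen only so the proportions are stable), reruns the same per-row logic, and tabulates the share of each answer within each gender; the Male row should come out close to 0.8/0.2 and the Female row close to 0.3/0.7.

```r
set.seed(1)
n <- 5000  # larger than the question's 200 so the proportions settle near their targets
data <- data.frame(ID = seq_len(n),
                   gender = sample(c("Male", "Female"), n, replace = TRUE))

# Same per-row logic as the loop above, filling a plain character vector first
last <- character(n)
for (i in seq_len(n)) {
  p <- if (data$gender[i] == "Male") c(0.8, 0.2) else c(0.3, 0.7)
  last[i] <- sample(c("Today", "Yesterday"), 1, prob = p)
}
data$last <- last

# Proportions within each gender: margin = 1 makes every row of tab sum to 1
tab <- prop.table(table(data$gender, data$last), margin = 1)
print(round(tab, 3))
```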

Or without a for loop:

# Draw one value from each distribution for every row; ifelse() keeps the one matching that row's gender
data$last <- ifelse(data$gender == "Male",
                    sample(x = c("Today","Yesterday"), nrow(data), replace = TRUE, prob = c(0.8, 0.2)),
                    sample(x = c("Today","Yesterday"), nrow(data), replace = TRUE, prob = c(0.3, 0.7)))
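With more than two factor levels, an if/else chain or nested ifelse() grows with every level. One alternative, not taken from the answers here, is to keep one probability vector per level in a named list (probs and outcomes below are illustrative names) and draw all rows of a level with a single vectorized sample() call:

```r
set.seed(1)
data <- data.frame(ID = 1:200,
                   gender = sample(c("Male", "Female"), 200, replace = TRUE))

# Hypothetical lookup table: one probability vector per factor level
outcomes <- c("Today", "Yesterday")
probs <- list(Male = c(0.8, 0.2), Female = c(0.3, 0.7))

data$last <- NA_character_
for (lev in names(probs)) {
  idx <- data$gender == lev  # logical index of the rows at this level
  # one independent draw per row at this level
  data$last[idx] <- sample(outcomes, sum(idx), replace = TRUE, prob = probs[[lev]])
}
print(table(data$gender, data$last))
```

The loop runs once per level rather than once per row, and adding a level only means adding an entry to probs.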

#Generate random response based on discrete distribution conditional on gender
data$last <- sapply(1:nrow(data), function(i) {
  if (data$gender[i] == "Male") {
    s <- sample(x = c("Today","Yesterday"), 1, replace = TRUE, prob = c(0.8, 0.2))
  } else {
    s <- sample(x = c("Today","Yesterday"), 1, replace = TRUE, prob = c(0.3, 0.7))
  }
  return(s)
})

Notice that you were not testing a specific element, data$gender[i], but the whole vector. Also, return the result with return(s).
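The difference between data$gender[i] and data$gender can be seen on a small vector (g below is a hypothetical stand-in for the gender column). if() needs a single TRUE/FALSE. R 4.2.0 and later stop with an error when the condition has length greater than 1, while older versions only warn and use the first element. The tryCatch() call below treats either outcome as a signalled problem.

```r
g <- c("Male", "Female", "Male")

# Indexing with [i] gives a single TRUE/FALSE, which is what if() expects
print(length(g[1] == "Male"))  # 1
print(length(g == "Male"))     # 3

# Using the whole vector as the condition: error in R >= 4.2.0, warning in older versions
signalled <- tryCatch(
  {
    if (g == "Male") "Male branch" else "Female branch"
    FALSE  # reached only if no error or warning was raised
  },
  error = function(e) TRUE,
  warning = function(w) TRUE
)
print(signalled)  # TRUE
```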