Randomly pick a value from a set of rows and add value to new row below

My R skills aren't good enough to solve this, so I'm hoping someone can help.

My data looks like this:

head(human.players,25)
Season Episode Round Player Player_type Crowd_size q1_a q2_a q3_a q4_a q5_a
2020 1 1 1 1 3 0 1 0 0 NA
2020 1 1 2 1 3 0 1 1 1 NA
2020 1 1 3 1 3 0 0 0 1 NA
2020 1 2 1 1 3 1 1 0 1 NA
2020 1 2 2 1 3 1 0 1 0 NA
2020 1 2 3 1 3 1 1 1 0 NA
2020 1 3 1 1 3 0 1 0 0 NA
2020 1 3 2 1 3 0 1 1 1 NA
2020 1 3 3 1 3 0 0 1 1 NA
2020 1 4 1 1 3 0 0 1 1 NA
2020 1 4 2 1 3 0 0 1 1 NA
2020 1 4 3 1 3 0 0 1 1 NA
2020 1 5 1 1 2 1 1 0 0 NA
2020 1 5 2 1 2 1 1 1 0 NA
2020 1 5 3 1 2 NA NA NA NA NA
2020 1 6 1 1 2 0 0 0 0 NA
2020 1 6 2 1 2 0 0 0 0 NA
2020 1 6 3 1 2 NA NA NA NA NA
2020 1 7 1 1 2 0 1 1 1 NA
2020 1 7 2 1 2 1 0 0 1 NA
2020 1 7 3 1 2 NA NA NA NA NA
2020 2 1 1 1 3 1 1 0 0 NA
2020 2 1 2 1 3 0 0 0 1 NA
2020 2 1 3 1 3 0 1 1 0 NA

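The variables q1_a:q5_a record whether the player answered incorrectly (0) or correctly (1). Each player plays in a specific round (each episode has 7 rounds). In the first 4 rounds there are 3 players. In rounds 5 to 7, however, there are only 2 players (the eliminated player has NA; for example, in episode 1 this is player 3, see the table above).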
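One way to check this structure is to count, per round, how many players have a non-missing answer. The sketch below uses a hypothetical toy subset of the question's data (episode 1, rounds 1 and 5) so that it runs on its own; in the sample data this count happens to match `Crowd_size`.

```r
# Hypothetical toy subset of the question's data: episode 1, rounds 1 and 5
dat <- data.frame(
  Season = 2020L, Episode = 1L,
  Round = rep(c(1L, 5L), each = 3),
  Player = rep(1:3, times = 2),
  Player_type = 1L,
  Crowd_size = rep(c(3L, 2L), each = 3),
  q1_a = c(0L, 0L, 0L, 1L, 1L, NA),
  q2_a = c(1L, 1L, 0L, 1L, 1L, NA),
  q3_a = c(0L, 1L, 0L, 0L, 1L, NA),
  q4_a = c(0L, 1L, 1L, 0L, 0L, NA),
  q5_a = NA
)

# Active players per round = rows whose answers are not NA
# (player 3 is eliminated in round 5)
active <- tapply(!is.na(dat$q1_a), dat$Round, sum)
active

# Crowd size per round, for comparison
crowd <- tapply(dat$Crowd_size, dat$Round, unique)
crowd
```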

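I need to create a random player. In the first 4 rounds, that means randomly selecting one answer (for each of the 5 questions) from the three players in that round and adding those values as a "random player" row. For rounds 5 to 7, I need to select from the two remaining players' answers (ignoring the NAs) and add the "random player" row in the same way.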

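The algorithm has to look at round 1 (only those rows), draw one value from the three rows, append it to round 1 (i.e., in this example, create a 4th row), and do this for each of the 5 questions. Then move on to round 2, and so on.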
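For a single round, the step described above amounts to drawing one value per question column from that round's rows. A minimal base R sketch, assuming a hypothetical `round1` data frame that holds round 1 of episode 1 copied from the question:

```r
# Hypothetical: round 1 of episode 1, copied from the question's data
round1 <- data.frame(
  Season = 2020L, Episode = 1L, Round = 1L,
  Player = 1:3, Player_type = 1L, Crowd_size = 3L,
  q1_a = c(0L, 0L, 0L),
  q2_a = c(1L, 1L, 0L),
  q3_a = c(0L, 1L, 0L),
  q4_a = c(0L, 1L, 1L),
  q5_a = NA
)
qcols <- paste0("q", 1:5, "_a")

set.seed(2021)
# Start from an existing row so Season/Episode/Round/Crowd_size are copied over
random_player <- round1[1, ]
random_player$Player <- 4L

# Each question is sampled independently, so the random player's answers
# can come from different real players
for (q in qcols) {
  x <- round1[[q]]
  random_player[[q]] <- x[sample.int(length(x), 1)]
}

out <- rbind(round1, random_player)
out
```

Sampling each column separately is what makes this a "random player" rather than a copy of one of the real players.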

This is what it would look like with player 4 (the random player) added:

Season Episode Round Player Player_type Crowd_size q1_a q2_a q3_a q4_a q5_a
2020 1 1 1 1 3 0 1 0 0 NA
2020 1 1 2 1 3 0 1 1 1 NA
2020 1 1 3 1 3 0 0 0 1 NA
2020 1 1 4 1 3 0 0 1 1 NA
2020 1 2 1 1 3 1 1 0 1 NA
2020 1 2 2 1 3 1 0 1 0 NA
2020 1 2 3 1 3 1 1 1 0 NA
2020 1 2 4 1 3 1 1 1 0 NA
2020 1 3 1 1 3 0 1 0 0 NA
2020 1 3 2 1 3 0 1 1 1 NA
2020 1 3 3 1 3 0 0 1 1 NA
2020 1 3 4 1 3 0 0 0 1 NA
2020 1 4 1 1 3 0 0 1 1 NA
2020 1 4 2 1 3 0 0 1 1 NA
2020 1 4 3 1 3 0 0 1 1 NA
2020 1 4 4 1 3 0 0 1 1 NA

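While writing this, I started to think it might be impossible, or at least very hard, so this question is something of a "Hail Mary". I assume some combination of sample(), apply(), and a custom function is needed, but I'm stuck.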
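The combination guessed at here (sample(), an apply-style loop and a custom function) does work in base R without any packages. The sketch below assumes the `dat` layout from the question, assumes an eliminated player can be recognized by NA in `q1_a`, and numbers the random player 4; the helper name `add_random_player` and the toy data are hypothetical.

```r
# Hypothetical toy subset of the question's data: episode 1, rounds 1 and 5
dat <- data.frame(
  Season = 2020L, Episode = 1L,
  Round = rep(c(1L, 5L), each = 3),
  Player = rep(1:3, times = 2),
  Player_type = 1L,
  Crowd_size = rep(c(3L, 2L), each = 3),
  q1_a = c(0L, 0L, 0L, 1L, 1L, NA),
  q2_a = c(1L, 1L, 0L, 1L, 1L, NA),
  q3_a = c(0L, 1L, 0L, 0L, 1L, NA),
  q4_a = c(0L, 1L, 1L, 0L, 0L, NA),
  q5_a = NA
)
qcols <- paste0("q", 1:5, "_a")

# Hypothetical helper: takes all rows of one round and appends one extra row
# whose answers are each drawn from the players still active in that round
add_random_player <- function(rows) {
  active <- rows[!is.na(rows$q1_a), ]   # drop the eliminated player
  new_row <- active[1, ]                # copy Season/Episode/Round/Crowd_size
  new_row$Player <- 4L
  new_row[qcols] <- lapply(active[qcols], function(x) x[sample.int(length(x), 1)])
  rbind(rows, new_row)
}

set.seed(2021)
# One data frame per Season/Episode/Round combination
rounds <- split(dat, dat[c("Season", "Episode", "Round")], drop = TRUE)
result <- do.call(rbind, lapply(rounds, add_random_player))
result <- result[order(result$Season, result$Episode, result$Round, result$Player), ]
rownames(result) <- NULL
result
```

`x[sample.int(length(x), 1)]` is used instead of `sample(x, 1)` because `sample()` treats a single number `n` as `1:n`, which would give wrong answers if a round ever had only one active player.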

Here is a pipeline that samples the new players and their scores into a separate frame, which you can then bind_rows back onto the original data.

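library(dplyr)

set.seed(2021)
newplayers <- dat %>%
  # drop eliminated players (all of their answers are NA)
  filter(!is.na(q1_a)) %>%
  group_by(Season, Episode, Round) %>%
  # draw one value per column from the remaining players in each round;
  # indexing with sample.int() avoids sample()'s special case (sample(3, 1) draws from 1:3)
  summarize(across(everything(), ~ .x[sample.int(length(.x), size = 1)]), .groups = "drop") %>%
  # the sampled Player/Player_type values are meaningless, so blank them out
  mutate(Player = NA_integer_, Player_type = NA_integer_)
newplayers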
# # A tibble: 8 x 11
#   Season Episode Round Player Player_type Crowd_size  q1_a  q2_a  q3_a  q4_a q5_a 
#    <int>   <int> <int>  <int>       <int>      <int> <int> <int> <int> <int> <lgl>
# 1   2020       1     1     NA          NA          3     0     0     1     1 NA   
# 2   2020       1     2     NA          NA          3     1     1     0     0 NA   
# 3   2020       1     3     NA          NA          3     0     1     1     0 NA   
# 4   2020       1     4     NA          NA          3     0     0     1     1 NA   
# 5   2020       1     5     NA          NA          2     1     1     1     0 NA   
# 6   2020       1     6     NA          NA          2     0     0     0     0 NA   
# 7   2020       1     7     NA          NA          2     0     0     1     1 NA   
# 8   2020       2     1     NA          NA          3     0     1     0     0 NA   

bind_rows(dat, newplayers) %>%
  # is.na(Player) sorts the new (NA) player after the real players in each round
  arrange(Season, Episode, Round, is.na(Player), Player) %>%
  head(.)
#   Season Episode Round Player Player_type Crowd_size q1_a q2_a q3_a q4_a q5_a
# 1   2020       1     1      1           1          3    0    1    0    0   NA
# 2   2020       1     1      2           1          3    0    1    1    1   NA
# 3   2020       1     1      3           1          3    0    0    0    1   NA
# 4   2020       1     1     NA          NA          3    0    0    1    1   NA
# 5   2020       1     2      1           1          3    1    1    0    1   NA
# 6   2020       1     2      2           1          3    1    0    1    0   NA

I didn't know what to assign to Player and Player_type, so I used NA.
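If you would rather number the random player as in the expected output in the question (player 4), the same pipeline can set that directly. A sketch, assuming dplyr is installed; the `Player_type` code `2L` is a hypothetical marker for "random player" (the question's expected output keeps `1`), and the toy data is a hypothetical subset of the question's data.

```r
library(dplyr)

# Hypothetical toy subset of the question's data: episode 1, rounds 1 and 5
dat <- data.frame(
  Season = 2020L, Episode = 1L,
  Round = rep(c(1L, 5L), each = 3),
  Player = rep(1:3, times = 2),
  Player_type = 1L,
  Crowd_size = rep(c(3L, 2L), each = 3),
  q1_a = c(0L, 0L, 0L, 1L, 1L, NA),
  q2_a = c(1L, 1L, 0L, 1L, 1L, NA),
  q3_a = c(0L, 1L, 0L, 0L, 1L, NA),
  q4_a = c(0L, 1L, 1L, 0L, 0L, NA),
  q5_a = NA
)

set.seed(2021)
newplayers <- dat %>%
  filter(!is.na(q1_a)) %>%                  # drop eliminated players
  group_by(Season, Episode, Round) %>%
  summarize(across(everything(), ~ .x[sample.int(length(.x), 1)]), .groups = "drop") %>%
  mutate(Player = 4L, Player_type = 2L)     # hypothetical random-player codes

# With a real Player number, a plain sort puts player 4 last in each round
combined <- bind_rows(dat, newplayers) %>%
  arrange(Season, Episode, Round, Player)
combined
```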


Data

# dput(dat)
dat <- structure(list(Season = c(2020L, 2020L, 2020L, 2020L, 2020L, 2020L, 2020L, 2020L, 2020L, 2020L, 2020L, 2020L, 2020L, 2020L, 2020L, 2020L, 2020L, 2020L, 2020L, 2020L, 2020L, 2020L, 2020L, 2020L), Episode = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L), Round = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L, 5L, 5L, 5L, 6L, 6L, 6L, 7L, 7L, 7L, 1L, 1L, 1L), Player = c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L), Player_type = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), Crowd_size = c(3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L), q1_a = c(0L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, NA, 0L, 0L, NA, 0L, 1L, NA, 1L, 0L, 0L), q2_a = c(1L, 1L, 0L, 1L, 0L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 1L, 1L, NA, 0L, 0L, NA, 1L, 0L, NA, 1L, 0L, 1L), q3_a = c(0L, 1L, 0L, 0L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 0L, 1L, NA, 0L, 0L, NA, 1L, 0L, NA, 0L, 0L, 1L), q4_a = c(0L, 1L, 1L, 1L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 0L, 0L, NA, 0L, 0L, NA, 1L, 1L, NA, 0L, 1L, 0L), q5_a = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA)), class = "data.frame", row.names = c(NA, -24L))