Is there a way to show the "zero-counts" by using dplyr on sample data?

Question

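Hello everyone, I have run into a problem while summarizing my sample data: I also want to see the "zero-counts" in the result of the approach I tried. My data looks like this: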

library(dplyr)
set.seed(529)
sampledata <- data.frame(StartPos = rep(1:10, times = 10),
              Velocity = sample(c(-36, 36), 100, replace = TRUE),
              Response = c(sample(c("H", "M", "W"), 50, replace = TRUE),
                           sample(c("M", "W"), 50, replace = TRUE)))

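The data consists of 100 rows with start positions ranging from 1 to 10 (in this sample each one occurs 10 times; in general a start position such as 3 could occur about 20 times). Each row also has a Response, which can be H for hit, M for miss, or W for wrong. Some start positions may have no H at all. There is also a column named Velocity with the values -36 and 36, which describes the direction of the stimulus starting from a given StartPos (-36 is to the right, 36 is to the left).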

The only things I really care about here are the StartPos and Velocity values with hits, which I need for a subsequent percentage calculation.

To count the number of runs per side, I created the following filter/counter:

numbofrunsperside <- sampledata %>%
  mutate(Direction = case_when( # add direction
    Velocity < 0 ~ "Right",
    Velocity > 0 ~ "Left",
    TRUE ~ "None")) %>%
  group_by(StartPos, Direction) %>% # for each combination
  count(Velocity, .drop=FALSE) # count
numbofrunsperside

And for the hit counts (Left/Right) with their respective StartPos and Direction:

sampledata_hit_counts <- sampledata %>%
  mutate(Direction = case_when( # add direction 
    Velocity < 0 ~ "Right",
    Velocity > 0 ~ "Left",
    TRUE ~ "None")) %>% 
  filter(Response == "H") %>% 
  group_by(StartPos, Direction, .drop=FALSE) %>% # for each combination 
  count(StartPos, .drop=FALSE) # count
sampledata_hit_counts

This is where the problem arises: the runs-per-side data frame (numbofrunsperside) has 20 rows, whereas sampledata_hit_counts has only 12.

When I try to calculate the hit percentage with the following, I get this error message:

sampledata_hit_counts$PTest = sampledata_hit_counts$n / 
numbofrunsperside$n

$<-.data.frame(*tmp*, PTest, value = c(0.2, 0.2, 0.25, 0.166666666666667, : replacement has 20 rows, data has 12
In addition: Warning message:
In sampledata_hit_counts$n/numbofrunsperside$n :
  longer object length is not a multiple of shorter object length
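
The mismatch behind this error can be made visible with an anti join, which lists the StartPos/Direction combinations that have runs but no hits. This is only a minimal sketch on hypothetical toy data frames (`runs` and `hits` are illustrative stand-ins, not the question's objects):

```r
library(dplyr)

# Hypothetical stand-in for numbofrunsperside: every combination is present
runs <- data.frame(
  StartPos  = c(1, 1, 2, 2),
  Direction = c("Left", "Right", "Left", "Right"),
  n         = c(2, 2, 2, 2)
)

# Hypothetical stand-in for sampledata_hit_counts: combinations without hits are absent
hits <- data.frame(
  StartPos  = c(1, 2),
  Direction = c("Right", "Left"),
  n         = c(1, 2)
)

# Rows of `runs` that have no matching row in `hits`
missing_combos <- anti_join(runs, hits, by = c("StartPos", "Direction"))
missing_combos
```

Because the two data frames differ in length, dividing their `n` columns element by element cannot line up the combinations, which is why the assignment fails.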

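One way to solve this would be to include the "zero-counts" for the different directions and start positions in sampledata_hit_counts, so that both data frames have the same number of rows. Unfortunately, I don't know how to do that... Any help would be appreciated!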

Answer 1

You can do a left join:

library(dplyr)

numbofrunsperside %>%
    left_join(
        sampledata_hit_counts, 
        by = c("StartPos", "Direction"), 
        suffix = c("_runs", "_hits")
    ) %>% 
    mutate(
        p_test = ifelse(is.na(n_hits), 0, n_hits) / n_runs
    ) %>% 
    pull(p_test)
#[1] 0.2000000 0.0000000 0.0000000 0.1666667 0.0000000 0.0000000 0.3333333 0.1428571 0.0000000 0.1250000 0.1666667 0.5000000 0.2000000
#[14] 0.4000000 0.1666667 0.0000000 0.0000000 0.3333333 0.5000000 0.0000000
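
As a possible alternative to joining two separate count tables, runs and hits can be counted in a single pass, so combinations without hits get a 0 instead of disappearing. The following is a minimal sketch on hypothetical toy data (`toy` and `hit_rates` are illustrative names, not from the question), assuming dplyr 1.0.0 or later for the `.groups` argument:

```r
library(dplyr)

# Hypothetical toy data: two start positions, both directions,
# and some combinations without any hit ("H")
toy <- data.frame(
  StartPos = c(1, 1, 1, 1, 2, 2, 2, 2),
  Velocity = c(-36, -36, 36, 36, -36, -36, 36, 36),
  Response = c("H", "M", "W", "M", "M", "W", "H", "H")
)

hit_rates <- toy %>%
  mutate(Direction = if_else(Velocity < 0, "Right", "Left")) %>%
  group_by(StartPos, Direction) %>%
  summarise(
    n_runs = n(),                   # all runs in this combination
    n_hits = sum(Response == "H"),  # 0 when the combination has no hit
    p_test = n_hits / n_runs,
    .groups = "drop"
  )
hit_rates
```

Since nothing is filtered out before grouping, every combination that has at least one run keeps its row.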

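If the zero-counts are wanted in the hit table itself, one option is to turn the grouping columns into factors before filtering. In dplyr, `.drop = FALSE` keeps empty groups only for factor levels, which is likely why it has no visible effect on the integer and character columns in the question. A minimal sketch on hypothetical toy data (`toy` and `hit_counts` are illustrative names):

```r
library(dplyr)

# Hypothetical toy data: StartPos 1 / Left and StartPos 2 / Right have no hits
toy <- data.frame(
  StartPos = c(1, 1, 1, 1, 2, 2, 2, 2),
  Velocity = c(-36, -36, 36, 36, -36, -36, 36, 36),
  Response = c("H", "M", "W", "M", "M", "W", "H", "H")
)

hit_counts <- toy %>%
  mutate(
    # Factors are created BEFORE filtering so that all levels survive the filter
    Direction = factor(if_else(Velocity < 0, "Right", "Left"),
                       levels = c("Left", "Right")),
    StartPos  = factor(StartPos)
  ) %>%
  filter(Response == "H") %>%
  count(StartPos, Direction, .drop = FALSE)  # empty factor combinations get n = 0
hit_counts
```

The result has one row per StartPos/Direction combination, so it should line up row for row with a run count built over the same factor columns.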