Counting frequencies with dplyr

Question

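I have a data frame with two columns, id_1 and id_2. For each id_1, I want to count how many times it is matched with each element of id_2.

I imagine the result as a data frame with the columns id_1, id_2 and number_of_id_2_found_for_id_1.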

This is what I am trying:

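library(dplyr)

set.seed(1)
df <- data.frame(
  id_1 = sample(1:10, size = 30, replace = TRUE),
  id_2 = sample(1:10, size = 30, replace = TRUE)
)

df %>%
  group_by(id_1, id_2) %>%
  # n() takes no arguments: it returns the number of rows in the current group.
  # This only counts (id_1, id_2) pairs that actually occur, so pairs with
  # zero matches are missing from the result.
  summarise(number_of_id_2_found_for_id_1 = n(), .groups = "drop")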

My expected output is:

1 1 0
1 2 0
1 3 0
1 4 1
1 5 0
....
1 9 0
2 1 0
...
2 7 1
2 8 0
2 9 1

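And so on. Basically, for each id_1, the number of times it is found with each id_2. In the example above the counts are mostly 0 or 1, but in a larger data frame they would grow. It is like a bipartite graph where each edge carries the number of left-to-right matches between the first part, id_1, and id_2.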
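The bipartite-graph view can be illustrated with a contingency table: rows are id_1 values, columns are id_2 values, and each cell is the number of matches (the edge weight). This is a minimal base R sketch, not taken from the question or answer, assuming both ids range over 1:10 as in the sample data. Fixing the factor levels explicitly makes ids that never occur still show up with zero counts.

```r
set.seed(1)
df <- data.frame(
  id_1 = sample(1:10, size = 30, replace = TRUE),
  id_2 = sample(1:10, size = 30, replace = TRUE)
)

# Assumption: both ids take values in 1:10. Setting the factor levels keeps
# ids that never appear, so their rows/columns are filled with zeros.
ids <- 1:10
tab <- table(
  id_1 = factor(df$id_1, levels = ids),
  id_2 = factor(df$id_2, levels = ids)
)

# tab is a 10 x 10 weighted biadjacency matrix of the bipartite graph:
# tab[i, j] is the number of rows with id_1 == i and id_2 == j
print(tab)

# Long format: one row per (id_1, id_2) pair, count in the named column
long <- as.data.frame(tab, responseName = "number_of_id_2_found_for_id_1")
# as.data.frame() varies id_1 fastest; reorder so id_1 is the outer key
long <- long[order(long$id_1, long$id_2), ]
head(long, 12)
```

Note that id_1 and id_2 come back as factors in the long format; convert them with as.integer(as.character(...)) if numeric ids are needed.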

Thanks in advance!

Answer 1

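Based on the updated post, we may need crossing() to return all combinations, then count() the two columns in the original dataset and join those counts onto the full set of combinations. Combinations that never occur get NA after the join, which is replaced with 0.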

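library(dplyr)
library(tidyr)

# All 10 x 10 (id_1, id_2) combinations, joined to the observed counts;
# combinations that never occur get NA, which is replaced with 0
crossing(id_1 = 1:10, id_2 = 1:10) %>%
  left_join(count(df, id_1, id_2), by = c("id_1", "id_2")) %>%
  mutate(n = replace_na(n, 0))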

-output

# A tibble: 100 x 3
#    id_1  id_2     n
#   <int> <int> <dbl>
# 1     1     1     0
# 2     1     2     0
# 3     1     3     1
# 4     1     4     1
# 5     1     5     0
# 6     1     6     0
# 7     1     7     0
# 8     1     8     0
# 9     1     9     1
#10     1    10     0
# … with 90 more rows
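A possible alternative, not part of the original answer, is tidyr::complete(): it adds the missing (id_1, id_2) combinations directly to the counted data and fills their count with 0, so no separate join is needed. Like the answer, this sketch assumes both ids range over 1:10.

```r
library(dplyr)
library(tidyr)

set.seed(1)
df <- data.frame(
  id_1 = sample(1:10, size = 30, replace = TRUE),
  id_2 = sample(1:10, size = 30, replace = TRUE)
)

res <- df %>%
  count(id_1, id_2) %>%              # counts for observed pairs only
  complete(id_1 = 1:10, id_2 = 1:10, # assumption: ids range over 1:10
           fill = list(n = 0L)) %>%  # missing pairs get a count of 0
  arrange(id_1, id_2)

head(res, 12)
```

Both approaches give one row per combination; complete() keeps the expansion and the zero-filling in a single step.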
