R: nested groupings and total counts via dplyr?

Question

I'm trying to use the dplyr package in R on a data frame named fruit_eaten that looks like this:

person,fruit
Alice,apple
Alice,apple
Alice,apple
Alice,orange
Bob,apple
Bob,banana
Bob,grape
Bob,grape
Bob,grape
Cheryl,orange
Cheryl,orange
Cheryl,kiwi
Donald,apple
Donald,apple
Donald,grape
Donald,grape

I'd like to do the following with dplyr:

For each fruit, find who ate the most of it (so this is not a simple count; I want the maximum count) and how many they ate, producing this table:

| fruit  | who_ate_most | how_many |
|--------|--------------|----------|
| apple  | Alice        | 3        |
| orange | Cheryl       | 2        |
| banana | Bob          | 1        |
| grape  | Bob          | 3        |
| kiwi   | Cheryl       | 1        |

I'm also not sure how to handle the case where two or more people ate the same maximum number of a fruit.

Similarly, I'm trying to produce a table listing each person, the fruit they ate the most of, and how many:

| person | ate_most_of | how_many |
|--------|-------------|----------|
| Alice  | apple       | 3        |
| Bob    | grape       | 3        |
| Cheryl | orange      | 2        |
| Donald | apple       | 2        |

The analogous problem for the second table is, of course: what if a person ate the same maximum number of more than one fruit?

I know about the group_by() function in dplyr, but it looks like I have more than one "group" here. How do I get the maximum count for the "how_many" column of the table?

P.S. The raw data in comma-separated format (pastebin link here).

Answer 1

For each fruit, find who ate the most of it (not a simple count, but the maximum count):

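library(dplyr)
df %>%
  count(fruit, person) %>%   # n = times each person ate each fruit
  group_by(fruit) %>%        # rank people within each fruit
  top_n(1, n)                # keep the largest n; tied rows are all kept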

#    fruit person     n
#   (fctr) (fctr) (int)
# 1  apple  Alice     3
# 2 banana    Bob     1
# 3  grape    Bob     3
# 4   kiwi Cheryl     1
# 5 orange Cheryl     2
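To get exactly the column names the question asks for, and to cover the tie case for fruits as well, one option is to keep every row at the per-fruit maximum and collapse tied people with toString(), the same trick the answer applies to people further down. This is a sketch that assumes dplyr >= 1.0.0 (for the `.groups` argument of summarise()) and rebuilds the question's data inline as `df`; `most_by_fruit` is only an illustrative name.

```r
library(dplyr)

# Same 16 rows as the question's data, built inline
# (assumption: the pastebin file contains exactly these rows).
df <- data.frame(
  person = rep(c("Alice", "Bob", "Cheryl", "Donald"), times = c(4, 5, 3, 4)),
  fruit  = c("apple", "apple", "apple", "orange",
             "apple", "banana", "grape", "grape", "grape",
             "orange", "orange", "kiwi",
             "apple", "apple", "grape", "grape"),
  stringsAsFactors = FALSE
)

most_by_fruit <- df %>%
  count(fruit, person) %>%        # n = times each person ate each fruit
  group_by(fruit) %>%
  filter(n == max(n)) %>%         # keep everyone tied at the per-fruit maximum
  summarise(who_ate_most = toString(person),  # "A, B" if two people tie
            how_many     = first(n),          # tied rows share the same n
            .groups      = "drop")

print(most_by_fruit)
```

Rows come out in alphabetical fruit order (apple, banana, grape, kiwi, orange) rather than the question's order. With this particular data no fruit has a tie, so each `who_ate_most` holds a single name.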

df %>%
  count(person, fruit) %>%
  group_by(person) %>%
  top_n(1, n)                # Donald's apple/grape tie yields two rows

#   person  fruit     n
#   (fctr) (fctr) (int)
# 1  Alice  apple     3
# 2    Bob  grape     3
# 3 Cheryl orange     2
# 4 Donald  apple     2
# 5 Donald  grape     2
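If exactly one row per person is wanted instead, accepting an arbitrary pick among tied fruits such as Donald's apple and grape, slice_max() with `with_ties = FALSE` is one option. This is a sketch assuming dplyr >= 1.0.0, where slice_max() was introduced; `one_per_person` is an illustrative name, and the columns are renamed to match the question's second table.

```r
library(dplyr)

# Same 16 rows as the question's data, built inline
# (assumption: the pastebin file contains exactly these rows).
df <- data.frame(
  person = rep(c("Alice", "Bob", "Cheryl", "Donald"), times = c(4, 5, 3, 4)),
  fruit  = c("apple", "apple", "apple", "orange",
             "apple", "banana", "grape", "grape", "grape",
             "orange", "orange", "kiwi",
             "apple", "apple", "grape", "grape"),
  stringsAsFactors = FALSE
)

one_per_person <- df %>%
  count(person, fruit) %>%
  group_by(person) %>%
  slice_max(n, n = 1, with_ties = FALSE) %>%  # exactly one row per person
  ungroup() %>%
  rename(ate_most_of = fruit, how_many = n)   # new_name = old_name

print(one_per_person)
```

Which of the tied fruits survives should not be treated as meaningful; the toString() approach in this answer keeps all of them.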

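Note that count is a wrapper around group_by() plus tally() (that is, summarise(n = n())), so it does the group_by for you. Note the different order of the grouping variables in the two calls: the last variable is the layer that gets removed. Also note that each summary (here the underlying n() summary) strips off one layer of grouping, which is what lets top_n() work within each fruit or each person. That describes the dplyr releases this answer was written against; since dplyr 1.0.0, count() groups only while counting and returns ungrouped output for ungrouped input, so an explicit group_by() is needed before top_n().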
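The grouping behaviour described above can be checked with group_vars(). This sketch assumes dplyr >= 1.0.0, where summarise() accepts `.groups = "drop_last"` (removing one layer of grouping, as described) and count() returns its input's grouping; the data is rebuilt inline as `df`.

```r
library(dplyr)

# Same 16 rows as the question's data, built inline.
df <- data.frame(
  person = rep(c("Alice", "Bob", "Cheryl", "Donald"), times = c(4, 5, 3, 4)),
  fruit  = c("apple", "apple", "apple", "orange",
             "apple", "banana", "grape", "grape", "grape",
             "orange", "orange", "kiwi",
             "apple", "apple", "grape", "grape"),
  stringsAsFactors = FALSE
)

# A summary removes the last grouping variable (fruit), leaving person.
after_summary <- df %>%
  group_by(person, fruit) %>%
  summarise(n = n(), .groups = "drop_last") %>%
  group_vars()
print(after_summary)   # "person"

# count() groups only transiently: ungrouped input gives ungrouped output.
after_count <- df %>%
  count(person, fruit) %>%
  group_vars()
print(after_count)     # character(0)
```

This is why, on current dplyr, `count(person, fruit) %>% top_n(1)` without a group_by(person) ranks over the whole table instead of within each person.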

Following a comment asking for one record per person, we can use toString as @Frank suggested. We can also "keep" n by taking the first() value of its vector:

df %>%
  count(person, fruit) %>%
  group_by(person) %>%
  top_n(1, n) %>%
  summarise(n      = first(n),              # all tied rows share the same n
            fruits = toString(fruit))       # collapse tied fruits into one string

#   person     n       fruits
#   (fctr) (int)        (chr)
# 1  Alice     3        apple
# 2    Bob     3        grape
# 3 Cheryl     2       orange
# 4 Donald     2 apple, grape
