How to keep other columns when using dplyr?

Question

I have a problem similar to the one described in "How to aggregate some columns while keeping other columns in R?", but none of the solutions I tried works.

I have a data frame like this:

df<-data.frame(a=rep(c("a","b"),each=2),b=c(500,400,200,300), 
               c = c(5,10,2,4),stringsAsFactors = FALSE) 
> df
  a   b  c
1 a 500  5
2 a 400 10
3 b 200  2
4 b 300  4

library(dplyr)

df %>%
  group_by(a) %>%
  summarise('max' = max(c), 'sum' = sum(c))

  a       max   sum
  <chr> <dbl> <dbl>
1 a        10    15  
2 b         4     6

But I also need column b:

1 a        10    15   400
2 b         4     6   300

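The value of column b should be the one that goes with max(c), i.e. the b from the row where c is at its maximum within the group.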

Edit: data for a specific case:

> df
  a   b  c
1 a 500  5
2 a 400  5

In this case, where c is tied, I need the higher value of column b in the summary:

#   a       max   sum     b
#   <chr> <dbl> <dbl> <dbl>
# 1 a         5    10   500

Answer 1

Updated to follow the edit to the question:

df%>%
  group_by(a)%>%
  summarise('max' = max(c), 'sum'=sum(c), b=max(b))

# A tibble: 2 x 4
#   a       max   sum     b
#  <chr>  <dbl>  <dbl> <dbl>
# 1 a        10    15   500
# 2 b         4     6   300

Answer 2

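I would replace summarise with mutate (which keeps all rows) and then filter for the rows you want. The tibble is still grouped afterwards, so ungroup is needed to drop the grouping.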

df %>%
    group_by(a) %>%
    mutate('max' = max(c), 'sum'=sum(c)) %>% 
    filter(c == max) %>%
    ungroup()

#   a         b     c   max   sum
#   <chr> <dbl> <dbl> <dbl> <dbl>
# 1 a       400    10    10    15
# 2 b       300     4     4     6
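One thing to watch with the mutate + filter approach: on the tied data from the question's edit, filter(c == max) keeps both rows, because both equal the group maximum. A minimal sketch of one way to settle the tie, assuming you want the row with the largest b, is to add slice_max() on b after the filter. The name df_tie is only an illustrative name for the edited data, not something from the question.

```r
library(dplyr)

# Tied data from the question's edit (df_tie is a hypothetical name)
df_tie <- data.frame(a = c("a", "a"), b = c(500, 400), c = c(5, 5),
                     stringsAsFactors = FALSE)

res_tie <- df_tie %>%
  group_by(a) %>%
  mutate(max = max(c), sum = sum(c)) %>%
  filter(c == max) %>%                          # keeps every row tied for max(c)
  slice_max(b, n = 1, with_ties = FALSE) %>%    # of those, keep the one with the largest b
  ungroup()

res_tie
```

With with_ties = FALSE you get exactly one row per group even if b is tied as well.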

Answer 3

You have to specify how to summarise variable b:

df %>%
  group_by(a) %>%
  summarise(max = max(c), sum = sum(c), b = max(b[c == max(c)]))

# A tibble: 2 x 4
#   a       max   sum     b
#   <chr> <dbl> <dbl> <dbl>
# 1 a        10    15   400
# 2 b         4     6   300
