R dplyr slice_min within a subset of a dataset

Question

Suppose this is my dataset:

Id    Group1   Col1       Col2
1     Red      2/1/1999   3
1     Red      4/11/1998  4
2     Black    7/8/1995   NA
2     Black    11/2/2000  1
3     Black    11/1/2994  2
3     Black    5/18/1997  6

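How can I use slice_min, or filter(Col2 == min(Col2) | is.na(Col2)), so that only the row with the smallest Col2 is kept, but only for rows where Group1 == "Red"? All other rows should stay as they are.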

Expected output

Id    Group1   Col1       Col2
1     Red      2/1/1999   3
2     Black    7/8/1995   NA
2     Black    11/2/2000  1
3     Black    11/1/2994  2
3     Black    5/18/1997  6

Thanks in advance for any suggestions.

Answer 1

You can apply slice_min to just the subset you care about, then rbind the result with the rest of df:

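library(dplyr)

rbind(
  # Red rows: keep only the row(s) with the smallest Col2
  df %>% filter(Group1 == "Red") %>% slice_min(Col2),
  # All other rows pass through unchanged
  df %>% filter(Group1 != "Red")
)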

Output:

     Id Group1 Col1       Col2
  <int> <chr>  <chr>     <int>
1     1 Red    2/1/1999      3
2     2 Black  7/8/1995     NA
3     2 Black  11/2/2000     1
4     3 Black  11/1/2994     2
5     3 Black  5/18/1997     6
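
With rbind, the Red rows always come first in the result, wherever they sat in df. If this pattern is needed more than once, it could be wrapped in a small function. The following is only a sketch: `slice_min_where` is a hypothetical helper name, not a dplyr function, and the data frame is rebuilt inline so the example runs on its own.

```r
library(dplyr)

# Same rows as the question's data, rebuilt here so the sketch is self-contained.
df <- tibble::tibble(
  Id     = c(1, 1, 2, 2, 3, 3),
  Group1 = c("Red", "Red", "Black", "Black", "Black", "Black"),
  Col1   = c("2/1/1999", "4/11/1998", "7/8/1995", "11/2/2000", "11/1/2994", "5/18/1997"),
  Col2   = c(3, 4, NA, 1, 2, 6)
)

# Hypothetical helper (not part of dplyr): apply slice_min() only to rows where
# `condition` is TRUE and pass every other row through unchanged.
slice_min_where <- function(data, condition, order_by) {
  bind_rows(
    data %>% filter({{ condition }}) %>% slice_min({{ order_by }}),
    # coalesce() treats an NA condition as FALSE, so those rows are kept as well.
    data %>% filter(!coalesce({{ condition }}, FALSE))
  )
}

result <- slice_min_where(df, Group1 == "Red", Col2)
result
```

Note that slice_min keeps ties by default, so more than one Red row can be returned if several share the minimum Col2.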

Answer 2

You could use a slightly more complex filter condition:

library(dplyr)

df %>%
  filter(
    (Group1 == "Red" &
      (Col2 == min(.[.$Group1 == "Red", ]$Col2, na.rm = TRUE) | is.na(Col2))) |
      Group1 != "Red"
  )

This returns

# A tibble: 5 x 4
     Id Group1 Col1       Col2
  <dbl> <chr>  <chr>     <dbl>
1     1 Red    2/1/1999      3
2     2 Black  7/8/1995     NA
3     2 Black  11/2/2000     1
4     3 Black  11/1/2994     2
5     3 Black  5/18/1997     6

Or, more elegantly, as suggested by Ritchie Sacramento:

df %>%
  group_by(Group1) %>%
  filter(Col2 == min(Col2, na.rm = TRUE) | is.na(Col2) | Group1 != "Red") %>%
  ungroup()
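
The Black row with a missing Col2 survives this filter because of how R handles NA in logical expressions, and because filter() keeps only rows where the condition is TRUE. A few base R one-liners illustrate the rules the condition relies on:

```r
# Comparisons with NA yield NA rather than TRUE or FALSE.
NA == 1                            # NA

# `|` is TRUE as soon as one side is TRUE, even if the other side is NA.
NA | TRUE                          # TRUE
NA | FALSE                         # NA

# min() propagates NA unless na.rm = TRUE is set.
min(c(NA, 1, 2, 6))                # NA
min(c(NA, 1, 2, 6), na.rm = TRUE)  # 1
```

So for every non-Red row the term Group1 != "Red" is TRUE, which makes the whole condition TRUE regardless of what the Col2 comparison returns.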

Data

df <- structure(list(Id = c(1, 1, 2, 2, 3, 3), Group1 = c("Red", "Red", 
"Black", "Black", "Black", "Black"), Col1 = c("2/1/1999", "4/11/1998", 
"7/8/1995", "11/2/2000", "11/1/2994", "5/18/1997"), Col2 = c(3, 
4, NA, 1, 2, 6)), class = c("spec_tbl_df", "tbl_df", "tbl", "data.frame"
), row.names = c(NA, -6L), spec = structure(list(cols = list(
    Id = structure(list(), class = c("collector_double", "collector"
    )), Group1 = structure(list(), class = c("collector_character", 
    "collector")), Col1 = structure(list(), class = c("collector_character", 
    "collector")), Col2 = structure(list(), class = c("collector_double", 
    "collector"))), default = structure(list(), class = c("collector_guess", 
"collector")), skip = 1L), class = "col_spec"))

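For readers who find the dput() output hard to scan, the same six rows could also be typed in with tibble::tribble(). This is just a more readable sketch of the data from the question; it does not carry the readr `spec` attribute, which none of the answers depend on.

```r
library(tibble)

# Row-wise definition of the question's data; Col1 is kept as character,
# exactly as it appears in the original (including the year 2994).
df <- tribble(
  ~Id, ~Group1, ~Col1,       ~Col2,
  1,   "Red",   "2/1/1999",  3,
  1,   "Red",   "4/11/1998", 4,
  2,   "Black", "7/8/1995",  NA,
  2,   "Black", "11/2/2000", 1,
  3,   "Black", "11/1/2994", 2,
  3,   "Black", "5/18/1997", 6
)
```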