R dplyr slice min within a subset of dataset
If this is my dataset:
Id Group1 Col1 Col2
1 Red 2/1/1999 3
1 Red 4/11/1998 4
2 Black 7/8/1995 NA
2 Black 11/2/2000 1
3 Black 11/1/2994 2
3 Black 5/18/1997 6
How can I apply slice_min, or filter(Col2 == min(Col2) | is.na(Col2)), so that only the row with the minimum Col2 is kept when Group1 == "Red", while all other rows are left as they are?
Expected output:
Id Group1 Col1 Col2
1 Red 2/1/1999 3
2 Black 7/8/1995 NA
2 Black 11/2/2000 1
3 Black 11/1/2994 2
3 Black 5/18/1997 6
Thanks in advance for any suggestions.
You can use slice_min on just the subset you want, then rbind the result with the rest of df:
library(dplyr)

rbind(
  # Red rows: keep only the row with the smallest Col2
  df %>% filter(Group1 == "Red") %>% slice_min(Col2),
  # all non-Red rows, unchanged
  df %>% filter(Group1 != "Red")
)
Output:
Id Group1 Col1 Col2
<int> <chr> <chr> <int>
1 1 Red 2/1/1999 3
2 2 Black 7/8/1995 NA
3 2 Black 11/2/2000 1
4 3 Black 11/1/2994 2
5 3 Black 5/18/1997 6
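A minimal self-contained sketch of the same idea, assuming dplyr >= 1.0 (where slice_min() was introduced). It swaps rbind() for dplyr's bind_rows() and rebuilds df with tibble() rather than using the dput() output at the end. Note that slice_min() keeps ties by default (with_ties = TRUE), so if two Red rows shared the minimum Col2 both would be kept, and the Red rows always come first in the combined result.

```r
library(dplyr)

# Same values as the dput() data, built directly with tibble().
df <- tibble(
  Id     = c(1, 1, 2, 2, 3, 3),
  Group1 = c("Red", "Red", "Black", "Black", "Black", "Black"),
  Col1   = c("2/1/1999", "4/11/1998", "7/8/1995", "11/2/2000", "11/1/2994", "5/18/1997"),
  Col2   = c(3, 4, NA, 1, 2, 6)
)

result <- bind_rows(
  # Red rows: keep only the row(s) with the smallest Col2 (ties are kept).
  df %>% filter(Group1 == "Red") %>% slice_min(Col2, with_ties = TRUE),
  # Every other row passes through unchanged.
  df %>% filter(Group1 != "Red")
)

print(result)
```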
Alternatively, you can use a slightly more complex filter condition:
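library(dplyr)
df %>%
  # Red rows: keep only the minimum Red Col2 (or a missing Col2);
  # every non-Red row is kept unconditionally.
  filter((Group1 == "Red" & (Col2 == min(Col2[Group1 == "Red"], na.rm = TRUE) | is.na(Col2))) |
           Group1 != "Red")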
This returns:
# A tibble: 5 x 4
Id Group1 Col1 Col2
<dbl> <chr> <chr> <dbl>
1 1 Red 2/1/1999 3
2 2 Black 7/8/1995 NA
3 2 Black 11/2/2000 1
4 3 Black 11/1/2994 2
5 3 Black 5/18/1997 6
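One caveat with comparing against min(): in R, min() returns NA as soon as its input contains an NA, so if a Red row had a missing Col2, the Red minimum would itself be NA and no Red row could match it (only the is.na(Col2) branch would still keep rows). Passing na.rm = TRUE avoids this. The sketch below uses a small hypothetical tibble, df_na, invented purely for illustration, in which one Red row has a missing Col2.

```r
library(dplyr)

# Hypothetical data for illustration only: one Red row has a missing Col2.
df_na <- tibble(
  Id     = c(1, 1, 1, 2),
  Group1 = c("Red", "Red", "Red", "Black"),
  Col2   = c(3, 4, NA, 5)
)

# Without na.rm the Red minimum is NA, so no equality test can be TRUE.
min(df_na$Col2[df_na$Group1 == "Red"])                # NA
min(df_na$Col2[df_na$Group1 == "Red"], na.rm = TRUE)  # 3

# With na.rm = TRUE: the minimum Red row and the missing Red row are kept,
# and the Black row passes through.
kept <- df_na %>%
  filter((Group1 == "Red" &
            (Col2 == min(Col2[Group1 == "Red"], na.rm = TRUE) | is.na(Col2))) |
           Group1 != "Red")

print(kept)
```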
Or, more elegantly, as suggested by Ritchie Sacramento:
df %>%
  group_by(Group1) %>%
  # min() is computed per group, so Red rows are compared with the Red minimum
  filter(Col2 == min(Col2, na.rm = TRUE) | is.na(Col2) | Group1 != "Red") %>%
  ungroup()
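The group_by() step is what makes this version work: inside a grouped filter(), min(Col2, na.rm = TRUE) is evaluated separately for each Group1 value, so Red rows are compared with the Red minimum (3). Without grouping, min() sees the whole column, whose smallest non-missing value is 1 (a Black row), and every Red row is dropped. A short comparison, rebuilding df with tibble() so the snippet runs on its own; keep_min_red is a hypothetical helper name used only here.

```r
library(dplyr)

df <- tibble(
  Id     = c(1, 1, 2, 2, 3, 3),
  Group1 = c("Red", "Red", "Black", "Black", "Black", "Black"),
  Col1   = c("2/1/1999", "4/11/1998", "7/8/1995", "11/2/2000", "11/1/2994", "5/18/1997"),
  Col2   = c(3, 4, NA, 1, 2, 6)
)

# Hypothetical helper wrapping the filter condition from the answer.
keep_min_red <- function(data) {
  filter(data, Col2 == min(Col2, na.rm = TRUE) | is.na(Col2) | Group1 != "Red")
}

# Grouped: the Red minimum is 3, so the 2/1/1999 row survives.
grouped <- df %>% group_by(Group1) %>% keep_min_red() %>% ungroup()

# Ungrouped: the overall minimum is 1, so no Red row survives.
ungrouped <- df %>% keep_min_red()

print(grouped)
print(ungrouped)
```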
Data
df <- structure(list(Id = c(1, 1, 2, 2, 3, 3), Group1 = c("Red", "Red",
"Black", "Black", "Black", "Black"), Col1 = c("2/1/1999", "4/11/1998",
"7/8/1995", "11/2/2000", "11/1/2994", "5/18/1997"), Col2 = c(3,
4, NA, 1, 2, 6)), class = c("spec_tbl_df", "tbl_df", "tbl", "data.frame"
), row.names = c(NA, -6L), spec = structure(list(cols = list(
Id = structure(list(), class = c("collector_double", "collector"
)), Group1 = structure(list(), class = c("collector_character",
"collector")), Col1 = structure(list(), class = c("collector_character",
"collector")), Col2 = structure(list(), class = c("collector_double",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), skip = 1L), class = "col_spec"))