Extracting unequal number of rows by group in R
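I have a two-variable data frame grouped by Shape, and I want to extract the first n rows of each group, where n differs for each level of the grouping variable. I tried some dplyr and data.table functions, but they only seem to work when every group gets the same number of rows.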
Data <- data.frame(Shape = c("R", "R", "R", "C", "C", "T", "T", "T", "T"), Area = c(35, 30, 25, 32, 28, 40, 35, 33, 31))
I want the first 2 R's, the first C, and the first 3 T's. Expected result:
Out <- data.frame(Shape = c("R", "R", "C", "T", "T", "T"), Area = c(35, 30, 32, 40, 35, 33))
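We can use group_split on the 'Shape' column to split the data into a list of data.frames, then pass the per-group limit 'n' through map2_dfr so that filter keeps the corresponding number of rows from each one. Converting 'Shape' to a factor with levels in order of appearance keeps the groups in R, C, T order, so they line up with c(2, 1, 3).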
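library(dplyr)
library(purrr)
# Split by Shape, then keep the first .y rows of each piece
# and row-bind the pieces back into one data frame
Data %>%
  group_split(Shape = factor(Shape, levels = unique(Shape))) %>%
  map2_dfr(., c(2, 1, 3), ~ .x %>%
    filter(row_number() <= .y))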
# A tibble: 6 x 2
# Shape Area
#* <fct> <dbl>
#1 R 35
#2 R 30
#3 C 32
#4 T 40
#5 T 35
#6 T 33
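For comparison, the same split-then-truncate idea can be sketched in base R, without dplyr or purrr. This is only an illustration and not part of the answer above; the name limits is made up here, and its names are assumed to cover every value of Shape.

```r
Data <- data.frame(Shape = c("R", "R", "R", "C", "C", "T", "T", "T", "T"),
                   Area = c(35, 30, 25, 32, 28, 40, 35, 33, 31))

# Hypothetical lookup: number of leading rows to keep per Shape
limits <- c(R = 2, C = 1, T = 3)

# Split in the order given by names(limits) so groups and limits line up
groups <- split(Data, factor(Data$Shape, levels = names(limits)))

# head(group, n) for each (group, limit) pair, then bind the pieces together
Out <- do.call(rbind, Map(head, groups, limits))
rownames(Out) <- NULL
Out
```

Map() pairs each group with its limit positionally, which plays the same role as map2_dfr in the dplyr version.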
Another option is to create a column 'n' by indexing a named vector with 'Shape', then group by 'Shape' and filter:
Data %>%
  mutate(n = setNames(c(2, 1, 3), unique(Shape))[as.character(Shape)]) %>%
  group_by(Shape) %>%
  filter(row_number() <= n[1]) %>%
  select(-n)
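The named-vector lookup also works without dplyr. A minimal base R sketch, assuming the rows within each Shape are already in the desired order (the names n_by_shape and idx are made up for this example):

```r
Data <- data.frame(Shape = c("R", "R", "R", "C", "C", "T", "T", "T", "T"),
                   Area = c(35, 30, 25, 32, 28, 40, 35, 33, 31))

# Hypothetical lookup: number of leading rows to keep per Shape
n_by_shape <- c(R = 2, C = 1, T = 3)

# Position of each row within its Shape group (1, 2, 3, ... per group)
idx <- ave(seq_len(nrow(Data)), Data$Shape, FUN = seq_along)

# Keep a row when its within-group position does not exceed its group's limit
keep <- idx <= n_by_shape[as.character(Data$Shape)]
Out <- Data[keep, ]
rownames(Out) <- NULL
Out
```

Because the filter is a single logical vector over the original rows, the result keeps the original row order.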
A bit more involved than akrun's version, but perhaps easier to read:
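library(tidyverse)
# Lookup table: how many leading rows to keep per Shape
numberRows <- tibble(Shape = c("R", "C", "T"),
                     firstRows = c(2, 1, 3))
Data %>%
  left_join(numberRows, by = "Shape") %>%
  group_by(Shape) %>%
  slice(seq_len(first(firstRows))) %>%  # seq_len() also handles a limit of 0
  ungroup() %>%
  select(-firstRows)                    # drop the helper column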