Select groups which have more than one variable in them

Question

A simple question. Suppose I have a data frame like this:

 df1 <- data.frame(species=c("a","a","b","c","c","d"), dbh=c(5,4,7,1,3,6))

I want to exclude species b and d because they occur only once. How can I do that?

Answer 1

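This can be done with base R or with other packages. Using data.table, we convert the 'data.frame' to a 'data.table' by reference (setDT(df1)), group by 'species', and if the number of rows in a group is greater than 1 (.N>1), we return the Subset of Data.table (.SD) for that group.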

 library(data.table)
 setDT(df1)[, if(.N>1) .SD, species]

Or with dplyr, we group by 'species' and then use filter to keep the groups with more than one row.

 library(dplyr)
 df1 %>%
     group_by(species) %>%
     filter(n()>1)

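The base R function ave also works. We group by 'species', get the length of each group, compare it with 1 to get a logical vector, and use that vector to subset the dataset.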

 df1[with(df1, ave(dbh, species, FUN=length)>1),]
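To make the ave step easier to follow, here is a minimal sketch that spells out the intermediate values. It assumes the question's data is stored as df1 (the name used in the code above); group_size, keep, and result_ave are illustrative names, not part of the original answer.

```r
# Rebuild the example data from the question (assumed name: df1).
df1 <- data.frame(species = c("a", "a", "b", "c", "c", "d"),
                  dbh = c(5, 4, 7, 1, 3, 6))

# ave() returns one value per row: the size of that row's species group.
group_size <- ave(df1$dbh, df1$species, FUN = length)

# TRUE for rows whose species occurs more than once.
keep <- group_size > 1

# Keep only those rows (species a and c here).
result_ave <- df1[keep, ]
print(result_ave)
```

Because ave returns a vector as long as the input, the comparison lines up with the rows of df1 and can be used directly as a row index.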

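Or we can use table to get the frequency of each element in 'species'. We find the names of the elements with a count greater than 1, use %in% to get a logical vector, and then subset as before.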

 tbl <- table(df1$species)>1
 df1[df1$species %in% names(tbl)[tbl],]
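The table approach can also be broken into steps. This is only a sketch under the same assumption that the data is stored as df1; counts, frequent, and result_tbl are illustrative names.

```r
# Rebuild the example data from the question (assumed name: df1).
df1 <- data.frame(species = c("a", "a", "b", "c", "c", "d"),
                  dbh = c(5, 4, 7, 1, 3, 6))

# Frequency of each species.
counts <- table(df1$species)

# Names of the species that occur more than once.
frequent <- names(counts)[counts > 1]

# Keep the rows whose species is in that set.
result_tbl <- df1[df1$species %in% frequent, ]
print(result_tbl)
```

Storing the names rather than the logical table makes the last line read as "keep rows whose species is frequent", which some readers may find clearer than indexing names(tbl) by tbl.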
