Subset to remove duplicate IDs based on a condition

Question

Suppose this is my dataset:

Id   Weight   Category
1    10.2     Pre
1    12.1     Post
2    11.3     Post
3    12.9     Pre
4    10.3     Post
4    12.3     Pre
5    11.8     Pre

How can I remove the rows for duplicated IDs where Category is Pre? My final expected dataset is:

Id   Weight   Category
1    12.1     Post
2    11.3     Post
3    12.9     Pre
4    10.3     Post
5    11.8     Pre

Answer 1

You can arrange the data and then use distinct.

library(dplyr)

df %>% arrange(Id, Category) %>% distinct(Id, .keep_all = TRUE)

#  Id Weight Category
#1  1   12.1     Post
#2  2   11.3     Post
#3  3   12.9      Pre
#4  4   10.3     Post
#5  5   11.8      Pre

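This works because 'Pre' > 'Post' alphabetically: after arrange(), the Post row comes first within each Id, and distinct() keeps the first row it sees for each Id.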
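A minimal base R sketch of the same arrange-then-keep-first idea. Instead of relying on the alphabetical order of the labels, it ranks each row with an explicit priority vector; the name priority and its order c("Post", "Pre") are assumptions chosen to match this question, not part of the original answer.

```r
# Example data from the question
df <- data.frame(
  Id       = c(1L, 1L, 2L, 3L, 4L, 4L, 5L),
  Weight   = c(10.2, 12.1, 11.3, 12.9, 10.3, 12.3, 11.8),
  Category = c("Pre", "Post", "Post", "Pre", "Post", "Pre", "Pre")
)

# Hypothetical explicit priority: earlier position = preferred row
priority <- c("Post", "Pre")

# Sort by Id, then by each row's position in `priority`
sorted <- df[order(df$Id, match(df$Category, priority)), ]

# Keep the first (highest-priority) row per Id, like distinct(Id, .keep_all = TRUE)
result <- sorted[!duplicated(sorted$Id), ]
rownames(result) <- NULL
result
```

Changing the order of priority changes which row wins, without depending on how the labels happen to sort.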

Answer 2

Using by, split dat by Id, keep only the Post row when an Id has two rows, then rbind the results.

do.call(rbind, by(dat, dat$Id, function(x) 
  if (nrow(x) == 2)  x[x$Category == 'Post', ] else x))
#   Id Weight Category
# 1  1   12.1     Post
# 2  2   11.3     Post
# 3  3   12.9      Pre
# 4  4   10.3     Post
# 5  5   11.8      Pre

Data:

dat <- read.table(header = TRUE, text = '
                  Id   Weight   Category
1    10.2     Pre
1    12.1     Post
2    11.3     Post
3    12.9     Pre
4    10.3     Post
4    12.3     Pre
5    11.8     Pre
                  ')
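The by() answer chooses rows by nrow(x) == 2, so it assumes every duplicated Id has exactly one Post and one Pre row (an Id with two Pre rows would return no rows at all). A sketch of a variant using split() and lapply() that tests the category instead of the row count; it recreates dat with data.frame() rather than read.table().

```r
# Same data as above
dat <- data.frame(
  Id       = c(1L, 1L, 2L, 3L, 4L, 4L, 5L),
  Weight   = c(10.2, 12.1, 11.3, 12.9, 10.3, 12.3, 11.8),
  Category = c("Pre", "Post", "Post", "Pre", "Post", "Pre", "Pre")
)

# One data frame per Id
groups <- split(dat, dat$Id)

# Keep only the Post row(s) if the Id has any; otherwise keep the group as-is
kept <- lapply(groups, function(x) {
  if (any(x$Category == "Post")) x[x$Category == "Post", ] else x
})

result <- do.call(rbind, kept)
rownames(result) <- NULL
result
```

Note that an Id with no Post row keeps all of its rows, so this variant still returns duplicates for an Id that has several Pre rows.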

Answer 3

We can group and arrange, then use filter with first(), because Post sorts before Pre:

library(dplyr)

df %>% 
  group_by(Id) %>% 
  arrange(Id, Category) %>% 
  filter(Category == first(Category))

Output:

     Id Weight Category
  <int>  <dbl> <chr>   
1     1   12.1 Post    
2     2   11.3 Post    
3     3   12.9 Pre     
4     4   10.3 Post    
5     5   11.8 Pre
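For readers without dplyr, a rough base R sketch of the same group-then-filter logic: ave() broadcasts each Id group's first Category (after sorting) back to every row of that group, and the filter keeps the rows that match it. Like the dplyr version, it relies on "Post" sorting before "Pre".

```r
df <- data.frame(
  Id       = c(1L, 1L, 2L, 3L, 4L, 4L, 5L),
  Weight   = c(10.2, 12.1, 11.3, 12.9, 10.3, 12.3, 11.8),
  Category = c("Pre", "Post", "Post", "Pre", "Post", "Pre", "Pre")
)

# Arrange by Id, then Category ("Post" < "Pre")
d <- df[order(df$Id, df$Category), ]

# For every row, the first Category within its Id group (like first() after group_by)
first_cat <- ave(d$Category, d$Id, FUN = function(v) v[1])

# Keep rows whose Category equals their group's first Category
result <- d[d$Category == first_cat, ]
rownames(result) <- NULL
result
```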

Answer 4

Using subset from base R:

subset(df[with(df, order(Id, Category == 'Pre')),], !duplicated(Id))
  Id Weight Category
2  1   12.1     Post
3  2   11.3     Post
4  3   12.9      Pre
5  4   10.3     Post
7  5   11.8      Pre

Data:

df <- structure(list(Id = c(1L, 1L, 2L, 3L, 4L, 4L, 5L),
                     Weight = c(10.2, 12.1, 11.3, 12.9, 10.3, 12.3, 11.8),
                     Category = c("Pre", "Post", "Post", "Pre", "Post", "Pre", "Pre")),
                class = "data.frame", row.names = c(NA, -7L))
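To see why the subset() one-liner works, here is a step-by-step sketch of its intermediate values. The sort key Category == 'Pre' is FALSE for Post and TRUE for Pre, and FALSE sorts before TRUE, so within each Id the Post row comes first; !duplicated() then keeps the first row per Id. The surviving row numbers are the row names printed in the output above.

```r
df <- data.frame(
  Id       = c(1L, 1L, 2L, 3L, 4L, 4L, 5L),
  Weight   = c(10.2, 12.1, 11.3, 12.9, 10.3, 12.3, 11.8),
  Category = c("Pre", "Post", "Post", "Pre", "Post", "Pre", "Pre")
)

# Row order: by Id, then FALSE (not Pre) before TRUE (Pre)
ord <- order(df$Id, df$Category == "Pre")
ord          # 2 1 3 4 5 6 7

# After reordering, drop every repeat of an Id after its first appearance
keep <- !duplicated(df$Id[ord])

# Original row numbers that subset() keeps
ord[keep]    # 2 3 4 5 7
```

Because the key is a logical rather than the label text, this version does not depend on the alphabetical order of the categories.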
