Filtering a data.frame in R - want the max level of one variable for each combination of two others

Question

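I am trying to filter my data frame so that, for each combination of two other variables (Person and Test), I keep only the maximum value of a third variable (Question).

My data.frame looks like this: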

df <- data.frame(Person=c("Person1","Person1","Person1","Person2","Person2","Person2","Person3","Person3","Person3"),
             Test=c(rep("Test1",9)),
             Question=c("1","2","3","1","2","3","1","2","3"))

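Except that there are multiple tests, i.e. Test2, Test3, etc.

I want to filter down to show the last question in each test for each person. The number of questions differs between tests.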

Using the answer to this question: dplyr filter: Get rows with minimum of variable, but only the first if multiple minima, I managed to get part of the way with:

library(dplyr)
df.grouped <- group_by(df.orginial, Person, Test)
df.lastquestion <- filter(df.grouped, Question == max(Question))

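Unfortunately, this leaves me with the highest question number each person answered across all tests. What I want instead is the highest question number each person answered in each test.

Thanks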

Answer 1

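There will no doubt be plenty of options using dplyr, plyr, and data.table, but here is a good old-fashioned base-R version, using a somewhat extended (and greatly simplified) version of the example data: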

df <- data.frame(Person = rep(paste0("Person", 1:3), each = 3, times = 2),
                 Test = rep(paste0("Test", 1:4), each = 9),
                 Question = as.character(rep(1:3, times = 3 * 2)))

You could do this inline, but an explicit wrapper lets me focus on the two aspects of this problem:

wrapper <- function(x) {
  with(x, x[Question == max(Question), ])
}

You could use which.max(Question) here, but if more than one value in Question equals the maximum, it would select only the first of them.
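The difference between the two selection styles can be seen in a minimal sketch. The toy data frame `x` below is hypothetical (it is not from the question), and `Question` is converted to numeric before calling `which.max()` to keep the example unambiguous:

```r
# Hypothetical toy data: one person, one test, maximum ("3") appears twice
x <- data.frame(Person = "Person1", Test = "Test1",
                Question = c("1", "3", "3"),
                stringsAsFactors = FALSE)

# Logical comparison keeps every row that ties for the maximum
all_max <- x[x$Question == max(x$Question), ]

# which.max() returns only the position of the first maximum
first_max <- x[which.max(as.numeric(x$Question)), ]

nrow(all_max)    # 2
nrow(first_max)  # 1
```

Which behaviour is preferable depends on whether tied rows should all be kept or collapsed to one.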

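Now we want to split the data and then apply wrapper() to each element. The other packages mentioned above offer more consistent, and in some cases faster, implementations, but base R is often competitive: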

ll <- lapply(with(df, split(df, list(Person, Test))), wrapper)
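To make the split step more concrete, here is a small sketch of what `split()` produces when given a list of two grouping variables. The data frame `toy` and the name `pieces` are made up for illustration:

```r
# Hypothetical toy data: two people, two tests, two questions each
toy <- data.frame(Person = rep(c("Person1", "Person2"), times = 4),
                  Test = rep(c("Test1", "Test2"), each = 4),
                  Question = as.character(rep(c(1, 1, 2, 2), times = 2)),
                  stringsAsFactors = FALSE)

# One list element per Person/Test combination, named "Person.Test"
pieces <- split(toy, list(toy$Person, toy$Test))

names(pieces)
sapply(pieces, nrow)
```

Each element is itself a data frame holding the rows of one combination, which is why a function written for a single data frame can be applied with `lapply()`. Note that `split()` keeps combinations that never occur as empty pieces unless `drop = TRUE` is passed, so data with missing combinations may need that argument.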

Now just bind everything together:

newdf <- do.call("rbind", c(ll, make.row.names = FALSE))
head(newdf)

Which returns:

> head(newdf)
    Person  Test Question
1  Person1 Test1        3
2  Person2 Test1        3
3  Person3 Test1        3
4  Person1 Test2        3
5  Person2 Test2        3
6  Person3 Test2        3

The whole thing would be:

wrapper <- function(x) {
  with(x, x[Question == max(Question), ])
}
ll <- lapply(with(df, split(df, list(Person, Test))), wrapper)
newdf <- do.call("rbind", c(ll, make.row.names = FALSE))
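One caveat worth sketching: because Question is stored as character, `max()` compares strings rather than numbers, which matters once a test has ten or more questions. The variant below, `wrapper_num`, is a hypothetical adaptation and not part of the original answer; it assumes every Question value can be parsed as a number:

```r
# Hypothetical numeric-aware variant of wrapper(); assumes Question holds
# values that can be parsed as numbers (also copes with factor columns)
wrapper_num <- function(x) {
  qn <- as.numeric(as.character(x$Question))
  x[qn == max(qn), ]
}

# Toy data with a two-digit question number
d <- data.frame(Person = "Person1", Test = "Test1",
                Question = c("8", "9", "10"),
                stringsAsFactors = FALSE)

max(d$Question)   # string comparison, usually not "10"
wrapper_num(d)    # keeps the row where Question is "10"
```

It can be used as a drop-in replacement for wrapper() in the lapply() call if numeric ordering is what is wanted.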

Answer 2

Using ave():

df[df$Question == ave(as.numeric(df$Question),list(df$Person,df$Test),FUN = max), ]
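To illustrate why this works, the sketch below shows the intermediate vector that `ave()` returns: the group-wise maximum repeated for every row of its group, so it lines up with the original rows. The data frame `toy` and the names `q` and `group_max` are assumptions made for this example, and the comparison here is numeric against numeric rather than relying on coercion:

```r
# Hypothetical toy data: Person1 answered 3 questions, Person2 answered 2
toy <- data.frame(Person = c("Person1", "Person1", "Person1", "Person2", "Person2"),
                  Test = "Test1",
                  Question = c("1", "2", "3", "1", "2"),
                  stringsAsFactors = FALSE)

q <- as.numeric(toy$Question)

# ave() applies FUN within each Person/Test group and returns a vector
# of the same length as q
group_max <- ave(q, toy$Person, toy$Test, FUN = max)
group_max   # 3 3 3 2 2

# Keep the rows whose question number equals their group's maximum
result <- toy[q == group_max, ]
result
```

Because the result of `ave()` has one value per original row, no split or rbind step is needed.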
