Subsetting dataframe where values change

I want to filter the rows of a data.frame based on where values change. Suppose I have:

    id  name  quarter      score      
1.  01  john  q1 2020      80
2.  01  john  q2 2020      80
3.  01  john  q3 2020      85
4.  01  john  q4 2020      75
5.  02  adam  q1 2020      80
6.  02  adam  q2 2020      80
7.  02  adam  q3 2020      85
8.  03  lana  q1 2020      50

Wherever both the quarter and the score change, I want to extract those rows, so the data frame above should become:

    id  name  quarter      score      
1.  01  john  q2 2020      80
2.  01  john  q3 2020      85
3.  01  john  q4 2020      75
4.  02  adam  q2 2020      80
5.  02  adam  q3 2020      85

How can I compare a cell's value with the value in the previous row in R?

You can use the lead function from dplyr:

library(dplyr)

result <- df %>% filter(quarter != lead(quarter), score != lead(score))
result
#   id name quarter score
#2.  1 john  q22020    80
#3.  1 john  q32020    85
#4.  1 john  q42020    75
#6.  2 adam  q22020    80
#7.  2 adam  q32020    85
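
For readers without dplyr, roughly the same comparison can be sketched in base R. This is only an illustration, not part of the original answer: `next_value` is a hypothetical helper that mimics what `lead()` does for a shift of one, and the data is copied from the `dput()` output at the end of this thread.

```r
# Data copied from the dput() output at the end of this thread
df <- data.frame(
  id = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L),
  name = c("john", "john", "john", "john", "adam", "adam", "adam", "lana"),
  quarter = c("q1_2020", "q2_2020", "q3_2020", "q4_2020",
              "q1_2020", "q2_2020", "q3_2020", "q1_2020"),
  score = c(80L, 80L, 85L, 75L, 80L, 80L, 85L, 50L)
)

# Hypothetical helper: shift a vector up by one, padding the end with NA
next_value <- function(x) c(x[-1], NA)

# TRUE where both quarter and score differ from the following row;
# the last row compares against NA and therefore yields NA
keep <- df$quarter != next_value(df$quarter) &
        df$score   != next_value(df$score)

# which() discards the NA for the last row, much like filter() drops it
result <- df[which(keep), ]
result
```

Note that the comparison is not grouped by id, so the last row of one id is compared with the first row of the next. That is why john's q4 row is kept here.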

The same idea with data.table:

library(data.table)
setDT(df)[quarter != shift(quarter, type = 'lead') & 
          score != shift(score, type = 'lead')]

Perhaps you can try subset + ave, as below:

subset(
  df,
  !!ave(score, id, FUN = function(x) c(TRUE, diff(x) != 0) & length(x) > 1)
)

which gives

  id name quarter score
1  1 john q1_2020    80
3  1 john q3_2020    85
4  1 john q4_2020    75
5  2 adam q1_2020    80
7  2 adam q3_2020    85
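
To see why the `!!` is needed and what `ave` computes per group, the intermediate flags can be inspected on their own. The sketch below is only an illustration using the score and id columns from the data in this thread; the names `flags` and `keep` are introduced here for clarity.

```r
score <- c(80L, 80L, 85L, 75L, 80L, 80L, 85L, 50L)
id    <- c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L)

# Within each id: flag the first row and every row whose score differs
# from the row before it, but only when the id has more than one row
flags <- ave(score, id,
             FUN = function(x) c(TRUE, diff(x) != 0) & length(x) > 1)

# ave() writes the results back into a copy of `score`,
# so the flags come out as 0/1 numbers rather than TRUE/FALSE
flags

# Double negation turns the 0/1 values into a logical vector for subset()
keep <- !!flags
which(keep)  # rows 1, 3, 4, 5, 7
```

Unlike the lead-based answers, this compares each row with the previous one inside its own id, so it keeps the first row of each multi-row id and drops single-row ids such as lana.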

Data

> dput(df)
structure(list(id = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L), name = c("john", 
"john", "john", "john", "adam", "adam", "adam", "lana"), quarter = c("q1_2020", 
"q2_2020", "q3_2020", "q4_2020", "q1_2020", "q2_2020", "q3_2020", 
"q1_2020"), score = c(80L, 80L, 85L, 75L, 80L, 80L, 85L, 50L)), class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6", "7", "8"))