Subsetting dataframe where values change
I want to filter the rows of a data.frame based on where the values change.
Suppose I have:
  id name quarter score
1 01 john q1_2020    80
2 01 john q2_2020    80
3 01 john q3_2020    85
4 01 john q4_2020    75
5 02 adam q1_2020    80
6 02 adam q2_2020    80
7 02 adam q3_2020    85
8 03 lana q1_2020    50
I want to keep the rows where both the quarter and the score change, so the data frame above should become:
  id name quarter score
1 01 john q2_2020    80
2 01 john q3_2020    85
3 01 john q4_2020    75
4 02 adam q2_2020    80
5 02 adam q3_2020    85
How can I compare a cell's value with the value in the previous row in R?
You can use the lead() function from dplyr, which looks at the next row:
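library(dplyr)
# Keep rows where both quarter and score differ from the next row;
# filter() drops the last row, where lead() returns NA
result <- df %>% filter(quarter != lead(quarter), score != lead(score))
result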
#   id name quarter score
# 2  1 john q2_2020    80
# 3  1 john q3_2020    85
# 4  1 john q4_2020    75
# 6  2 adam q2_2020    80
# 7  2 adam q3_2020    85
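For readers without dplyr, the same look-ahead comparison can be sketched in base R. This is a minimal sketch, assuming the sample data from the dput() output in this post; next_value() is a hypothetical helper standing in for lead() with its default one-row offset.

```r
# Sample data, same values as the dput() output in this post
df <- data.frame(
  id      = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L),
  name    = c("john", "john", "john", "john", "adam", "adam", "adam", "lana"),
  quarter = c("q1_2020", "q2_2020", "q3_2020", "q4_2020",
              "q1_2020", "q2_2020", "q3_2020", "q1_2020"),
  score   = c(80L, 80L, 85L, 75L, 80L, 80L, 85L, 50L),
  stringsAsFactors = FALSE
)

# Hypothetical helper: the value from the next row, with NA after the last
# row (what dplyr::lead(x) returns with its defaults)
next_value <- function(x) c(x[-1], NA)

# TRUE where both quarter and score differ from the next row; the last row
# compares against NA, and which() drops NA just as filter() does
keep <- which(df$quarter != next_value(df$quarter) &
              df$score   != next_value(df$score))

result <- df[keep, ]
print(result)  # rows 2, 3, 4, 6 and 7
```

Like the dplyr call, this compares across id boundaries: row 4 (john, q4_2020) is kept because its score of 75 differs from adam's first score of 80 in the following row.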
Similarly with data.table:
library(data.table)
# shift(x, type = 'lead') is data.table's counterpart of dplyr::lead()
setDT(df)[quarter != shift(quarter, type = 'lead') &
          score != shift(score, type = 'lead')]
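The dplyr and data.table versions above look ahead across id boundaries. If the next row should only be looked up within the same id, the per-group shift can be sketched in base R with ave(). This assumes the sample data from the dput() output in this post; next_in_group() is a hypothetical helper, not part of any package.

```r
# Sample data, same values as the dput() output in this post
df <- data.frame(
  id      = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L),
  name    = c("john", "john", "john", "john", "adam", "adam", "adam", "lana"),
  quarter = c("q1_2020", "q2_2020", "q3_2020", "q4_2020",
              "q1_2020", "q2_2020", "q3_2020", "q1_2020"),
  score   = c(80L, 80L, 85L, 75L, 80L, 80L, 85L, 50L),
  stringsAsFactors = FALSE
)

# Hypothetical helper: next value within each group g. ave() applies FUN to
# each group and writes the results back in the original row order.
next_in_group <- function(x, g) ave(x, g, FUN = function(v) c(v[-1], NA))

# Last row of each id compares against NA, which which() drops
keep <- with(df, which(quarter != next_in_group(quarter, id) &
                       score   != next_in_group(score, id)))

print(df[keep, ])  # rows 2, 3 and 6
```

With the per-id rule only rows 2, 3 and 6 remain, since the last row of each id has nothing to compare with; the question's expected output, which also keeps john's q4_2020 and adam's q3_2020 rows, matches the ungrouped version instead.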
Maybe you could try subset() + ave() as below:
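subset(
  df,
  # ave() flags, within each id, the first row and every row whose score
  # differs from the previous one; single-row ids get FALSE
  !!ave(score, id, FUN = function(x) c(TRUE, diff(x) != 0) & length(x) > 1)
)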
which gives
id name quarter score
1 1 john q1_2020 80
3 1 john q3_2020 85
4 1 john q4_2020 75
5 2 adam q1_2020 80
7 2 adam q3_2020 85
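The !! in this answer is needed because ave() returns a copy of score with each group's FUN result written back into it, so the logical flags come back as integer 1/0. A small sketch that prints the intermediate flags, assuming the same sample data (built here with data.frame() rather than dput()):

```r
# Sample data, same values as the dput() output in this post
df <- data.frame(
  id      = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L),
  name    = c("john", "john", "john", "john", "adam", "adam", "adam", "lana"),
  quarter = c("q1_2020", "q2_2020", "q3_2020", "q4_2020",
              "q1_2020", "q2_2020", "q3_2020", "q1_2020"),
  score   = c(80L, 80L, 85L, 75L, 80L, 80L, 85L, 50L),
  stringsAsFactors = FALSE
)

# ave() writes FUN's per-id result back into a copy of `score`,
# so the logical flags are coerced to score's integer type (1/0)
flag <- ave(df$score, df$id,
            FUN = function(x) c(TRUE, diff(x) != 0) & length(x) > 1)
print(flag)  # 1 0 1 1 1 0 1 0

# !! turns the 1/0 integers back into TRUE/FALSE for subsetting
keep <- !!flag
print(df[keep, ])  # rows 1, 3, 4, 5 and 7
```

So this answer keeps the first row of each id plus every row whose score differs from the previous row within that id, and drops ids with a single row (lana). That selects rows 1 and 5 where the question's expected output has rows 2 and 6.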
Data
> dput(df)
structure(list(id = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L), name = c("john",
"john", "john", "john", "adam", "adam", "adam", "lana"), quarter = c("q1_2020",
"q2_2020", "q3_2020", "q4_2020", "q1_2020", "q2_2020", "q3_2020",
"q1_2020"), score = c(80L, 80L, 85L, 75L, 80L, 80L, 85L, 50L)), class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6", "7", "8"))