Obtain column values based on changes in another column

I have a data frame like this:

dateColumnDF <- c("2022-04-12 00:02:57", "2022-04-12 00:02:58", "2022-04-12 00:02:59", "2022-04-12 00:03:00", "2022-04-12 00:03:02")
ValueColumnDf <- c("50","465","788","99","25")
Var1Df <- c("0", "0", "0","0","0")
Var2Df <- c("0", "0", "1","1","0")
Var3Df <- c("0","1","0","1","0")
df <- data.frame(dateColumnDF, ValueColumnDf,Var1Df,Var2Df,Var3Df)
colnames(df) <- c("timestamp","Value","Var1","Var2","Var3")

I would like to obtain a data frame that reflects how the Varx values change from one value to another (mainly from 0 to 1 and back), like this:

firstColumn <- c("Var2", "Var2", "Var3", "Var3", "Var3","Var3")
secondColumn <- c("1", "0", "1","0","1","0")
thirdColumn <- c("2022-04-12 00:02:59", "2022-04-12 00:03:02", "2022-04-12 00:02:58",
                 "2022-04-12 00:02:59", "2022-04-12 00:03:00", "2022-04-12 00:03:02")
fourthColum <- c("788","25","465","788","99","25")
df2 <- data.frame(firstColumn,secondColumn,thirdColumn,fourthColum)
colnames(df2) <- c("Var","flagChangedTo","timestamp","Value")

I found that to see the changes from one row to the next I need to do the following (using dplyr):

# Row indices where a column differs from the previous row, e.g. for Var2
which(df$Var2 != dplyr::lag(df$Var2))

I also need to put this into a loop over the columns of interest, like this:

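x <- list()
for (i in 3:ncol(df)) {
  # Keep the change positions of every column instead of overwriting x
  x[[names(df)[i]]] <- which(df[, i] != dplyr::lag(df[, i]))
}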

Once I have the positions where the changes occur, how do I generate the desired data frame?

Using tidyr::pivot_longer (you are already using dplyr), you can turn the table into long format.

library(tidyr)
pivot_longer(df, starts_with("Var"), names_to = "Var", values_to = "flagChangedTo")

This gives

   timestamp           Value Var   flagChangedTo
   <fct>               <fct> <chr> <fct>        
 1 2022-04-12 00:02:57 50    Var1  0            
 2 2022-04-12 00:02:57 50    Var2  0            
 3 2022-04-12 00:02:57 50    Var3  0            
 4 2022-04-12 00:02:58 465   Var1  0            
 5 2022-04-12 00:02:58 465   Var2  0            
...

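After that, we can group by Var and use filter to keep only the rows where flagChangedTo differs from its value in the previous row, using dplyr::lag (as you already correctly suggested).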
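To illustrate the comparison with lag on a single column, here is a minimal sketch. The vector x is a hypothetical stand-in for one Var column (it copies the Var2 values from the question); it is not part of the original answer.

```r
library(dplyr)

# Hypothetical stand-in for one flag column (same values as Var2)
x <- c("0", "0", "1", "1", "0")

# lag(x) shifts the vector down by one position and pads the front with NA,
# so the first comparison is NA because there is no previous row
changed <- x != lag(x)

# Positions where the value differs from the previous one
change_positions <- which(changed)
```

Because the first comparison is NA, and filter keeps only rows where the condition is TRUE, the first row of each group is never reported as a change.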

Putting it together, and using the magrittr pipe (%>%):

library(dplyr)  # group_by, arrange, filter, lag and %>%
library(tidyr)  # pivot_longer
df %>% 
  pivot_longer(starts_with("Var"), names_to = "Var", values_to = "flagChangedTo") %>% 
  group_by(Var) %>% 
  arrange(timestamp) %>% 
  filter(flagChangedTo != lag(flagChangedTo)) %>% 
  ungroup() %>% 
  arrange(Var, timestamp)

which gives

  timestamp           Value Var   flagChangedTo
  <fct>               <fct> <chr> <fct>        
1 2022-04-12 00:02:59 788   Var2  1            
2 2022-04-12 00:03:02 25    Var2  0            
3 2022-04-12 00:02:58 465   Var3  1            
4 2022-04-12 00:02:59 788   Var3  0            
5 2022-04-12 00:03:00 99    Var3  1            
6 2022-04-12 00:03:02 25    Var3  0            
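The output above shows every column stored as a factor or character. If real date-time and numeric columns are wanted, one possible variant converts them before pivoting. This is only a sketch: the UTC time zone and the name changes are assumptions, and the data frame is rebuilt with stringsAsFactors = FALSE so that as.numeric works on the strings rather than on factor codes.

```r
library(dplyr)
library(tidyr)

# Example data from the question, kept as plain character columns
df <- data.frame(
  timestamp = c("2022-04-12 00:02:57", "2022-04-12 00:02:58", "2022-04-12 00:02:59",
                "2022-04-12 00:03:00", "2022-04-12 00:03:02"),
  Value = c("50", "465", "788", "99", "25"),
  Var1 = c("0", "0", "0", "0", "0"),
  Var2 = c("0", "0", "1", "1", "0"),
  Var3 = c("0", "1", "0", "1", "0"),
  stringsAsFactors = FALSE
)

changes <- df %>%
  mutate(
    timestamp = as.POSIXct(timestamp, tz = "UTC"),  # time zone is an assumption
    Value = as.numeric(Value)
  ) %>%
  pivot_longer(starts_with("Var"), names_to = "Var", values_to = "flagChangedTo") %>%
  group_by(Var) %>%
  arrange(timestamp, .by_group = TRUE) %>%          # sort by time within each Var
  filter(flagChangedTo != lag(flagChangedTo)) %>%   # keep only rows where the flag changed
  ungroup()
```

With a proper date-time column, arrange sorts chronologically rather than by string or factor-level order.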

Similar to @Bas's answer:

df %>% pivot_longer(cols=Var1:Var3) %>% 
    arrange(name, timestamp) %>% 
    group_by(name) %>% 
    filter(value!=lag(value)) %>% 
    select(Var=name, flagChangedTo=value,timestamp,Value)

Output:

  Var   flagChangedTo timestamp           Value
  <chr> <chr>         <chr>               <chr>
1 Var2  1             2022-04-12 00:02:59 788  
2 Var2  0             2022-04-12 00:03:02 25   
3 Var3  1             2022-04-12 00:02:58 465  
4 Var3  0             2022-04-12 00:02:59 788  
5 Var3  1             2022-04-12 00:03:00 99   
6 Var3  0             2022-04-12 00:03:02 25
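For comparison, the loop idea from the question can also be carried through in base R without any packages. This is a rough sketch, not part of either answer; the names flag_cols, changes and result are made up for illustration, and it assumes the rows are already sorted by timestamp.

```r
# Example data from the question, kept as plain character columns
df <- data.frame(
  timestamp = c("2022-04-12 00:02:57", "2022-04-12 00:02:58", "2022-04-12 00:02:59",
                "2022-04-12 00:03:00", "2022-04-12 00:03:02"),
  Value = c("50", "465", "788", "99", "25"),
  Var1 = c("0", "0", "0", "0", "0"),
  Var2 = c("0", "0", "1", "1", "0"),
  Var3 = c("0", "1", "0", "1", "0"),
  stringsAsFactors = FALSE
)

# Columns to scan for changes
flag_cols <- grep("^Var", names(df), value = TRUE)

changes <- lapply(flag_cols, function(col) {
  v <- df[[col]]
  # Compare each value with the previous one; +1 maps back to row numbers
  idx <- which(v[-1] != v[-length(v)]) + 1
  data.frame(
    Var = rep(col, length(idx)),
    flagChangedTo = v[idx],
    timestamp = df$timestamp[idx],
    Value = df$Value[idx],
    stringsAsFactors = FALSE
  )
})

# Stack the per-column pieces; columns without changes contribute zero rows
result <- do.call(rbind, changes)
rownames(result) <- NULL
```

Here result should contain the same six rows as the dplyr versions, with Var1 contributing none because it never changes.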