Obtain column values based on changes in another column
I have a data frame like this:
dateColumnDF <- c("2022-04-12 00:02:57", "2022-04-12 00:02:58", "2022-04-12 00:02:59", "2022-04-12 00:03:00", "2022-04-12 00:03:02")
ValueColumnDf <- c("50","465","788","99","25")
Var1Df <- c("0", "0", "0","0","0")
Var2Df <- c("0", "0", "1","1","0")
Var3Df <- c("0","1","0","1","0")
df <- data.frame(dateColumnDF, ValueColumnDf,Var1Df,Var2Df,Var3Df)
colnames(df) <- c("timestamp","Value","Var1","Var2","Var3")
I would like to obtain a data frame that reflects how the VarX values change from one value to another (mainly from 0 to 1 and back), like this:
firstColumn <- c("Var2", "Var2", "Var3", "Var3", "Var3","Var3")
secondColumn <- c("1", "0", "1","0","1","0")
thirdColumn <- c("2022-04-12 00:02:59", "2022-04-12 00:03:02", "2022-04-12 00:02:58",
"2022-04-12 00:02:59","2022-04-12 00:03:00","2022-04-12 00:03:02")
fourthColum <- c("788","25","465","788","99","25")
df2 <- data.frame(firstColumn,secondColumn,thirdColumn,fourthColum)
colnames(df2) <- c("Var","flagChangedTo","timestamp","Value")
I found that, to see where a column's value changes from one row to the next, I need to do this (using dplyr):
which(df$Var2 != dplyr::lag(df$Var2))
And I need to put that in a loop over the columns of interest, like this:
for(i in 3:ncol(df)) {
x <- which(df[,i]!= dplyr::lag(df[,i]))
}
Once I have the positions where the values change, how can I generate the desired data frame?
Using tidyr::pivot_longer (you are already using dplyr), you can turn the table into a long format.
library(tidyr)
pivot_longer(df, starts_with("Var"), names_to = "Var", values_to = "flagChangedTo")
This gives
timestamp Value Var flagChangedTo
<fct> <fct> <chr> <fct>
1 2022-04-12 00:02:57 50 Var1 0
2 2022-04-12 00:02:57 50 Var2 0
3 2022-04-12 00:02:57 50 Var3 0
4 2022-04-12 00:02:58 465 Var1 0
5 2022-04-12 00:02:58 465 Var2 0
...
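After that, we can group by Var and use filter to keep only the rows where the flagChangedTo value differs from the one in the previous row, using dplyr::lag (as you already correctly suggested).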
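To make the lag comparison concrete, here is a minimal sketch on a single column. It uses the Var3 values from the question as a plain character vector; the names var3 and changed are only for illustration.

```r
library(dplyr)

# Var3 from the question, as a plain character vector
var3 <- c("0", "1", "0", "1", "0")

# lag() shifts the vector down by one position; the first element becomes NA
lag(var3)

# Comparing with the shifted copy flags the rows where the value changed.
# The first comparison is NA, and filter() drops rows where the condition is NA,
# so the first row of each group never counts as a change.
changed <- var3 != lag(var3)
changed

# Positions of the changes
which(changed)
```

Because the first row has no predecessor, a flag that starts at 1 would not be reported as a change unless that case is handled separately.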
Putting it together, and using the magrittr pipe (%>%):
library(dplyr)
library(tidyr)
df %>%
pivot_longer(starts_with("Var"), names_to = "Var", values_to = "flagChangedTo") %>%
group_by(Var) %>%
arrange(timestamp) %>%
filter(flagChangedTo != lag(flagChangedTo)) %>%
ungroup() %>%
arrange(Var, timestamp)
gives
timestamp Value Var flagChangedTo
<fct> <fct> <chr> <fct>
1 2022-04-12 00:02:59 788 Var2 1
2 2022-04-12 00:03:02 25 Var2 0
3 2022-04-12 00:02:58 465 Var3 1
4 2022-04-12 00:02:59 788 Var3 0
5 2022-04-12 00:03:00 99 Var3 1
6 2022-04-12 00:03:02 25 Var3 0
Similar to @Bas's answer:
df %>% pivot_longer(cols=Var1:Var3) %>%
arrange(name, timestamp) %>%
group_by(name) %>%
filter(value!=lag(value)) %>%
select(Var=name, flagChangedTo=value,timestamp,Value)
Output:
Var flagChangedTo timestamp Value
<chr> <chr> <chr> <chr>
1 Var2 1 2022-04-12 00:02:59 788
2 Var2 0 2022-04-12 00:03:02 25
3 Var3 1 2022-04-12 00:02:58 465
4 Var3 0 2022-04-12 00:02:59 788
5 Var3 1 2022-04-12 00:03:00 99
6 Var3 0 2022-04-12 00:03:02 25
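For completeness, the loop idea from the question can also be finished in base R without reshaping. This is only a sketch, assuming the rows are already ordered by timestamp and that the columns of interest all start with "Var"; the names var_cols, changes and result are made up for this example.

```r
# Data frame from the question (kept as character columns)
df <- data.frame(
  timestamp = c("2022-04-12 00:02:57", "2022-04-12 00:02:58", "2022-04-12 00:02:59",
                "2022-04-12 00:03:00", "2022-04-12 00:03:02"),
  Value = c("50", "465", "788", "99", "25"),
  Var1 = c("0", "0", "0", "0", "0"),
  Var2 = c("0", "0", "1", "1", "0"),
  Var3 = c("0", "1", "0", "1", "0"),
  stringsAsFactors = FALSE
)

# Columns to scan for changes (assumption: they all start with "Var")
var_cols <- grep("^Var", names(df), value = TRUE)

# Build one small data frame of change rows per column
changes <- lapply(var_cols, function(v) {
  x <- df[[v]]
  # Rows whose value differs from the previous row (row 1 has no predecessor)
  idx <- which(x[-1] != x[-length(x)]) + 1
  data.frame(
    Var = rep(v, length(idx)),
    flagChangedTo = x[idx],
    timestamp = df$timestamp[idx],
    Value = df$Value[idx],
    stringsAsFactors = FALSE
  )
})

# Stack the per-column pieces; columns without changes contribute zero rows
result <- do.call(rbind, changes)
rownames(result) <- NULL
result
```

On the example data this should yield the same six rows as the pipelines above; the tidyr version is shorter, but this one stays closer to the original loop.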