如果在 R 中满足条件,则连续两天(或 N)天更改列中的值
Change values from a column for two (or N) consecutive days if condition met in R
我有以下数据框:
df <- structure(list(DateTime = structure(c(1477978200, 1477980000, 1477981800, 1477983600, 1477985400, 1477987200, 1477989000, 1477990800, 1477992600, 1477994400, 1477996200, 1477998000, 1477999800, 1478001600, 1478003400, 1478005200, 1478007000, 1478008800, 1478010600, 1478012400, 1478014200, 1478016000, 1478017800, 1478019600, 1478021400, 1478023200, 1478025000, 1478026800, 1478028600, 1478030400, 1478032200, 1478034000, 1478035800, 1478037600, 1478039400, 1478041200, 1478043000, 1478044800, 1478046600, 1478048400, 1478050200, 1478052000, 1478053800, 1478055600, 1478057400, 1478059200, 1478061000, 1478062800, 1478064600, 1478066400, 1478068200, 1478070000, 1478071800, 1478073600, 1478075400, 1478077200, 1478079000, 1478080800, 1478082600, 1478084400, 1478086200, 1478088000, 1478089800, 1478091600, 1478093400, 1478095200, 1478097000, 1478098800, 1478100600, 1478102400, 1478104200, 1478106000, 1478107800, 1478109600, 1478111400, 1478113200, 1478115000, 1478116800, 1478118600, 1478120400), class = c("POSIXct", "POSIXt"), tzone = "America/Chicago"), Date = structure(c(17106, 17106, 17106, 17106, 17106, 17106, 17106, 17106, 17106, 17106, 17106, 17106, 17106, 17106, 17106, 17106, 17106, 17106, 17106, 17106, 17106, 17106, 17106, 17106, 17106, 17106, 17106, 17106, 17106, 17106, 17106, 17106, 17106, 17106, 17106, 17106, 17106, 17106, 17106, 17106, 17106, 17106, 17106, 17106, 17106, 17106, 17106, 17107, 17107, 17107, 17107, 17107, 17107, 17107, 17107, 17107, 17107, 17107, 17107, 17107, 17107, 17107, 17107, 17107, 17107, 17107, 17107, 17107, 17107, 17107, 17107, 17107, 17107, 17107, 17107, 17107, 17107, 17107, 17107, 17107), class = "Date"), Rain_daily = c(8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), PET_daily = c(7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4), Mask = c(TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE)), row.names = c(NA, 80L), class = "data.frame")
如果某一天“Rain_daily”大于“PET_daily”,我想连续两天将“掩码”列中的值更改为“FALSE”。例如,对于给定的数据框,Mask 列中的值将在“2016-11-01”和“2016-11-02”这两个日子(如 8>7)替换为 FALSE。
我们如何做到这一点?我想避免循环。
使用 dplyr
中的 case_when
。
如果日期满足条件 Rain_daily > PET_daily
,则为这 (df[Rain_daily > PET_daily, Date]
) 天和接下来的 (df[Rain_daily > PET_daily, Date]+1
) 天显示 Mask
。其他地方保持 Mask
不变 (TRUE ~ Mask
)
library(data.table)
library(dplyr)
df <- df %>% as.data.table() # to simlify operations on df
df <- df %>%
mutate(Mask = case_when(Date %in% c(df[Rain_daily > PET_daily, Date],df[Rain_daily > PET_daily, Date]+1) ~ FALSE,
TRUE ~Mask ))
希望对您有所帮助
,OP 提到 我的数据框非常大,有多个列。
因此,这是一种 data.table 方法,它 通过引用分配 仅 Mask
列。这避免了复制整个数据集。
library(data.table)
setDT(df2)
df2[, Mask := !Date %in% CJ(.SD[Rain_daily > PET_daily, Date], 0:1, unique = TRUE)[, V1 + V2]]
df2
DateTime Date Rain_daily PET_daily Mask
1: 2015-01-01 00:00:00 2015-01-01 1 2 TRUE
2: 2015-01-01 00:30:00 2015-01-01 1 2 TRUE
3: 2015-01-01 01:00:00 2015-01-01 1 2 TRUE
4: 2015-01-02 00:00:00 2015-01-02 3 2 FALSE
5: 2015-01-02 00:30:00 2015-01-02 3 2 FALSE
6: 2015-01-02 01:00:00 2015-01-02 3 2 FALSE
7: 2015-01-03 00:00:00 2015-01-03 1 2 FALSE
8: 2015-01-03 00:30:00 2015-01-03 1 2 FALSE
9: 2015-01-03 01:00:00 2015-01-03 1 2 FALSE
10: 2015-01-04 00:00:00 2015-01-04 1 2 TRUE
11: 2015-01-04 00:30:00 2015-01-04 1 2 TRUE
12: 2015-01-04 01:00:00 2015-01-04 1 2 TRUE
13: 2015-01-05 00:00:00 2015-01-05 3 2 FALSE
14: 2015-01-05 00:30:00 2015-01-05 3 2 FALSE
15: 2015-01-05 01:00:00 2015-01-05 3 2 FALSE
16: 2015-01-06 00:00:00 2015-01-06 4 2 FALSE
17: 2015-01-06 00:30:00 2015-01-06 4 2 FALSE
18: 2015-01-06 01:00:00 2015-01-06 4 2 FALSE
19: 2015-01-07 00:00:00 2015-01-07 1 2 FALSE
20: 2015-01-07 00:30:00 2015-01-07 1 2 FALSE
21: 2015-01-07 01:00:00 2015-01-07 1 2 FALSE
22: 2015-01-08 00:00:00 2015-01-08 1 2 TRUE
23: 2015-01-08 00:30:00 2015-01-08 1 2 TRUE
24: 2015-01-08 01:00:00 2015-01-08 1 2 TRUE
DateTime Date Rain_daily PET_daily Mask
请注意,此处使用了一个新的示例数据集,其中包括不同的日期以演示不同的用例,请参阅下面的部分。
新样本数据集
作为一个可重现的例子,每天只模拟三个时间戳,但在 8 个不同的日子。
library(data.table)
df2 <- fread("
Date Rain_daily PET_daily
2015-01-01 1 2
2015-01-02 3 2
2015-01-03 1 2
2015-01-04 1 2
2015-01-05 3 2
2015-01-06 4 2
2015-01-07 1 2
2015-01-08 1 2
")[CJ(Date, DateTime = as.ITime(60 * 30 * 0:2))[
, DateTime := as.POSIXct(Date) + DateTime], on = "Date"][
, setcolorder(.SD, "DateTime")]
df2
DateTime Date Rain_daily PET_daily
1: 2015-01-01 00:00:00 2015-01-01 1 2
2: 2015-01-01 00:30:00 2015-01-01 1 2
3: 2015-01-01 01:00:00 2015-01-01 1 2
4: 2015-01-02 00:00:00 2015-01-02 3 2
5: 2015-01-02 00:30:00 2015-01-02 3 2
6: 2015-01-02 01:00:00 2015-01-02 3 2
7: 2015-01-03 00:00:00 2015-01-03 1 2
8: 2015-01-03 00:30:00 2015-01-03 1 2
9: 2015-01-03 01:00:00 2015-01-03 1 2
10: 2015-01-04 00:00:00 2015-01-04 1 2
11: 2015-01-04 00:30:00 2015-01-04 1 2
12: 2015-01-04 01:00:00 2015-01-04 1 2
13: 2015-01-05 00:00:00 2015-01-05 3 2
14: 2015-01-05 00:30:00 2015-01-05 3 2
15: 2015-01-05 01:00:00 2015-01-05 3 2
16: 2015-01-06 00:00:00 2015-01-06 4 2
17: 2015-01-06 00:30:00 2015-01-06 4 2
18: 2015-01-06 01:00:00 2015-01-06 4 2
19: 2015-01-07 00:00:00 2015-01-07 1 2
20: 2015-01-07 00:30:00 2015-01-07 1 2
21: 2015-01-07 01:00:00 2015-01-07 1 2
22: 2015-01-08 00:00:00 2015-01-08 1 2
23: 2015-01-08 00:30:00 2015-01-08 1 2
24: 2015-01-08 01:00:00 2015-01-08 1 2
DateTime Date Rain_daily PET_daily
我有以下数据框:
df <- structure(list(DateTime = structure(c(1477978200, 1477980000, 1477981800, 1477983600, 1477985400, 1477987200, 1477989000, 1477990800, 1477992600, 1477994400, 1477996200, 1477998000, 1477999800, 1478001600, 1478003400, 1478005200, 1478007000, 1478008800, 1478010600, 1478012400, 1478014200, 1478016000, 1478017800, 1478019600, 1478021400, 1478023200, 1478025000, 1478026800, 1478028600, 1478030400, 1478032200, 1478034000, 1478035800, 1478037600, 1478039400, 1478041200, 1478043000, 1478044800, 1478046600, 1478048400, 1478050200, 1478052000, 1478053800, 1478055600, 1478057400, 1478059200, 1478061000, 1478062800, 1478064600, 1478066400, 1478068200, 1478070000, 1478071800, 1478073600, 1478075400, 1478077200, 1478079000, 1478080800, 1478082600, 1478084400, 1478086200, 1478088000, 1478089800, 1478091600, 1478093400, 1478095200, 1478097000, 1478098800, 1478100600, 1478102400, 1478104200, 1478106000, 1478107800, 1478109600, 1478111400, 1478113200, 1478115000, 1478116800, 1478118600, 1478120400), class = c("POSIXct", "POSIXt"), tzone = "America/Chicago"), Date = structure(c(17106, 17106, 17106, 17106, 17106, 17106, 17106, 17106, 17106, 17106, 17106, 17106, 17106, 17106, 17106, 17106, 17106, 17106, 17106, 17106, 17106, 17106, 17106, 17106, 17106, 17106, 17106, 17106, 17106, 17106, 17106, 17106, 17106, 17106, 17106, 17106, 17106, 17106, 17106, 17106, 17106, 17106, 17106, 17106, 17106, 17106, 17106, 17107, 17107, 17107, 17107, 17107, 17107, 17107, 17107, 17107, 17107, 17107, 17107, 17107, 17107, 17107, 17107, 17107, 17107, 17107, 17107, 17107, 17107, 17107, 17107, 17107, 17107, 17107, 17107, 17107, 17107, 17107, 17107, 17107), class = "Date"), Rain_daily = c(8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), PET_daily = c(7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4), Mask = c(TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE)), row.names = c(NA, 80L), class = "data.frame")
如果某一天“Rain_daily”大于“PET_daily”,我想连续两天将“掩码”列中的值更改为“FALSE”。例如,对于给定的数据框,Mask 列中的值将在“2016-11-01”和“2016-11-02”这两个日子(如 8>7)替换为 FALSE。
我们如何做到这一点?我想避免循环。
使用 dplyr
中的 case_when
。
如果日期满足条件 Rain_daily > PET_daily
,则为这 (df[Rain_daily > PET_daily, Date]
) 天和接下来的 (df[Rain_daily > PET_daily, Date]+1
) 天显示 Mask
。其他地方保持 Mask
不变 (TRUE ~ Mask
)
library(data.table)
library(dplyr)
df <- df %>% as.data.table() # to simlify operations on df
df <- df %>%
mutate(Mask = case_when(Date %in% c(df[Rain_daily > PET_daily, Date],df[Rain_daily > PET_daily, Date]+1) ~ FALSE,
TRUE ~Mask ))
希望对您有所帮助
因此,这是一种 data.table 方法,它 通过引用分配 仅 Mask
列。这避免了复制整个数据集。
library(data.table)
setDT(df2)
df2[, Mask := !Date %in% CJ(.SD[Rain_daily > PET_daily, Date], 0:1, unique = TRUE)[, V1 + V2]]
df2
DateTime Date Rain_daily PET_daily Mask 1: 2015-01-01 00:00:00 2015-01-01 1 2 TRUE 2: 2015-01-01 00:30:00 2015-01-01 1 2 TRUE 3: 2015-01-01 01:00:00 2015-01-01 1 2 TRUE 4: 2015-01-02 00:00:00 2015-01-02 3 2 FALSE 5: 2015-01-02 00:30:00 2015-01-02 3 2 FALSE 6: 2015-01-02 01:00:00 2015-01-02 3 2 FALSE 7: 2015-01-03 00:00:00 2015-01-03 1 2 FALSE 8: 2015-01-03 00:30:00 2015-01-03 1 2 FALSE 9: 2015-01-03 01:00:00 2015-01-03 1 2 FALSE 10: 2015-01-04 00:00:00 2015-01-04 1 2 TRUE 11: 2015-01-04 00:30:00 2015-01-04 1 2 TRUE 12: 2015-01-04 01:00:00 2015-01-04 1 2 TRUE 13: 2015-01-05 00:00:00 2015-01-05 3 2 FALSE 14: 2015-01-05 00:30:00 2015-01-05 3 2 FALSE 15: 2015-01-05 01:00:00 2015-01-05 3 2 FALSE 16: 2015-01-06 00:00:00 2015-01-06 4 2 FALSE 17: 2015-01-06 00:30:00 2015-01-06 4 2 FALSE 18: 2015-01-06 01:00:00 2015-01-06 4 2 FALSE 19: 2015-01-07 00:00:00 2015-01-07 1 2 FALSE 20: 2015-01-07 00:30:00 2015-01-07 1 2 FALSE 21: 2015-01-07 01:00:00 2015-01-07 1 2 FALSE 22: 2015-01-08 00:00:00 2015-01-08 1 2 TRUE 23: 2015-01-08 00:30:00 2015-01-08 1 2 TRUE 24: 2015-01-08 01:00:00 2015-01-08 1 2 TRUE DateTime Date Rain_daily PET_daily Mask
请注意,此处使用了一个新的示例数据集,其中包括不同的日期以演示不同的用例,请参阅下面的部分。
新样本数据集
作为一个可重现的例子,每天只模拟三个时间戳,但在 8 个不同的日子。
library(data.table)
df2 <- fread("
Date Rain_daily PET_daily
2015-01-01 1 2
2015-01-02 3 2
2015-01-03 1 2
2015-01-04 1 2
2015-01-05 3 2
2015-01-06 4 2
2015-01-07 1 2
2015-01-08 1 2
")[CJ(Date, DateTime = as.ITime(60 * 30 * 0:2))[
, DateTime := as.POSIXct(Date) + DateTime], on = "Date"][
, setcolorder(.SD, "DateTime")]
df2
DateTime Date Rain_daily PET_daily 1: 2015-01-01 00:00:00 2015-01-01 1 2 2: 2015-01-01 00:30:00 2015-01-01 1 2 3: 2015-01-01 01:00:00 2015-01-01 1 2 4: 2015-01-02 00:00:00 2015-01-02 3 2 5: 2015-01-02 00:30:00 2015-01-02 3 2 6: 2015-01-02 01:00:00 2015-01-02 3 2 7: 2015-01-03 00:00:00 2015-01-03 1 2 8: 2015-01-03 00:30:00 2015-01-03 1 2 9: 2015-01-03 01:00:00 2015-01-03 1 2 10: 2015-01-04 00:00:00 2015-01-04 1 2 11: 2015-01-04 00:30:00 2015-01-04 1 2 12: 2015-01-04 01:00:00 2015-01-04 1 2 13: 2015-01-05 00:00:00 2015-01-05 3 2 14: 2015-01-05 00:30:00 2015-01-05 3 2 15: 2015-01-05 01:00:00 2015-01-05 3 2 16: 2015-01-06 00:00:00 2015-01-06 4 2 17: 2015-01-06 00:30:00 2015-01-06 4 2 18: 2015-01-06 01:00:00 2015-01-06 4 2 19: 2015-01-07 00:00:00 2015-01-07 1 2 20: 2015-01-07 00:30:00 2015-01-07 1 2 21: 2015-01-07 01:00:00 2015-01-07 1 2 22: 2015-01-08 00:00:00 2015-01-08 1 2 23: 2015-01-08 00:30:00 2015-01-08 1 2 24: 2015-01-08 01:00:00 2015-01-08 1 2 DateTime Date Rain_daily PET_daily