R: increment year in a date column based on a condition
I want to increment the year in a date column based on a condition: if StartDate is later than EndDate, increment the year in EndDate. For example:
Data:
input_df <- structure(list(C1 = c("A", "C", "E", "G", "I"), C2 = c("B", "D", "F", "H", "J"),
StartDate = c("12/23/2019", "12/24/2019", "12/28/2019", "01/01/2019", "05/15/2019"), EndDate = c("01/07/2019", "12/25/2019", "12/31/2019", "04/11/2019", "05/18/2019")), class = "data.frame", row.names = c(NA, -5L))
input_df:
C1 C2 StartDate EndDate
1 A B 12/23/2019 01/07/2019
2 C D 12/24/2019 12/25/2019
3 E F 12/28/2019 12/31/2019
4 G H 01/01/2019 04/11/2019
5 I J 05/15/2019 05/18/2019
Expected output:
input_df:
C1 C2 StartDate EndDate
1 A B 12/23/2019 01/07/2020
2 C D 12/24/2019 12/25/2019
3 E F 12/28/2019 12/31/2019
4 G H 01/01/2019 04/11/2019
5 I J 05/15/2019 05/18/2019
I tried the following to achieve this:
library(lubridate)
input_df$EndDate[input_df$EndDate < input_df$StartDate] <- mdy(input_df$EndDate) + years(1)
But the output I get is:
C1 C2 StartDate EndDate
1 A B 12/23/2019 18268
2 C D 12/24/2019 12/25/2019
3 E F 12/28/2019 12/31/2019
4 G H 01/01/2019 04/11/2019
5 I J 05/15/2019 05/18/2019
along with the warning 'number of items to replace is not a multiple of replacement length'
Using the tidyverse packages:
library(dplyr)
library(lubridate)
input_df %>% mutate(StartDate = as.Date(StartDate, format = "%m/%d/%Y"),
                    EndDate = as.Date(EndDate, format = "%m/%d/%Y"),
                    # add a year only when StartDate is strictly later than EndDate
                    EndDate_N = if_else(StartDate > EndDate, EndDate + years(1), EndDate))
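I think you got the warning because you subset only the left-hand side: the logical index selects one row, but the right-hand side is the full five-element vector. You probably also want StartDate and EndDate to have the same class, since comparing them as character strings is not a date comparison. Try this: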
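library(lubridate)
# Parse both columns to Date so they compare chronologically
input_df$StartDate <- mdy(input_df$StartDate)
input_df$EndDate <- mdy(input_df$EndDate)
# Use the same logical index on both sides of the assignment
idx <- input_df$EndDate < input_df$StartDate
input_df$EndDate[idx] <- input_df$EndDate[idx] + years(1)
input_df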
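To make the diagnosis above a bit more concrete, here is a small base R sketch of what seems to go wrong in the original attempt. The vectors start and end are toy copies of the first two rows of the question's data, and the names n_lhs, n_rhs, x and text_cmp are only illustrative.

```r
# Toy vectors copied from the first two rows of the question's data.
start <- c("12/23/2019", "12/24/2019")
end   <- c("01/07/2019", "12/25/2019")

# 1. Length mismatch: the left side selects only the matching rows,
#    while the right side is the full parsed vector.
n_lhs <- sum(end < start)                            # 1
n_rhs <- length(as.Date(end, format = "%m/%d/%Y"))   # 2

# 2. A Date assigned into a character vector is stored as its day count,
#    which is where the 18268 in the question's output comes from.
x <- end
x[1] <- as.Date("2020-01-07")
x[1]                                                 # "18268"

# 3. Character comparison is textual, not chronological.
text_cmp <- "01/07/2020" < "12/23/2019"              # TRUE
```

The comparison happened to pick the right row here only because all dates share the same year; once EndDate rolls over to 2020 the textual comparison no longer reflects date order, which is another reason to convert both columns first.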
We can also do this in base R:
#Convert Start and End date to `POSIXlt` format
input_df[c('StartDate','EndDate')] <- lapply(input_df[c('StartDate','EndDate')],
as.POSIXlt, format = "%m/%d/%Y")
#Get the row index where StartDate > EndDate
inds <- input_df$StartDate > input_df$EndDate
#Increment the year for those indexes
input_df$EndDate[inds]$year <- input_df$EndDate[inds]$year + 1
input_df
# C1 C2 StartDate EndDate
#1 A B 2019-12-23 2020-01-07
#2 C D 2019-12-24 2019-12-25
#3 E F 2019-12-28 2019-12-31
#4 G H 2019-01-01 2019-04-11
#5 I J 2019-05-15 2019-05-18
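The result above prints as yyyy-mm-dd, whereas the expected output in the question keeps the mm/dd/yyyy strings. If that matters, one possible last step is to format the date columns back to text. This is only a sketch: res is a hypothetical stand-in for the converted data frame, built here from two rows so the snippet runs on its own.

```r
# Hypothetical stand-in for the converted data frame (first two rows only).
res <- data.frame(
  StartDate = as.Date(c("2019-12-23", "2019-12-24")),
  EndDate   = as.Date(c("2020-01-07", "2019-12-25"))
)

# Turn both date columns back into mm/dd/yyyy character strings.
date_cols <- c("StartDate", "EndDate")
res[date_cols] <- lapply(res[date_cols], format, format = "%m/%d/%Y")

res$EndDate   # "01/07/2020" "12/25/2019"
```

After this step the columns are character again, so any further date arithmetic or comparison would need them parsed once more.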