Mutate based on two conditions in R dataframe
I have an R data frame that can be generated with the code below:
DF <- data.frame("Person_id" = c(1,1,1,1,2,2,2,2,3,3), "Type" = c("IN","OUT","IN","ANC","IN","OUT","IN","ANC","EM","ANC"), "Name" = c("Nara","Nara","Nara","Nara","Dora","Dora","Dora","Dora","Sara","Sara"),"day_1" = c("21/1/2002","21/4/2002","21/6/2002","21/9/2002","28/1/2012","28/4/2012","28/6/2012","28/9/2012","30/06/2004","30/06/2005"),"day_2" = c("23/1/2002","21/4/2002","","","30/1/2012","28/4/2012","","28/9/2012","",""))
What I would like to do is create two new columns, admit_start_date and admit_end_date, based on the conditions given below.
Rule 1
admit_start_date = day_1
admit_end_date = day_2 (sometimes day_2 can be NA, so refer to Rule 2 below)
Rule 2
if day_2 is (null or blank or na) and Type is (Out or ANC or EM) then
admit_end_date = day_1
else (if Type is IN)
admit_end_date = day_1 + 5 (days)
This is what I am trying, but it doesn't seem to help:
transform_dates = function(DF){ # this function is to create 'date' columns
  DF %>%
    mutate(admit_start_date = day_1) %>%
    mutate(admit_end_date = day_2) %>%
    admit_end_date = if_else(((Type == 'Out' & admit_end_date.isna() ==True|Type == 'ANC' & admit_end_date.isna() ==True|Type == 'EM' & admit_end_date.isna() ==True),day_1,day_1 + 5)
    )
}
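As you can see, I am not sure how to check the newly created column for NA and replace those NAs with day_1 or day_1 + 5 (days) based on the Type column.
Can you help?
I expect my output to look like the one shown below.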
After converting the "day" columns to actual Date objects, we can use case_when to specify each condition separately.
library(dplyr)
DF %>%
  mutate_at(vars(starts_with('day')), as.Date, "%d/%m/%Y") %>%
  mutate(admit_start_date = day_1,
         admit_end_date = case_when(
           !is.na(day_2) ~ day_2,
           is.na(day_2) & Type %in% c('OUT', 'ANC', 'EM') ~ day_1,
           Type == 'IN' ~ day_1 + 5))
#   Person_id Type Name      day_1      day_2 admit_start_date admit_end_date
#1          1   IN Nara 2002-01-21 2002-01-23       2002-01-21     2002-01-23
#2          1  OUT Nara 2002-04-21 2002-04-21       2002-04-21     2002-04-21
#3          1   IN Nara 2002-06-21       <NA>       2002-06-21     2002-06-26
#4          1  ANC Nara 2002-09-21       <NA>       2002-09-21     2002-09-21
#5          2   IN Dora 2012-01-28 2012-01-30       2012-01-28     2012-01-30
#6          2  OUT Dora 2012-04-28 2012-04-28       2012-04-28     2012-04-28
#7          2   IN Dora 2012-06-28       <NA>       2012-06-28     2012-07-03
#8          2  ANC Dora 2012-09-28 2012-09-28       2012-09-28     2012-09-28
#9          3   EM Sara 2004-06-30       <NA>       2004-06-30     2004-06-30
#10         3  ANC Sara 2005-06-30       <NA>       2005-06-30     2005-06-30
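The dates in the data frame are not of class "Date" (check class(DF$day_1)), so we use mutate_at to change their class to "Date", which lets us do arithmetic on them. starts_with('day') means that every column whose name starts with "day" is converted to class "Date". We use mutate_at when we want to apply the same function to multiple columns.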
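To see the effect of the conversion in isolation, here is a minimal sketch on a small made-up data frame (the name df and its two rows are only for illustration, not the asker's data). It assumes dplyr 1.0.0 or later and uses across(), which the dplyr documentation describes as superseding mutate_at; the mutate_at call in the answer should behave the same way.

```r
library(dplyr)

# Hypothetical two-row example, not the asker's data
df <- data.frame(day_1 = c("21/1/2002", "30/06/2004"),
                 day_2 = c("23/1/2002", ""),
                 stringsAsFactors = FALSE)

class(df$day_1)          # "character", so day_1 + 5 would fail here

# Convert every column whose name starts with "day" to class Date
converted <- df %>%
  mutate(across(starts_with("day"), ~ as.Date(.x, format = "%d/%m/%Y")))

class(converted$day_1)   # "Date"
is.na(converted$day_2)   # FALSE TRUE: the blank string becomes NA
converted$day_1 + 5      # date arithmetic now works
```

Note that blank strings turn into NA during the conversion, which is why the later conditions only need is.na(day_2) and no separate check for "".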
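case_when is an alternative to nested ifelse statements. The conditions are evaluated in order: the first condition is checked, and if it is satisfied, the remaining conditions are not checked. If the first condition is not satisfied, the second one is checked, and so on. Hence there is no need for an else here. If none of the conditions is satisfied, it returns NA. See ?case_when.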
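The ordering and the NA fallback can be checked with a small sketch on made-up vectors (the type "XX" is a hypothetical value that matches none of the conditions; it does not appear in the question's data):

```r
library(dplyr)

# Hypothetical inputs, already of class Date
day_1 <- as.Date(c("2002-06-21", "2002-09-21", "2002-01-21", "2002-03-01"))
day_2 <- as.Date(c(NA, NA, "2002-01-23", NA))
type  <- c("IN", "ANC", "IN", "XX")   # "XX" is a made-up type

res <- case_when(
  !is.na(day_2)                                  ~ day_2,      # checked first
  is.na(day_2) & type %in% c("OUT", "ANC", "EM") ~ day_1,      # checked second
  type == "IN"                                   ~ day_1 + 5   # checked last
)

res
# "2002-06-26" "2002-09-21" "2002-01-23" NA
```

The third element has type "IN" but keeps day_2, because the first condition already matched and the later ones are not consulted for that row. The fourth element matches nothing and comes back as NA.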