Creating Subgroups Based on Time Period Using Lubridate and Dplyr

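This should be a quick and simple question. Using the simple data frame below, I would like to use dplyr and lubridate to group all clients whose OnsetDate falls in or after April 2015. This group will be called "NewOnset", and the rest will be "OldOnset".

I am new to lubridate and have run into some trouble.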

City<-c("Toronto", "Toronto", "Montreal","Ottawa","Ottawa",
        "Hamilton","Peterborough","Toronto","Hamilton","Hamilton")

OnsetDate<-c("11/04/1980","04/08/2005","04/19/2015","07/10/2015","10/10/1999","03/11/2016","09/12/2011","06/10/2015","02/05/1988","08/08/2016")

Client<-c("Cl1","Cl2","Cl3","Cl4","Cl5","Cl6","Cl7","Cl8","Cl9","Cl10")

DF<- data.frame(Client,City,OnsetDate)

There is no need for an external package for this simple task. In base R:

## coerce character to a valid date
DF$OnsetDate <- as.Date(DF$OnsetDate, "%m/%d/%Y")
## filter rows: keep onsets in or after April 2015
DF[DF$OnsetDate >= as.Date("2015-04-01"), ]

#    Client     City  OnsetDate
# 3     Cl3 Montreal 2015-04-19
# 4     Cl4   Ottawa 2015-07-10
# 6     Cl6 Hamilton 2016-03-11
# 8     Cl8  Toronto 2015-06-10
# 10   Cl10 Hamilton 2016-08-08

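You can do this without any dplyr functions. Lubridate's family of parsing functions is named after the format of the object you want to convert to a date. In this case you want the mdy function, because the input format is month-day-year.

library(lubridate)
DF$OnsetDate <- mdy(DF$OnsetDate)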
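
As a small illustration of that naming scheme (the sample strings are made up for this example, not taken from the question), each parser's name spells out the order of the components it expects:

```r
library(lubridate)

# Each function name lists the order of the date components in its input.
d1 <- mdy("04/19/2015")   # month/day/year
d2 <- dmy("19/04/2015")   # day/month/year
d3 <- ymd("2015-04-19")   # year-month-day

# All three strings parse to the same Date value.
all(c(d1, d2, d3) == as.Date("2015-04-19"))
```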

You can then create new data frames by subsetting the rows on your condition.

NewOnset <- DF[DF$OnsetDate >= as.Date("2015-04-01"), ]
OldOnset <- DF[DF$OnsetDate < as.Date("2015-04-01"), ]
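
If one data frame with a label column is more convenient than two separate ones, base R's ifelse() can add it. This is a minimal sketch rather than part of the answer above: the column name OnsetGroup is made up, and the three-row data frame is a cut-down copy of the question's data.

```r
# Cut-down version of the question's data (three of the ten clients).
DF <- data.frame(
  Client = c("Cl1", "Cl3", "Cl4"),
  OnsetDate = as.Date(c("11/04/1980", "04/19/2015", "07/10/2015"),
                      format = "%m/%d/%Y")
)

# OnsetGroup is a hypothetical column name chosen for this sketch.
cutoff <- as.Date("2015-04-01")
DF$OnsetGroup <- ifelse(DF$OnsetDate >= cutoff, "NewOnset", "OldOnset")

# split() gives one data frame per label, if separate objects are still wanted.
groups <- split(DF, DF$OnsetGroup)
names(groups)
```

Keeping the label in a column means later steps can group or filter on it without juggling two objects.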

There are a few problems with your code. This should fix it:

City <- c("Toronto", "Toronto", "Montreal", "Ottawa", "Ottawa", "Hamilton", "Peterborough", "Toronto", "Hamilton", "Hamilton")
OnsetDate <- c("11/04/1980","04/08/2005","04/19/2015","07/10/2015","10/10/1999","03/11/2016","09/12/2011","06/10/2015","02/05/1988","08/08/2016")
Client <- c("Cl1","Cl2","Cl3","Cl4","Cl5","Cl6","Cl7","Cl8","Cl9","Cl10")

df <- data.frame(Client, City, OnsetDate)

df$OnsetDate <- as.Date(df$OnsetDate, format = "%m/%d/%Y")    

library(dplyr)

# here comes the magic: keep onsets in or after April 2015
df %>% filter(OnsetDate >= as.Date("04/01/2015", format = "%m/%d/%Y"))

You can use the format argument of as.Date; the lubridate package is not needed here. The code above produces:

  Client     City  OnsetDate
1    Cl3 Montreal 2015-04-19
2    Cl4   Ottawa 2015-07-10
3    Cl6 Hamilton 2016-03-11
4    Cl8  Toronto 2015-06-10
5   Cl10 Hamilton 2016-08-08

Using dplyr,

library(dplyr)

# parse OnsetDate to Date; alternatively use lubridate::mdy(OnsetDate)
DF %>% mutate(OnsetDate = as.Date(OnsetDate, '%m/%d/%Y')) %>% 
    # add and group by new column
    group_by(group = if_else(OnsetDate >= as.Date('2015-04-01'),    # condition
                             'NewOnset',    # return if above (true)
                             'OldOnset'))   # return if below (false)

## Source: local data frame [10 x 4]
## Groups: group [2]
## 
##    Client         City  OnsetDate    group
##    <fctr>       <fctr>     <date>    <chr>
## 1     Cl1      Toronto 1980-11-04 OldOnset
## 2     Cl2      Toronto 2005-04-08 OldOnset
## 3     Cl3     Montreal 2015-04-19 NewOnset
## 4     Cl4       Ottawa 2015-07-10 NewOnset
## 5     Cl5       Ottawa 1999-10-10 OldOnset
## 6     Cl6     Hamilton 2016-03-11 NewOnset
## 7     Cl7 Peterborough 2011-09-12 OldOnset
## 8     Cl8      Toronto 2015-06-10 NewOnset
## 9     Cl9     Hamilton 1988-02-05 OldOnset
## 10   Cl10     Hamilton 2016-08-08 NewOnset

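Note that the grouping here doesn't *do* anything by itself, and you could create the date and the label column in a single mutate, but you do get an appropriately grouped data.frame for further mutating or summarizing.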
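
To show what the grouped data frame is useful for, here is a sketch of a follow-up summary. The names n_clients and latest_onset are made up for illustration, and the data is a cut-down copy of the question's.

```r
library(dplyr)

# Four of the question's ten clients, dates still stored as character.
DF <- data.frame(
  Client = c("Cl1", "Cl3", "Cl4", "Cl7"),
  OnsetDate = c("11/04/1980", "04/19/2015", "07/10/2015", "09/12/2011")
)

summary_by_group <- DF %>%
  mutate(OnsetDate = as.Date(OnsetDate, format = "%m/%d/%Y")) %>%
  group_by(group = if_else(OnsetDate >= as.Date("2015-04-01"),
                           "NewOnset", "OldOnset")) %>%
  # n_clients and latest_onset are hypothetical names for this sketch
  summarise(n_clients = n(), latest_onset = max(OnsetDate))

summary_by_group
```

Because the data is already grouped, summarise() returns one row per onset group.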

Another approach is to use cut.Date, which will return a factor:

# parse OnsetDate to Date; alternatively use lubridate::mdy(OnsetDate)
DF %>% mutate(OnsetDate = as.Date(OnsetDate, '%m/%d/%Y')) %>% 
    # add and group by new column
    group_by(group = cut(OnsetDate, 
                         breaks = c(min(OnsetDate), as.Date('2015-04-01'), max(OnsetDate)), 
                         labels = c('OldOnset', 'NewOnset'), 
                         include.lowest = TRUE))

## Source: local data frame [10 x 4]
## Groups: group [2]
## 
##    Client         City  OnsetDate    group
##    <fctr>       <fctr>     <date>   <fctr>
## 1     Cl1      Toronto 1980-11-04 OldOnset
## 2     Cl2      Toronto 2005-04-08 OldOnset
## 3     Cl3     Montreal 2015-04-19 NewOnset
## 4     Cl4       Ottawa 2015-07-10 NewOnset
## 5     Cl5       Ottawa 1999-10-10 OldOnset
## 6     Cl6     Hamilton 2016-03-11 NewOnset
## 7     Cl7 Peterborough 2011-09-12 OldOnset
## 8     Cl8      Toronto 2015-06-10 NewOnset
## 9     Cl9     Hamilton 1988-02-05 OldOnset
## 10   Cl10     Hamilton 2016-08-08 NewOnset
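
One detail worth checking is which side of the break the cutoff date itself lands on. cut.Date defaults to right = FALSE, so intervals are closed on the left, and include.lowest = TRUE then also keeps the latest date in the last interval. A small base R check, using made-up dates around the cutoff:

```r
# Made-up dates straddling the cutoff, for illustration only.
x <- as.Date(c("2015-03-31", "2015-04-01", "2016-08-08"))

grp <- cut(x,
           breaks = c(min(x), as.Date("2015-04-01"), max(x)),
           labels = c("OldOnset", "NewOnset"),
           include.lowest = TRUE)

# The cutoff date and the maximum both fall into NewOnset.
as.character(grp)
```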