Creating Subgroups based on Time Period using Lubridate and Dplyr
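This should be a quick and simple question. Using the simple data frame below, I want to use dplyr and lubridate to group all clients whose OnsetDate falls in April 2015 or later. This group will be called "NewOnset", and the rest will be "OldOnset".
I'm new to lubridate and have run into some trouble.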
City<-c("Toronto", "Toronto", "Montreal","Ottawa","Ottawa",
"Hamilton","Peterborough","Toronto","Hamilton","Hamilton")
OnsetDate<-c("11/04/1980","04/08/2005","04/19/2015","07/10/2015","10/10/1999","03/11/2016","09/12/2011","06/10/2015","02/05/1988","08/08/2016")
Client<-c("Cl1","Cl2","Cl3","Cl4","Cl5","Cl6","Cl7","Cl8","Cl9","Cl10")
DF<- data.frame(Client,City,OnsetDate)
There is no need for an external package for this simple task. In base R:
## coerce character to a valid date
DF$OnsetDate <- as.Date(DF$OnsetDate ,"%m/%d/%Y")
## filter rows with an onset on or after 1 April 2015
DF[DF$OnsetDate >= as.Date("2015-04-01"),]
# Client City OnsetDate
# 3 Cl3 Montreal 2015-04-19
# 4 Cl4 Ottawa 2015-07-10
# 6 Cl6 Hamilton 2016-03-11
# 8 Cl8 Toronto 2015-06-10
# 10 Cl10 Hamilton 2016-08-08
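The filter above returns only the newer rows. If both labels are wanted in one column, as the question asks, a minimal base R sketch could look like the following. The names `cutoff` and `Group` are my own choices for illustration and are not part of the original answer.

```r
# Rebuild the sample data from the question (strings kept as character)
DF <- data.frame(
  Client = paste0("Cl", 1:10),
  City = c("Toronto", "Toronto", "Montreal", "Ottawa", "Ottawa",
           "Hamilton", "Peterborough", "Toronto", "Hamilton", "Hamilton"),
  OnsetDate = c("11/04/1980", "04/08/2005", "04/19/2015", "07/10/2015",
                "10/10/1999", "03/11/2016", "09/12/2011", "06/10/2015",
                "02/05/1988", "08/08/2016"),
  stringsAsFactors = FALSE
)

# Parse month/day/year strings into Date objects
DF$OnsetDate <- as.Date(DF$OnsetDate, format = "%m/%d/%Y")

# Hypothetical names: `cutoff` and `Group` are chosen for illustration
cutoff <- as.Date("2015-04-01")
DF$Group <- ifelse(DF$OnsetDate >= cutoff, "NewOnset", "OldOnset")

# Count clients per label
table(DF$Group)
```

With this sample data, each label ends up with five clients.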
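You can do this without any dplyr functions. Lubridate's family of parsing functions is named after the format of the object you want to convert to a date. In this case you want the mdy function, because the input format is month-day-year.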
library(lubridate)
DF$OnsetDate <- mdy(DF$OnsetDate)
You can then create new data frames by subsetting the rows according to your condition.
NewOnset <- DF[DF$OnsetDate >= as.Date("2015-04-01"), ]
OldOnset <- DF[DF$OnsetDate < as.Date("2015-04-01"), ]
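As a variation on the two subsetting lines above, both subsets can be produced in one step with base R's `split()`. This is only a sketch: the `cutoff` and `groups` names are assumptions of mine, and it presumes lubridate is installed.

```r
library(lubridate)

# Rebuild the sample data from the question (strings kept as character)
DF <- data.frame(
  Client = paste0("Cl", 1:10),
  City = c("Toronto", "Toronto", "Montreal", "Ottawa", "Ottawa",
           "Hamilton", "Peterborough", "Toronto", "Hamilton", "Hamilton"),
  OnsetDate = c("11/04/1980", "04/08/2005", "04/19/2015", "07/10/2015",
                "10/10/1999", "03/11/2016", "09/12/2011", "06/10/2015",
                "02/05/1988", "08/08/2016"),
  stringsAsFactors = FALSE
)

# mdy() parses month-day-year strings; ymd() parses year-month-day strings
DF$OnsetDate <- mdy(DF$OnsetDate)
cutoff <- ymd("2015-04-01")  # hypothetical variable name

# split() returns a named list with one data frame per label
groups <- split(DF, ifelse(DF$OnsetDate >= cutoff, "NewOnset", "OldOnset"))

groups$NewOnset  # clients with onset in April 2015 or later
groups$OldOnset  # everyone else
```

Keeping the two subsets in one list avoids repeating the condition, at the cost of accessing them as `groups$NewOnset` and `groups$OldOnset` rather than as separate objects.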
Your code has a few issues. This should fix it:
City <- c("Toronto", "Toronto", "Montreal", "Ottawa", "Ottawa", "Hamilton", "Peterborough", "Toronto", "Hamilton", "Hamilton")
OnsetDate <- c("11/04/1980","04/08/2005","04/19/2015","07/10/2015","10/10/1999","03/11/2016","09/12/2011","06/10/2015","02/05/1988","08/08/2016")
Client <- c("Cl1","Cl2","Cl3","Cl4","Cl5","Cl6","Cl7","Cl8","Cl9","Cl10")
df <- data.frame(Client, City, OnsetDate)
df$OnsetDate <- as.Date(df$OnsetDate, format = "%m/%d/%Y")
library(dplyr)
# keep clients with an onset on or after 1 April 2015
df %>% filter(OnsetDate >= as.Date("04/01/2015", format = "%m/%d/%Y"))
You can use the format argument; the lubridate package is not needed here. The code above produces:
Client City OnsetDate
1 Cl3 Montreal 2015-04-19
2 Cl4 Ottawa 2015-07-10
3 Cl6 Hamilton 2016-03-11
4 Cl8 Toronto 2015-06-10
5 Cl10 Hamilton 2016-08-08
Using dplyr:
library(dplyr)
# parse OnsetDate to Date; alternatively use lubridate::mdy(OnsetDate)
DF %>% mutate(OnsetDate = as.Date(OnsetDate, '%m/%d/%Y')) %>%
# add and group by new column
group_by(group = if_else(OnsetDate >= as.Date('2015-04-01'), # condition
'NewOnset', # return if above (true)
'OldOnset')) # return if below (false)
## Source: local data frame [10 x 4]
## Groups: group [2]
##
## Client City OnsetDate group
## <fctr> <fctr> <date> <chr>
## 1 Cl1 Toronto 1980-11-04 OldOnset
## 2 Cl2 Toronto 2005-04-08 OldOnset
## 3 Cl3 Montreal 2015-04-19 NewOnset
## 4 Cl4 Ottawa 2015-07-10 NewOnset
## 5 Cl5 Ottawa 1999-10-10 OldOnset
## 6 Cl6 Hamilton 2016-03-11 NewOnset
## 7 Cl7 Peterborough 2011-09-12 OldOnset
## 8 Cl8 Toronto 2015-06-10 NewOnset
## 9 Cl9 Hamilton 1988-02-05 OldOnset
## 10 Cl10 Hamilton 2016-08-08 NewOnset
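Note that the grouping here does not do anything by itself, and you could do both steps in mutate, but you do get a properly grouped data.frame for further mutating or summarizing.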
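To illustrate what further summarizing might look like, here is a hedged sketch that counts clients and finds the earliest onset in each group. The `counts`, `n`, and `earliest` names are assumptions added for illustration, and the label column is created in `mutate()` before grouping.

```r
library(dplyr)

# Rebuild the sample data from the question (strings kept as character)
DF <- data.frame(
  Client = paste0("Cl", 1:10),
  City = c("Toronto", "Toronto", "Montreal", "Ottawa", "Ottawa",
           "Hamilton", "Peterborough", "Toronto", "Hamilton", "Hamilton"),
  OnsetDate = c("11/04/1980", "04/08/2005", "04/19/2015", "07/10/2015",
                "10/10/1999", "03/11/2016", "09/12/2011", "06/10/2015",
                "02/05/1988", "08/08/2016"),
  stringsAsFactors = FALSE
)

counts <- DF %>%
  # parse the dates, then label each row by the April 2015 cutoff
  mutate(OnsetDate = as.Date(OnsetDate, format = "%m/%d/%Y"),
         group = if_else(OnsetDate >= as.Date("2015-04-01"),
                         "NewOnset", "OldOnset")) %>%
  group_by(group) %>%
  # one row per group: number of clients and earliest onset date
  summarise(n = n(), earliest = min(OnsetDate))

counts
```

With this sample data, both groups contain five clients.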
Another approach is to use cut.Date, which returns a factor:
# parse OnsetDate to Date; alternatively use lubridate::mdy(OnsetDate)
DF %>% mutate(OnsetDate = as.Date(OnsetDate, '%m/%d/%Y')) %>%
# add and group by new column
group_by(group = cut(OnsetDate,
breaks = c(min(OnsetDate), as.Date('2015-04-01'), max(OnsetDate)),
labels = c('OldOnset', 'NewOnset'),
include.lowest = TRUE))
## Source: local data frame [10 x 4]
## Groups: group [2]
##
## Client City OnsetDate group
## <fctr> <fctr> <date> <fctr>
## 1 Cl1 Toronto 1980-11-04 OldOnset
## 2 Cl2 Toronto 2005-04-08 OldOnset
## 3 Cl3 Montreal 2015-04-19 NewOnset
## 4 Cl4 Ottawa 2015-07-10 NewOnset
## 5 Cl5 Ottawa 1999-10-10 OldOnset
## 6 Cl6 Hamilton 2016-03-11 NewOnset
## 7 Cl7 Peterborough 2011-09-12 OldOnset
## 8 Cl8 Toronto 2015-06-10 NewOnset
## 9 Cl9 Hamilton 1988-02-05 OldOnset
## 10 Cl10 Hamilton 2016-08-08 NewOnset