Extracting most recent date from rows in a data frame
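I'm working with a data frame that has several related date columns, and what I need most is to extract the most recent date in each row. I've looked at the examples here, but none of them is quite what I'm looking for. My sample data frame is as follows: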
ID date1 date2 date3
1 01/12/15 02/04/07 07/06/16
2 03/29/12 02/16/16 09/01/10
3 12/01/15 07/07/07 11/13/12
What I want is output like this:
ID date1 date2 date3 max
1 01/12/15 02/04/07 07/06/16 07/06/16
2 03/29/12 02/16/16 09/01/10 02/16/16
3 12/01/15 07/07/07 11/13/12 12/01/15
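I've seen people use plyr and dplyr, but I'm very unfamiliar with those packages. Thanks for your help!
Edit: I was able to run what @akrun provided, but I ran into a problem with empty date fields. Here is an example: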
ID date1 date2 date3
1 01/12/15 NA 07/06/16
2 NA 02/16/16 09/01/10
3 12/01/15 07/07/07 NA
So even with those blanks, I would still like the data frame transformed as follows:
ID date1 date2 date3 max
1 01/12/15 NA 07/06/16 07/06/16
2 NA 02/16/16 09/01/10 02/16/16
3 12/01/15 07/07/07 NA 12/01/15
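We can convert the columns to Date class, use max.col to get the column index of the latest date in each row, cbind it with the row index, extract the matching elements from df1, and create the 'max' column.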
df1$max <- df1[cbind(1:nrow(df1), max.col(sapply(df1[-1], as.Date, format = "%m/%d/%y"))+1)]
df1
# ID date1 date2 date3 max
#1 1 01/12/15 02/04/07 07/06/16 07/06/16
#2 2 03/29/12 02/16/16 09/01/10 02/16/16
#3 3 12/01/15 07/07/07 11/13/12 12/01/15
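To make the indexing step easier to follow, here is a rough sketch that splits the one-liner into parts. The intermediate names (dates_num, col_idx, idx) are made up for illustration, and ties.method = "first" is an addition of mine so the result is deterministic; by default max.col breaks ties at random.

```r
# Sample data from the question
df1 <- data.frame(ID = 1:3,
                  date1 = c("01/12/15", "03/29/12", "12/01/15"),
                  date2 = c("02/04/07", "02/16/16", "07/07/07"),
                  date3 = c("07/06/16", "09/01/10", "11/13/12"),
                  stringsAsFactors = FALSE)

# sapply() drops the Date class, leaving a numeric matrix of day counts
dates_num <- sapply(df1[-1], as.Date, format = "%m/%d/%y")

# Position of the latest date in each row, counted within the date columns only
col_idx <- max.col(dates_num, ties.method = "first")

# Two-column (row, column) index; + 1 skips over the ID column
idx <- cbind(seq_len(nrow(df1)), col_idx + 1)

# Matrix-style indexing pulls one cell per row out of the data frame
df1$max <- df1[idx]
```

Because the extraction goes through the original character columns, the 'max' column keeps the mm/dd/yy text rather than becoming a Date.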
Another option is apply with MARGIN = 1:
df1$max <- apply(df1[-1], 1, function(x) x[which.max(as.Date(x, "%m/%d/%y"))])
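For the follow-up case with missing dates, the apply variant should still cope, because which.max skips NA values, whereas max.col yields NA for a row that contains one. A minimal sketch, assuming the missing entries are real NA values rather than empty strings; the name df_na is made up for this example:

```r
# The NA example from the question's edit
df_na <- data.frame(ID = 1:3,
                    date1 = c("01/12/15", NA, "12/01/15"),
                    date2 = c(NA, "02/16/16", "07/07/07"),
                    date3 = c("07/06/16", "09/01/10", NA),
                    stringsAsFactors = FALSE)

# which.max() ignores NA, so each row still yields its latest non-missing date
df_na$max <- apply(df_na[-1], 1,
                   function(x) x[which.max(as.Date(x, "%m/%d/%y"))])
```

If a row could be entirely NA, which.max returns a zero-length result and apply hands back a list instead of a vector, so that case would need separate handling.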
Data
df1 <- structure(list(ID = 1:3, date1 = c("01/12/15", "03/29/12", "12/01/15"
), date2 = c("02/04/07", "02/16/16", "07/07/07"), date3 = c("07/06/16",
"09/01/10", "11/13/12")), .Names = c("ID", "date1", "date2",
"date3"), class = "data.frame", row.names = c("1", "2", "3"))
Use pmax after converting the columns to Date objects:
dat[-1] <- lapply(dat[-1], as.Date, format="%m/%d/%y")
dat$max <- do.call(pmax, dat[-1])
# ID date1 date2 date3 max
#1 1 2015-01-12 2007-02-04 2016-07-06 2016-07-06
#2 2 2012-03-29 2016-02-16 2010-09-01 2016-02-16
#3 3 2015-12-01 2007-07-07 2012-11-13 2015-12-01
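For the rows with missing dates from the question's edit, pmax accepts na.rm = TRUE, which can be passed through do.call by appending it to the list of columns. A sketch under that assumption, with dat_na as a made-up name for the NA example:

```r
# The NA example from the question's edit
dat_na <- data.frame(ID = 1:3,
                     date1 = c("01/12/15", NA, "12/01/15"),
                     date2 = c(NA, "02/16/16", "07/07/07"),
                     date3 = c("07/06/16", "09/01/10", NA),
                     stringsAsFactors = FALSE)

# Convert every column except ID to Date
dat_na[-1] <- lapply(dat_na[-1], as.Date, format = "%m/%d/%y")

# c() appends na.rm = TRUE to the list of columns handed to pmax()
dat_na$max <- do.call(pmax, c(dat_na[-1], na.rm = TRUE))
```

Unlike the character-based approaches, this leaves 'max' as a Date column, so it prints as yyyy-mm-dd. A row where every date is NA would give NA.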
where dat is:
dat <- structure(list(ID = 1:3, date1 = structure(1:3, .Label = c("01/12/15",
"03/29/12", "12/01/15"), class = "factor"), date2 = structure(1:3, .Label = c("02/04/07",
"02/16/16", "07/07/07"), class = "factor"), date3 = structure(1:3, .Label = c("07/06/16",
"09/01/10", "11/13/12"), class = "factor")), .Names = c("ID",
"date1", "date2", "date3"), class = "data.frame", row.names = c("1",
"2", "3"))
If you are more comfortable with SQL, the sqldf library offers another way to get the latest date:
data1 <- data.frame(id = c("1", "2", "3"),
                    date1 = as.Date(c("01/12/15", "03/29/12", "12/01/15"), "%m/%d/%y"),
                    date2 = as.Date(c("02/04/07", "02/16/16", "07/07/07"), "%m/%d/%y"),
                    date3 = as.Date(c("07/06/16", "09/01/10", "11/13/12"), "%m/%d/%y"))
library(sqldf)
data2 = sqldf("SELECT id,
               max(date1,date2,date3) as 'max__Date'
               FROM data1", method = "name__class")