R: aggregate date variable into years
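One column of my data frame holds many different dates. I want to aggregate the data so that only the year is kept; I do not need the month or the day. The entries were originally stored as integer. The function as.Date returns nonsense, "0011-06-20" instead of "11-06-2000", so I used as.character.Date and got a usable result: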
as.character.Date(Training_lowNA$last_swap)
[1] "11/6/2000 "
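Based on this result, I now want to drop the day and month and keep only the year. Or would it be easier to do the same thing with the integers?
Any helpful ideas would be appreciated!
Edit: my input data has 50,000 date entries in this format: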
[9955] 8/14/2001 5/27/2001 3/16/2001 4/13/2000
[9961] 7/1/2000 5/18/2000 8/6/2001 7/17/2000 9/16/2001
[9967] 10/21/2000 7/24/2001 5/6/2000 12/18/2000
[9973] 1/11/2001 7/31/2001 9/17/2001 3/8/2001
[9979] 9/30/2000 7/12/2001 8/20/2000
[9985] 10/20/2000 9/21/2000 9/27/2000 7/18/2000
[9991] 10/1/2000
[9997] 9/17/2001 7/22/2001 11/6/2000 5/31/2001
[ reached getOption("max.print") -- omitted 40000 entries ]
My desired output is:
[9955] 2001 2001 2001 2000
[9961] 2000 2000 2001 2000 2001
[9967] 2000 2001 2000 2000
[9973] 2001 2001 2001 2001
[9979] 2000 2001 2000
[9985] 2000 2000 2000 2000
[9991] 2000
[9997] 2001 2001 2000 2001
Edit #2
As David suggested below, I tried his approach:
Training_lowNA[] <- lapply(Training_lowNA, function(x) format(as.Date(x, "%m/%d/%Y"), "%Y"))
Debugging shows:
function (x)
{
xx <- x[1L]
if (is.na(xx)) {
j <- 1L
while (is.na(xx) && (j <- j + 1L) <= length(x)) xx <- x[j]
if (is.na(xx))
f <- "%Y-%m-%d"
}
if (is.na(xx) || !is.na(strptime(xx, f <- "%Y-%m-%d", tz = "GMT")) ||
!is.na(strptime(xx, f <- "%Y/%m/%d", tz = "GMT")))
return(strptime(x, f))
stop("character string is not in a standard unambiguous format")
Edit #3:
> dput(head(Training_lowNA$last_swap))
structure(c(78L, 32L, 1100L, 1019L, 522L, 265L), .Label = c("",
"1/1/2000", "1/1/2001", "1/1/2002", "1/10/1999", "1/10/2000",
"here follow 50,000 entries of this sort", "9/9/2000", "9/9/2001"
), class = "factor")
Try the year() function from the lubridate package.
See this link.
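A minimal sketch of the lubridate route, assuming the package is installed and the values follow the month/day/year layout shown in the question. The vector dates is made-up sample data, not the asker's column.

```r
library(lubridate)

# Sample values in the month/day/year layout from the question (illustrative only)
dates <- c("8/14/2001", "5/27/2001", "11/6/2000")

# mdy() parses month-day-year strings into Date objects;
# year() then extracts the year component as a number
years <- year(mdy(dates))
years
```

If the column is a factor, as in the dput() output above, wrapping it in as.character() before calling mdy() keeps the parsing explicit.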
First, you need to create a proper date object from the string:
(a <- as.Date("9/21/2000", "%m/%d/%Y"))
## [1] "2000-09-21"
Then you can extract the year:
format(a, "%Y")
## [1] "2000"
If you have a vector of dates, this collapses into a one-liner:
format(as.Date(df$date, "%m/%d/%Y"), "%Y")
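The error in Edit #2 probably comes from applying the conversion to every column with lapply. For a numeric column, as.Date treats the second positional argument as origin, so it tries to parse "%m/%d/%Y" itself as a date and fails with the "standard unambiguous format" message. The sketch below converts only the one date column. The data frame df and the column swap_year are hypothetical stand-ins for Training_lowNA, built to mimic the factor with an empty level shown in the dput() output.

```r
# Hypothetical stand-in for Training_lowNA: a factor date column with an
# empty level, plus an unrelated numeric column
df <- data.frame(
  last_swap = factor(c("11/6/2000", "", "8/14/2001", "5/27/2001")),
  other = c(1, 2, 3, 4)
)

# Convert the factor to character so the labels (not the integer codes)
# are parsed, and name the format argument explicitly
parsed <- as.Date(as.character(df$last_swap), format = "%m/%d/%Y")

# Empty strings cannot be parsed and become NA; the rest yield the year
df$swap_year <- as.integer(format(parsed, "%Y"))
df$swap_year
```

Touching only df$last_swap leaves the other columns unchanged, and the empty entries end up as NA rather than stopping the conversion.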
The following approach works:
dat <- c("8/14/2001", "5/27/2001", "3/16/2001", "4/13/2000", "7/1/2000", "5/18/2000", "8/6/2001", "7/17/2000", "9/16/2001", "10/21/2000", "7/24/2001", "7/24/1977", "7/24/1999")
ndat <- as.POSIXlt(dat, format="%m/%d/%Y")
ndat$year + 1900  # ndat is already POSIXlt; $year counts years since 1900
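To show why 1900 is added, here is a small sketch with three of the sample dates. The tz = "UTC" argument is an addition not present in the answer; it is assumed here only to keep the parse independent of the local time zone.

```r
# Three of the sample dates from the answer above
dat <- c("8/14/2001", "4/13/2000", "7/24/1977")

# Parse into POSIXlt; tz = "UTC" is an assumption to avoid local-time effects
lt <- as.POSIXlt(dat, format = "%m/%d/%Y", tz = "UTC")

# The $year component stores years elapsed since 1900
offsets <- lt$year

# Adding 1900 gives the calendar year
years <- offsets + 1900
years
```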