Data reshaping and grouping
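I am new to R, please help.
I have a data frame with 5 columns, named Seasondate, V1, V2, V3, and V4.
Seasondate contains a mix of date formats across roughly 1,000 observations, for example: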
January to March
August to October
05/01/2013 to 10/30/2013
NA
February to June
02/15/2013 to 06/19/2013
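I want to consolidate them all into a single format, namely the month-to-month format.
Help with parsing this using string functions would be much appreciated.
Edit 1:
They are all from 2013.
Thanks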
Some formatting back and forth using as.Date and format, and then using paste to put it all back together again:
datext <- function(x) {
  dates <- as.Date(x, format = "%m/%d/%Y")
  if (all(is.na(dates))) x else format(dates, "%B")
}
vapply(
  lapply(strsplit(as.character(dat$Seasondate), " to "), datext),
  paste, collapse = " to ", FUN.VALUE = character(1)
)
#[1] "January to March" "August to October" "May to October"
#[4] "NA" "February to June" "February to June"
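Note that element 4 in the output above is the string "NA" rather than a missing value, because paste converts NA to text. A minimal sketch of one way to keep it as a real NA follows. The `dat` data frame here is made-up sample data in the formats from the question, and the month names produced by format(dates, "%B") depend on the session locale, so an English locale is assumed.

```r
# Made-up sample data mirroring the formats shown in the question
dat <- data.frame(
  Seasondate = c("January to March", "August to October",
                 "05/01/2013 to 10/30/2013", NA,
                 "February to June", "02/15/2013 to 06/19/2013"),
  stringsAsFactors = FALSE
)

# Convert mm/dd/yyyy pieces to month names; leave other text untouched
datext <- function(x) {
  dates <- as.Date(x, format = "%m/%d/%Y")
  if (all(is.na(dates))) x else format(dates, "%B")
}

parts <- strsplit(as.character(dat$Seasondate), " to ")
res <- vapply(lapply(parts, datext), paste, collapse = " to ",
              FUN.VALUE = character(1))

# paste() turned the missing value into the string "NA"; restore a real NA
res[is.na(dat$Seasondate)] <- NA_character_
res
```

Assigning `res` back with `dat$Seasondate <- res` would then leave the missing row detectable by is.na.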
Here is another idea that does not use date coercion, but instead uses the month.name vector from base R.
## change the column to character
df$V1 <- as.character(df$V1)
## find the numeric values
g <- grepl("\\d", df$V1)
## split them, get the months, then apply 'month.name' and paste
df$V1[g] <- vapply(strsplit(df$V1[g], " to "), function(x) {
  paste(month.name[as.integer(sub("/.*", "", x))], collapse = " to ")
}, "")
which results in
df
V1
1 January to March
2 August to October
3 May to October
4 <NA>
5 February to June
6 February to June
Original data:
df <- structure(list(V1 = structure(c(5L, 3L, 2L, NA, 4L, 1L), .Label = c("02/15/2013 to 06/19/2013",
"05/01/2013 to 10/30/2013", "August to October", "February to June",
"January to March"), class = "factor")), .Names = "V1", class = "data.frame", row.names = c(NA,
-6L))
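The answer above works on a column called V1, while the question's column is Seasondate. As a rough sketch, the same month.name idea could be wrapped in a small function and applied to that column. The function name `normalise_season` and the sample `dat` are hypothetical, and the code assumes every numeric entry is month-first (mm/dd/yyyy), as in the question's samples.

```r
# Hypothetical helper name; wraps the month.name approach in one function
normalise_season <- function(x) {
  x <- as.character(x)
  has_digits <- grepl("\\d", x)  # NA entries yield FALSE and stay NA
  x[has_digits] <- vapply(strsplit(x[has_digits], " to "), function(piece) {
    month_num <- as.integer(sub("/.*", "", piece))  # text before the first "/"
    paste(month.name[month_num], collapse = " to ")
  }, character(1))
  x
}

# Made-up sample rows in the formats shown in the question
dat <- data.frame(
  Seasondate = c("January to March", "05/01/2013 to 10/30/2013",
                 NA, "02/15/2013 to 06/19/2013"),
  stringsAsFactors = FALSE
)
dat$Seasondate <- normalise_season(dat$Seasondate)
dat$Seasondate
```

Because month.name is a fixed English vector in base R, this version does not depend on the session locale, and the missing row stays a real NA since it is never passed through paste.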