strptime range and make a date column
I have dates in the following form:
Date Value
<chr> <dbl>
[2014-1-24 - 2014-2-2] 1.1
[2014-2-3 - 2014-3-2] 2.2
. .
. .
. .
This continues for many years. I would like to convert it to long format, with one row per day, as shown below:
Date Value
<date> <dbl>
2014-01-24 1.1
2014-01-25 1.1
2014-01-26 1.1
2014-01-27 1.1
2014-01-28 1.1
2014-01-29 1.1
2014-01-30 1.1
2014-01-31 1.1
2014-02-01 1.1
2014-02-02 1.1
2014-02-03 2.2
2014-02-04 2.2
2014-02-05 2.2
. .
. .
. .
What is a concise way to accomplish this?
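Using dplyr and tidyr (str_match_all comes from stringr, so that package is loaded too):
library(dplyr); library(tidyr); library(stringr)
df %>%
  # pull the start and end date strings out of each range
  mutate(Date = str_match_all(Date, '\\d{4}-\\d{1,2}-\\d{1,2}'),
         # expand each (start, end) pair into a daily sequence
         Date = lapply(Date, function(d) seq(as.Date(d[1]), as.Date(d[2]), by = 'day'))) %>%
  # name the list column to unnest; column order in the result may differ by tidyr version
  unnest(Date)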
# Value Date
#1 1.1 2014-01-24
#2 1.1 2014-01-25
#3 1.1 2014-01-26
#4 1.1 2014-01-27
#5 1.1 2014-01-28
#6 1.1 2014-01-29
#7 1.1 2014-01-30
#8 1.1 2014-01-31
#9 1.1 2014-02-01
#10 1.1 2014-02-02
#11 2.2 2014-02-03
#12 2.2 2014-02-04
# ...
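The answers here share the same three steps: extract the two dates from each string, expand the pair with seq, and repeat Value once per day. As a rough illustration of those steps only, here is a base R sketch. It is not the method from any of the answers; the sample data frame is hypothetical and assumes the square brackets are literally part of the strings.

```r
# Hypothetical sample data mirroring the first two rows of the question
df <- data.frame(
  Date  = c("[2014-1-24 - 2014-2-2]", "[2014-2-3 - 2014-3-2]"),
  Value = c(1.1, 2.2),
  stringsAsFactors = FALSE
)

# Step 1: pull the two date strings out of each range
pattern <- "[0-9]{4}-[0-9]{1,2}-[0-9]{1,2}"
ends <- regmatches(df$Date, gregexpr(pattern, df$Date))

# Step 2: expand each (start, end) pair into a daily sequence
days <- lapply(ends, function(d) seq(as.Date(d[1]), as.Date(d[2]), by = "day"))

# Step 3: repeat each Value once per day and stack the pieces
long <- data.frame(
  Date  = do.call(c, days),
  Value = rep(df$Value, lengths(days))
)

head(long)
```

For these two rows the result has 38 rows (10 days for the first range, 28 for the second), which agrees with the tibble size reported in the purrr answer.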
Using purrr:
library(stringr); library(purrr); library(tibble)
# extract the start and end dates from each Date string
df$Date <- map(str_match_all(df$Date, '\\d{4}-\\d{1,2}-\\d{1,2}'), as.Date)
# map over the rows and expand each range into a daily sequence with seq();
# .x is the Date pair and .y is the Value of the current row
pmap_df(df, ~ tibble(Date = seq(.x[1], .x[2], by = 'day'), Value = .y))
# A tibble: 38 x 2
# Date Value
# <date> <dbl>
# 1 2014-01-24 1.1
# 2 2014-01-25 1.1
# 3 2014-01-26 1.1
# 4 2014-01-27 1.1
# 5 2014-01-28 1.1
# 6 2014-01-29 1.1
# 7 2014-01-30 1.1
# 8 2014-01-31 1.1
# 9 2014-02-01 1.1
#10 2014-02-02 1.1
# ... with 28 more rows
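Here is an option using data.table and lubridate. Group by 'Value' (assuming it is unique; if it is not, group by row order instead), split 'Date' into two columns with tstrsplit, convert them to Date class with ymd (from lubridate), and use Reduce to get the sequence of dates:
library(data.table)
library(lubridate)
# the separator is a regular expression, so the backslashes must be doubled in an R string
setDT(df1)[, .(Date = Reduce(function(...) seq(..., by = '1 day'),
                             lapply(tstrsplit(Date, "\\s-\\s"), ymd))), Value][, .(Date, Value)]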
# Date Value
# 1: 2014-01-24 1.1
# 2: 2014-01-25 1.1
# 3: 2014-01-26 1.1
# 4: 2014-01-27 1.1
# 5: 2014-01-28 1.1
# 6: 2014-01-29 1.1
# 7: 2014-01-30 1.1
# 8: 2014-01-31 1.1
# 9: 2014-02-01 1.1
#10: 2014-02-02 1.1
#11: 2014-02-03 2.2
#12: 2014-02-04 2.2
#13: 2014-02-05 2.2
#14: 2014-02-06 2.2
# - -
# - -
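The data.table answer mentions falling back to row order when 'Value' is not unique, but does not show it. A minimal sketch of that variant follows, assuming data.table and lubridate are installed. The sample data is hypothetical and omits the square brackets to keep the parsing step simple.

```r
library(data.table)
library(lubridate)

# Hypothetical data in which the same Value appears in two different ranges
df1 <- data.frame(
  Date  = c("2014-1-24 - 2014-1-26", "2014-2-3 - 2014-2-4"),
  Value = c(1.1, 1.1)
)

# Group by row number instead of Value, so each range is expanded on its own
res <- setDT(df1)[, {
  ends <- ymd(strsplit(Date, "\\s-\\s")[[1]])   # start and end of this row
  .(Date = seq(ends[1], ends[2], by = "1 day"), Value = Value)
}, by = .(row = seq_len(nrow(df1)))][, .(Date, Value)]

res
```

Grouping by 'Value' here would put both rows in one group, so seq would receive two start dates and two end dates at once; grouping by row avoids that.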