Unstack lubridate's interval class
I am trying to convert a data frame df, which consists of a value column, two date columns (start and end), and an interval column (duration), into long format by unnesting/unstacking the duration column.
library(dplyr)
library(lubridate)
df <- data.frame(value = letters[1:3], start = as_date(1:3), end = as_date(3:1)+3) %>%
mutate(duration = interval(start, end))
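The expected outcome is a data frame in which value, start, and end are replicated for each day defined by duration. For instance, the value 'a' would appear 6 times, each time on a different day (2, 3, 4, 5, 6, and 7 January 1970).
I tried the unnest function from the tidyr package, but nothing happened.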
tidyr::unnest(df, duration)
Any help is greatly appreciated :)
I don't think interval helps here; seq.Date might be better...
library(purrr) # as well as those you have
library(tidyr) # needed for unnest()
df <- data.frame(value = letters[1:3], start = as_date(1:3), end = as_date(3:1)+3) %>%
  mutate(day = map2(start, end, seq.Date, by = "day")) %>%
  unnest(day)
df
# A tibble: 12 x 4
value start end day
<chr> <date> <date> <date>
1 a 1970-01-02 1970-01-07 1970-01-02
2 a 1970-01-02 1970-01-07 1970-01-03
3 a 1970-01-02 1970-01-07 1970-01-04
4 a 1970-01-02 1970-01-07 1970-01-05
5 a 1970-01-02 1970-01-07 1970-01-06
6 a 1970-01-02 1970-01-07 1970-01-07
7 b 1970-01-03 1970-01-06 1970-01-03
8 b 1970-01-03 1970-01-06 1970-01-04
9 b 1970-01-03 1970-01-06 1970-01-05
10 b 1970-01-03 1970-01-06 1970-01-06
11 c 1970-01-04 1970-01-05 1970-01-04
12 c 1970-01-04 1970-01-05 1970-01-05
To extract the start and end dates from the interval you can use int_start and int_end, then use map2 and unnest to create a sequence of dates.
library(dplyr)
library(purrr)
library(tidyr)
library(lubridate)
df %>%
mutate(date = map2(int_start(duration), int_end(duration),
~seq(as.Date(.x), as.Date(.y), by = 'day'))) %>%
#This will also work but would return date of class POSIXct
#mutate(date = map2(int_start(duration), int_end(duration),seq,by = 'day')) %>%
unnest(date) %>%
select(-duration)
# value start end date
# <chr> <date> <date> <date>
# 1 a 1970-01-02 1970-01-07 1970-01-02
# 2 a 1970-01-02 1970-01-07 1970-01-03
# 3 a 1970-01-02 1970-01-07 1970-01-04
# 4 a 1970-01-02 1970-01-07 1970-01-05
# 5 a 1970-01-02 1970-01-07 1970-01-06
# 6 a 1970-01-02 1970-01-07 1970-01-07
# 7 b 1970-01-03 1970-01-06 1970-01-03
# 8 b 1970-01-03 1970-01-06 1970-01-04
# 9 b 1970-01-03 1970-01-06 1970-01-05
#10 b 1970-01-03 1970-01-06 1970-01-06
#11 c 1970-01-04 1970-01-05 1970-01-04
#12 c 1970-01-04 1970-01-05 1970-01-05
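You can also use the following solution. Since we are going to create duplicate rows, we can wrap the operation in a list and then use unnest_longer. purrr package functions have always been my first choice, but you can use this as an alternative.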
library(dplyr)
library(tidyr)
library(lubridate)
df %>%
group_by(value) %>%
mutate(date = list(start + 0:(duration/ddays(1)))) %>%
unnest_longer(date) %>%
select(-duration)
# A tibble: 12 x 4
# Groups: value [3]
value start end date
<chr> <date> <date> <date>
1 a 1970-01-02 1970-01-07 1970-01-02
2 a 1970-01-02 1970-01-07 1970-01-03
3 a 1970-01-02 1970-01-07 1970-01-04
4 a 1970-01-02 1970-01-07 1970-01-05
5 a 1970-01-02 1970-01-07 1970-01-06
6 a 1970-01-02 1970-01-07 1970-01-07
7 b 1970-01-03 1970-01-06 1970-01-03
8 b 1970-01-03 1970-01-06 1970-01-04
9 b 1970-01-03 1970-01-06 1970-01-05
10 b 1970-01-03 1970-01-06 1970-01-06
11 c 1970-01-04 1970-01-05 1970-01-04
12 c 1970-01-04 1970-01-05 1970-01-05
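The expression 0:(duration/ddays(1)) in the answer above relies on the fact that dividing a lubridate interval by a duration gives a plain number. A minimal sketch of that step for a single interval, using the dates of row 'a'; the names iv, n_days and days are only illustrative:

```r
library(lubridate)

# One interval, matching row 'a' of the example data
iv <- interval(as_date("1970-01-02"), as_date("1970-01-07"))

# Interval / Duration yields a plain number: the interval length in days
n_days <- iv / ddays(1)   # 5

# Adding the offsets 0, 1, ..., n_days to the start date gives every day covered
days <- as_date("1970-01-02") + 0:n_days
days
```

Because 0:n_days includes both endpoints, an interval spanning 5 days produces 6 dates, which is why 'a' appears 6 times in the result.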
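You can't unnest a column of intervals and expect it to generate all the dates in between, but by using seq you can generate them yourself. Try this: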
library(tidyverse)
library(lubridate)
df %>%
rowwise() %>%
summarise(
value, dates = seq(start, end, by = 1)
)
#> # A tibble: 12 x 2
#> value dates
#> <chr> <date>
#> 1 a 1970-01-02
#> 2 a 1970-01-03
#> 3 a 1970-01-04
#> 4 a 1970-01-05
#> 5 a 1970-01-06
#> 6 a 1970-01-07
#> 7 b 1970-01-03
#> 8 b 1970-01-04
#> 9 b 1970-01-05
#> 10 b 1970-01-06
#> 11 c 1970-01-04
#> 12 c 1970-01-05
Created on 2021-05-18 by the reprex package (v1.0.0)
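A note for readers on newer dplyr versions: since dplyr 1.1.0, returning more than one row per group from summarise() is deprecated in favor of reframe(). The following is a hedged sketch of the same idea with reframe(); it groups with .by = value instead of rowwise(), which assumes each value identifies exactly one row, as it does in the example data. The name long is only illustrative.

```r
library(dplyr)      # assumes dplyr >= 1.1.0 for reframe() and .by
library(lubridate)

df <- data.frame(value = letters[1:3],
                 start = as_date(1:3),
                 end   = as_date(3:1) + 3)

# One group per value (assumed unique), so start and end are length 1 in each group;
# reframe() may return any number of rows per group and always ungroups the result
long <- df %>%
  reframe(dates = seq(start, end, by = 1), .by = value)

long
```

If value is not unique in your data, group by a row identifier instead so that seq() still receives a single start and end.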
A data.table approach
library(data.table)
setDT(df)[, .(date = seq(start, end, by = 1)), by = .(value)]
# value date
# 1: a 1970-01-02
# 2: a 1970-01-03
# 3: a 1970-01-04
# 4: a 1970-01-05
# 5: a 1970-01-06
# 6: a 1970-01-07
# 7: b 1970-01-03
# 8: b 1970-01-04
# 9: b 1970-01-05
#10: b 1970-01-06
#11: c 1970-01-04
#12: c 1970-01-05
And with uncount (from tidyr)
df %>% uncount(as.integer(duration/(24*60*60)) +1) %>%
group_by(value) %>%
mutate(date = row_number() -1 + start)
# A tibble: 12 x 5
# Groups: value [3]
value start end duration date
<chr> <date> <date> <Interval> <date>
1 a 1970-01-02 1970-01-07 1970-01-02 UTC--1970-01-07 UTC 1970-01-02
2 a 1970-01-02 1970-01-07 1970-01-02 UTC--1970-01-07 UTC 1970-01-03
3 a 1970-01-02 1970-01-07 1970-01-02 UTC--1970-01-07 UTC 1970-01-04
4 a 1970-01-02 1970-01-07 1970-01-02 UTC--1970-01-07 UTC 1970-01-05
5 a 1970-01-02 1970-01-07 1970-01-02 UTC--1970-01-07 UTC 1970-01-06
6 a 1970-01-02 1970-01-07 1970-01-02 UTC--1970-01-07 UTC 1970-01-07
7 b 1970-01-03 1970-01-06 1970-01-03 UTC--1970-01-06 UTC 1970-01-03
8 b 1970-01-03 1970-01-06 1970-01-03 UTC--1970-01-06 UTC 1970-01-04
9 b 1970-01-03 1970-01-06 1970-01-03 UTC--1970-01-06 UTC 1970-01-05
10 b 1970-01-03 1970-01-06 1970-01-03 UTC--1970-01-06 UTC 1970-01-06
11 c 1970-01-04 1970-01-05 1970-01-04 UTC--1970-01-05 UTC 1970-01-04
12 c 1970-01-04 1970-01-05 1970-01-04 UTC--1970-01-05 UTC 1970-01-05