Get all possible combinations in time-series data with variable daily readings
I have a time-series dataset of daily consumption, as shown below:
library(dplyr)

consumption <- data.frame(
  date = as.Date(c('2020-06-01','2020-06-02','2020-06-03','2020-06-03',
                   '2020-06-03','2020-06-04','2020-06-05','2020-06-05')),
  val = c(10,20,31,32,33,40,51,52)
)
# n = number of readings on that day, record = index of the reading within its day
consumption <- consumption %>%
  group_by(date) %>%
  mutate(n = n(), record = row_number()) %>%
  ungroup()
consumption
# A tibble: 8 × 4
date val n record
<date> <dbl> <int> <int>
1 2020-06-01 10 1 1
2 2020-06-02 20 1 1
3 2020-06-03 31 3 1
4 2020-06-03 32 3 2
5 2020-06-03 33 3 3
6 2020-06-04 40 1 1
7 2020-06-05 51 2 1
8 2020-06-05 52 2 2
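Some days have more than one row in the dataset. I would like to turn this into separate groups covering all possible combinations, for example:
Group 1: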
date val record
1 2020-06-01 10 1
2 2020-06-02 20 1
3 2020-06-03 31 1
4 2020-06-04 40 1
5 2020-06-05 51 1
Group 2:
date val record
1 2020-06-01 10 1
2 2020-06-02 20 1
3 2020-06-03 31 1
4 2020-06-04 40 1
5 2020-06-05 52 2
Group 3:
date val record
1 2020-06-01 10 1
2 2020-06-02 20 1
3 2020-06-03 32 2
4 2020-06-04 40 1
5 2020-06-05 51 1
Group 4:
date val record
1 2020-06-01 10 1
2 2020-06-02 20 1
3 2020-06-03 32 2
4 2020-06-04 40 1
5 2020-06-05 52 2
Group 5:
date val record
1 2020-06-01 10 1
2 2020-06-02 20 1
3 2020-06-03 33 3
4 2020-06-04 40 1
5 2020-06-05 51 1
Group 6:
date val record
1 2020-06-01 10 1
2 2020-06-02 20 1
3 2020-06-03 33 3
4 2020-06-04 40 1
5 2020-06-05 52 2
I tried the following solution, but it did not produce the expected result.
library(dplyr)
library(purrr)
out <- consumption %>%
filter(n > 1) %>%
group_split(date, rn = row_number()) %>%
map(~ bind_rows(consumption %>%
filter(n == 1), .x %>%
select(-rn)) %>%
arrange(date))
Any help in solving this would be greatly appreciated. Many thanks.
We can filter the rows where 'n' is greater than 1, group_split by 'date' and the row number, and then bind each piece with the filtered rows where 'n' is 1.
library(dplyr)
library(purrr)
out <- consumption %>%
  filter(n > 1) %>%                         # dates with several readings
  group_split(date, rn = row_number()) %>%  # one piece per such reading
  map(~ bind_rows(consumption %>%
                    filter(n == 1), .x %>%  # add back the single-reading dates
                    select(-rn)) %>%
        arrange(date))
-output
> out
[[1]]
# A tibble: 4 x 4
date val n record
<date> <dbl> <int> <int>
1 2020-06-01 10 1 1
2 2020-06-02 20 1 1
3 2020-06-03 31 3 1
4 2020-06-04 40 1 1
[[2]]
# A tibble: 4 x 4
date val n record
<date> <dbl> <int> <int>
1 2020-06-01 10 1 1
2 2020-06-02 20 1 1
3 2020-06-03 32 3 2
4 2020-06-04 40 1 1
[[3]]
# A tibble: 4 x 4
date val n record
<date> <dbl> <int> <int>
1 2020-06-01 10 1 1
2 2020-06-02 20 1 1
3 2020-06-03 33 3 3
4 2020-06-04 40 1 1
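With the updated data, we create a row_number(), split it by the 'date' column (as in @ThomasIsCoding's solution), expand the resulting list with crossing (from tidyr) into every combination of row indices, then loop over those combinations with pmap and slice the corresponding rows out of the original data.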
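library(dplyr)
library(purrr)
library(tidyr)
library(tibble)
# Note: invoke() is deprecated as of purrr 1.0.0; do.call(crossing, .) is an equivalent base R call
consumption %>%
  transmute(date, rn = row_number()) %>%  # keep the date and the overall row index
  deframe %>%                             # named vector: names = dates, values = row indices
  split(names(.)) %>%                     # list of row indices, one element per date
  invoke(crossing, .) %>%                 # one row per combination of row indices
  pmap(~ consumption %>%
         slice(c(...))) %>%               # extract the chosen rows for each combination
  unname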
-output
[[1]]
# A tibble: 5 x 4
date val n record
<date> <dbl> <int> <int>
1 2020-06-01 10 1 1
2 2020-06-02 20 1 1
3 2020-06-03 31 3 1
4 2020-06-04 40 1 1
5 2020-06-05 51 2 1
[[2]]
# A tibble: 5 x 4
date val n record
<date> <dbl> <int> <int>
1 2020-06-01 10 1 1
2 2020-06-02 20 1 1
3 2020-06-03 31 3 1
4 2020-06-04 40 1 1
5 2020-06-05 52 2 2
[[3]]
# A tibble: 5 x 4
date val n record
<date> <dbl> <int> <int>
1 2020-06-01 10 1 1
2 2020-06-02 20 1 1
3 2020-06-03 32 3 2
4 2020-06-04 40 1 1
5 2020-06-05 51 2 1
[[4]]
# A tibble: 5 x 4
date val n record
<date> <dbl> <int> <int>
1 2020-06-01 10 1 1
2 2020-06-02 20 1 1
3 2020-06-03 32 3 2
4 2020-06-04 40 1 1
5 2020-06-05 52 2 2
[[5]]
# A tibble: 5 x 4
date val n record
<date> <dbl> <int> <int>
1 2020-06-01 10 1 1
2 2020-06-02 20 1 1
3 2020-06-03 33 3 3
4 2020-06-04 40 1 1
5 2020-06-05 51 2 1
[[6]]
# A tibble: 5 x 4
date val n record
<date> <dbl> <int> <int>
1 2020-06-01 10 1 1
2 2020-06-02 20 1 1
3 2020-06-03 33 3 3
4 2020-06-04 40 1 1
5 2020-06-05 52 2 2
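The intermediate objects in the pipeline above can be easier to follow when built one at a time. The following is a minimal sketch of the same idea, not the answerer's exact code: the names idx, grid and combos are illustrative, and do.call() is swapped in for invoke(). It rebuilds the example data so that it runs on its own.

```r
library(tidyr)
library(purrr)

consumption <- data.frame(
  date = as.Date(c('2020-06-01','2020-06-02','2020-06-03','2020-06-03',
                   '2020-06-03','2020-06-04','2020-06-05','2020-06-05')),
  val = c(10,20,31,32,33,40,51,52)
)

# Row indices grouped by date: a named list with one element per day
idx <- split(seq_len(nrow(consumption)), consumption$date)

# One row per combination; each column holds the row index picked for that day
grid <- do.call(crossing, idx)

# For every combination, pull the picked rows out of the original data
combos <- pmap(grid, function(...) consumption[c(...), ])

# The number of combinations is the product of the per-day reading counts
length(combos) == prod(lengths(idx))
```

Because crossing() sorts its result, the first date with several readings varies slowest, which is the ordering shown in the output above.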
Perhaps you can try the code below
with(
consumption,
apply(
expand.grid(
split(seq_along(date), date)
),
1,
function(k) consumption[k, ]
)
)
which gives
[[1]]
# A tibble: 5 x 4
date val n record
<date> <dbl> <int> <int>
1 2020-06-01 10 1 1
2 2020-06-02 20 1 1
3 2020-06-03 31 3 1
4 2020-06-04 40 1 1
5 2020-06-05 51 2 1
[[2]]
# A tibble: 5 x 4
date val n record
<date> <dbl> <int> <int>
1 2020-06-01 10 1 1
2 2020-06-02 20 1 1
3 2020-06-03 32 3 2
4 2020-06-04 40 1 1
5 2020-06-05 51 2 1
[[3]]
# A tibble: 5 x 4
date val n record
<date> <dbl> <int> <int>
1 2020-06-01 10 1 1
2 2020-06-02 20 1 1
3 2020-06-03 33 3 3
4 2020-06-04 40 1 1
5 2020-06-05 51 2 1
[[4]]
# A tibble: 5 x 4
date val n record
<date> <dbl> <int> <int>
1 2020-06-01 10 1 1
2 2020-06-02 20 1 1
3 2020-06-03 31 3 1
4 2020-06-04 40 1 1
5 2020-06-05 52 2 2
[[5]]
# A tibble: 5 x 4
date val n record
<date> <dbl> <int> <int>
1 2020-06-01 10 1 1
2 2020-06-02 20 1 1
3 2020-06-03 32 3 2
4 2020-06-04 40 1 1
5 2020-06-05 52 2 2
[[6]]
# A tibble: 5 x 4
date val n record
<date> <dbl> <int> <int>
1 2020-06-01 10 1 1
2 2020-06-02 20 1 1
3 2020-06-03 33 3 3
4 2020-06-04 40 1 1
5 2020-06-05 52 2 2
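If a single long table is more convenient than a list, the combinations can be stacked with a group identifier. This is a hedged base R sketch along the lines of the expand.grid approach; the names idx, grid, combos and long and the group column are assumptions made for illustration, not part of the original answer.

```r
consumption <- data.frame(
  date = as.Date(c('2020-06-01','2020-06-02','2020-06-03','2020-06-03',
                   '2020-06-03','2020-06-04','2020-06-05','2020-06-05')),
  val = c(10,20,31,32,33,40,51,52)
)

# Row indices per date, then every combination of one index per date
idx  <- split(seq_along(consumption$date), consumption$date)
grid <- expand.grid(idx)

# One data frame per combination (each row of grid holds the indices to keep)
combos <- lapply(seq_len(nrow(grid)),
                 function(i) consumption[unlist(grid[i, ]), ])

# Stack the combinations and label each with a hypothetical 'group' column
long <- do.call(rbind,
                Map(function(d, g) cbind(group = g, d),
                    combos, seq_along(combos)))

nrow(grid)   # number of combinations
nrow(long)   # combinations times number of distinct dates
```

Note that expand.grid() varies its first factor fastest, so the ordering of the groups differs from the crossing() version, as the two outputs above show.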
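Here is an approach using some basic dplyr and tidyr functions.
First, complete the data for every date/record combination. Then fill the missing values with the preceding value within each date, and reshape wide.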
library(tidyverse)
consumption %>%
  complete(date, record) %>%
  group_by(date) %>% fill(val) %>% ungroup() %>%
  pivot_wider(id_cols = date, names_from = record, values_from = val)
# A tibble: 5 x 4
date `1` `2` `3`
<date> <dbl> <dbl> <dbl>
1 2020-06-01 10 10 10
2 2020-06-02 20 20 20
3 2020-06-03 31 32 33
4 2020-06-04 40 40 40
5 2020-06-05 51 52 52