如何使用 R 及时创建快照

How to create snapshots in time with R

library(tidyverse)

df <- tibble(`Action Item ID` = c("ABC", "EFG", "HIJ", "KLM", "NOP", "QRS"),
             `Date Created` = as.Date(c("2019-01-01", "2019-01-01", 
                                        "2019-06-01", "2019-06-01",
                                        "2019-08-01", "2019-08-01")),
             `Date Closed` = as.Date(c("2019-01-15", "2019-05-31", 
                                        "2019-06-15", "2019-07-05",
                                        "2019-08-15", NA)),
             `Current Status` = c(rep("Closed", 5), "Open"))

#> # A tibble: 6 x 4
#>   `Action Item ID` `Date Created` `Date Closed` `Current Status`
#>   <chr>            <date>         <date>        <chr>           
#> 1 ABC              2019-01-01     2019-01-15    Closed          
#> 2 EFG              2019-01-01     2019-05-20    Closed          
#> 3 HIJ              2019-06-01     2019-06-15    Closed          
#> 4 KLM              2019-06-01     2019-07-05    Closed          
#> 5 NOP              2019-08-01     2019-08-15    Closed          
#> 6 QRS              2019-08-01     NA            Open  

我正在尝试使用上面显示的数据框 (tibble) 逐月构建开放行动项目的线图。每个月都将是该月最后一天的时间快照(不再存在)。让我们看两个操作项来说明我的问题。

1 月的最后一天午夜(我的第一张快照):

说起来容易做起来难。也许我只是没有经验。什么是 "Magic Code" 我可以在下面写:

  1. 从我的数据框中删除所有月份,以如下所示的方式计数。
  2. 使用适当的值(甚至 0 需要时)填充缺失的月份,即使它们不在我的数据框中,即 tidyr::complete
  3. 每月连续计算未完成的操作项,直到它们关闭

这是 "Magic Code" 的结果,由我手动执行。请注意,一切都必须自动化,我不能手动更改每月的月份名称。谢谢。

df.months <- "Magic Code"

#> # A tibble: 6 x 4
#> `Month`       `Action Item Qty Open at End of Month` 
#> <date>         <integer>    
#> 2019-01-01     1
#> 2019-02-01     1
#> 2019-03-01     1
#> 2019-04-01     1
#> 2019-05-01     0
#> 2019-06-01     1
#> 2019-07-01     0
#> 2019-08-01     1

这是一种方法。先reshape为更长的形式,然后将Created算作加一,Closed算作减一。然后计算这些增量的每月总计,并填写缺失的月份。

df %>%
  # convert to longer form, with one row for each Created or Closed
  pivot_longer(-c(`Action Item ID`, `Current Status`), "type", "date") %>%
  mutate(change = if_else(type == "Date Created", 1, -1)) %>%
  mutate(month = lubridate::floor_date(value, "month")) %>%
  arrange(value) %>%
  # get the sum of "change" for each month. Equivalent to:
  #    group_by(month) %>% summarize(n = sum(change) %>%
  count(month, wt = change) %>%
  # Add rows for any missing months in the series and replace NAs with 0's
  padr::pad() %>%
  replace_na(list(n=0)) %>%
  # Track cumulative change across all months to date
  mutate("Open at end of month" = cumsum(n))


## A tibble: 9 x 3
#  month          n `Open at end of month`
#  <date>     <dbl>                  <dbl>
#1 2019-01-01     1                      1
#2 2019-02-01     0                      1
#3 2019-03-01     0                      1
#4 2019-04-01     0                      1
#5 2019-05-01    -1                      0
#6 2019-06-01     1                      1
#7 2019-07-01    -1                      0
#8 2019-08-01     1                      1
#9 NA            -1                      0