How to use a cumulative sum function for grouped data with missing observations

The data frame I am working with looks like this:

    DATUM               CP                SMER  TRH   MNOZSTVI  CENA POPLATKY   OBJEM UCET  KVARTAL   ROK AKTUALNI.MNOZSTVI
   <dttm>              <chr>             <chr> <chr>    <dbl> <dbl>    <dbl>   <dbl> <chr> <chr>   <dbl>             <dbl>
 1 2020-03-03 00:00:00 CEZ               K     BCPP        50 465.      91.3 -23240  CZK   Q1       2020                NA
 2 2020-03-04 00:00:00 CEZ               K     BCPP        50 467.      58.9 -13980  CZK   Q1       2020                NA
 3 2020-03-12 00:00:00 CEZ               P     BCPP        30 398       51.8  11940  CZK   Q1       2020                NA
 4 2020-03-25 00:00:00 KOMERCNI BANKA    K     BCPP        40 542       85.9 -21680  CZK   Q1       2020                NA
 5 2020-03-25 00:00:00 MONETA MONEY BANK K     BCPP       300  58.4     71.3 -17505  CZK   Q1       2020                NA
 6 2020-03-30 00:00:00 CEZ               K     BCPP        10 391       50    -3910  CZK   Q1       2020                NA
 7 2020-04-02 00:00:00 USD               K     NA        1000  25.8      0   -25778  CZK   Q2       2020                NA
 8 2020-04-03 00:00:00 USD               K     NA        3000  26.1      0   -78392  CZK   Q2       2020                NA
 9 2020-04-04 00:00:00 USD               K     NA        1000  26.4      0   -26363. CZK   Q2       2020                NA
10 2020-04-06 00:00:00 AVAST             K     BCPP       150 125.      75.8 -18810  CZK   Q2       2020                NA

I want to fill the variable AKTUALNI.MNOZSTVI with the cumulative sum of the variable MNOZSTVI, grouped by CP. So the vector AKTUALNI.MNOZSTVI should be c(50, 100, 130, 40, 300, 140, 1000, 4000, 5000, 150, etc.).

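The problem is that some values of MNOZSTVI are missing, so I don't know how to use cumsum(), which cannot handle missing values. I am also having trouble applying it to grouped data.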

Does anyone know how to do this with cumsum() or some other function? Thanks.

We can group by 'CP' and get the cumsum of 'MNOZSTVI' in mutate:

library(dplyr)
df1 <- df1 %>%
     group_by(CP) %>%
     mutate(AKTUALNI.MNOZSTVI  = cumsum(MNOZSTVI))
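
Note that cumsum() returns NA for every row after the first missing value within a group. If the missing quantities should simply not contribute to the running total, one option is to replace them with 0 before summing. Below is a minimal sketch on made-up data; the `toy` tibble and its values are hypothetical, and treating NA as 0 is an assumption that may not fit every data set.

```r
library(dplyr)

# Hypothetical toy data: one MNOZSTVI value is missing in the CEZ group
toy <- tibble::tibble(
  CP       = c("CEZ", "CEZ", "USD", "CEZ", "USD"),
  MNOZSTVI = c(50, NA, 1000, 30, 3000)
)

# Assumption: a missing quantity counts as 0 in the running total
toy <- toy %>%
  group_by(CP) %>%
  mutate(AKTUALNI.MNOZSTVI = cumsum(coalesce(MNOZSTVI, 0))) %>%
  ungroup()

toy$AKTUALNI.MNOZSTVI
# 50 50 1000 80 4000
```

coalesce() is used here because it leaves non-missing values untouched and only fills the gaps; the ungroup() at the end avoids carrying the grouping into later steps.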

Or use ave from base R:

df1$AKTUALNI.MNOZSTVI <- with(df1, ave(MNOZSTVI, CP, FUN = cumsum))
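
If MNOZSTVI contains NA, the same idea can probably be adapted by passing ave() a function that zeroes out the missing values first. The sketch below uses made-up data; `toy`, `cumsum_na0` and `AKT_KEEP_NA` are hypothetical names, and counting NA as 0 is an assumption.

```r
# Hypothetical toy data with one missing MNOZSTVI value
toy <- data.frame(
  CP       = c("CEZ", "CEZ", "USD", "CEZ", "USD"),
  MNOZSTVI = c(50, NA, 1000, 30, 3000)
)

# Hypothetical helper: cumulative sum that treats NA as 0 (an assumption)
cumsum_na0 <- function(x) cumsum(replace(x, is.na(x), 0))

# Running total within each CP group
toy$AKTUALNI.MNOZSTVI <- with(toy, ave(MNOZSTVI, CP, FUN = cumsum_na0))

# Variant: keep NA on the rows where the quantity itself is missing
toy$AKT_KEEP_NA <- ifelse(is.na(toy$MNOZSTVI), NA, toy$AKTUALNI.MNOZSTVI)
```

The first column carries the last known total across the gap; the variant marks the gap explicitly, which may be preferable if a missing quantity means the row is unreliable.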

library(dplyr)
df %>%
  group_by(CP) %>%
  mutate(AKTUALNI.MNOZSTVI  = cumsum(MNOZSTVI))

Output:

   DATUM      CP                         SMER  TRH   MNOZSTVI  CENA POPLATKY OBJEM UCET  KVARTAL ROK.AKTUALNI..MNOZSTVI AKTUALNI.MNOZSTVI
   <chr>      <chr>                      <chr> <chr>    <int> <dbl> <chr>    <chr> <chr> <chr>   <chr>                              <int>
 1 2020-03-03 00:00:00 CEZ               K     BCPP        50 465   91.3     NA    CZK   Q1      2020 NA                               50
 2 2020-03-04 00:00:00 CEZ               K     BCPP        50 467   58.9     NA    CZK   Q1      2020 NA                              100
 3 2020-03-12 00:00:00 CEZ               P     BCPP        30 398   51.8     11940 CZK   Q1      2020                                 130
 4 2020-03-25 00:00:00 KOMERCNI BANKA    K     BCPP        40 542   85.9     -     CZK   Q1      2020                                  40
 5 2020-03-25 00:00:00 MONETA MONEY BANK K     BCPP       300  58.4 71.3     -     CZK   Q1      2020                                 300
 6 2020-03-30 00:00:00 CEZ               K     BCPP        10 391   50       -     CZK   Q1      2020                                 140
 7 2020-04-02 00:00:00 USD               K     NA        1000  25.8 0        -     CZK   Q2      2020                                1000
 8 2020-04-03 00:00:00 USD               K     NA        3000  26.1 0        -     CZK   Q2      2020                                4000
 9 2020-04-04 00:00:00 USD               K     NA        1000  26.4 0        -     CZK   Q2      2020                                5000
10 2020-04-06 00:00:00 AVAST             K     BCPP       150 125   75. 8    -     CZK   Q2      2020                                 150

Data:

df <- tibble::tribble(
        ~DATUM,                          ~CP, ~SMER,   ~TRH, ~MNOZSTVI, ~CENA, ~POPLATKY,  ~OBJEM, ~UCET, ~KVARTAL, ~ROK.AKTUALNI..MNOZSTVI,
  "2020-03-03",               "00:00:00 CEZ",   "K", "BCPP",       50L,   465,    "91.3",      NA, "CZK",     "Q1",               "2020 NA",
  "2020-03-04",               "00:00:00 CEZ",   "K", "BCPP",       50L,   467,    "58.9",      NA, "CZK",     "Q1",               "2020 NA",
  "2020-03-12",               "00:00:00 CEZ",   "P", "BCPP",       30L,   398,    "51.8", "11940", "CZK",     "Q1",                  "2020",
  "2020-03-25",    "00:00:00 KOMERCNI BANKA",   "K", "BCPP",       40L,   542,    "85.9",     "-", "CZK",     "Q1",                  "2020",
  "2020-03-25", "00:00:00 MONETA MONEY BANK",   "K", "BCPP",      300L,  58.4,    "71.3",     "-", "CZK",     "Q1",                  "2020",
  "2020-03-30",               "00:00:00 CEZ",   "K", "BCPP",       10L,   391,      "50",     "-", "CZK",     "Q1",                  "2020",
  "2020-04-02",               "00:00:00 USD",   "K",     NA,     1000L,  25.8,       "0",     "-", "CZK",     "Q2",                  "2020",
  "2020-04-03",               "00:00:00 USD",   "K",     NA,     3000L,  26.1,       "0",     "-", "CZK",     "Q2",                  "2020",
  "2020-04-04",               "00:00:00 USD",   "K",     NA,     1000L,  26.4,       "0",     "-", "CZK",     "Q2",                  "2020",
  "2020-04-06",             "00:00:00 AVAST",   "K", "BCPP",      150L,   125,   "75. 8",     "-", "CZK",     "Q2",                  "2020"
  )