How to subtract a median of three observations by group in R

Say I have this dataset:

AwesomeData <- structure(list(SKU = c(13284L, 13284L, 13284L, 13284L, 13284L, 
13284L, 13284L, 13284L, 13284L, 13284L, 13284L), stuff = c(4565, 
0, 0, 0, 567.0065222, 0, -1, 73.82897425, -1, 567.0065222, 614.2570658
), action = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L), acnumber = c(329L, 
329L, 329L, 329L, 329L, 329L, 329L, 329L, 329L, 329L, 329L), 
    year = c(2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 
    2018L, 2018L, 2018L, 2018L)), .Names = c("SKU", "stuff", 
"action", "acnumber", "year"), class = "data.frame", row.names = c(NA, 
-11L))

The action column has only two values, 0 and 1. As we can see, action category 1 has 1 observation and action category 0 has 10 observations.
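The per-category counts can be checked directly with base R's `table()`. This is a minimal sketch that copies only the `action` column from the dput above into a plain vector so it runs on its own.

```r
# 'action' column of the first example dataset, copied from the dput above
action <- c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L)

# count how many rows fall into each action category
counts <- table(action)
counts
```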

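1. I need to compute the median of the last three observations of the stuff column in the action == 0 category, excluding every stuff value that is less than or equal to zero. So I have to work with these last three positive observations: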

567.0065222
73.82897425
567.0065222

The median = 567.0065222
  2. Now I have to take the single stuff value of action category 1 and subtract the computed median from it:

    614.2570658 - 567.0065222 = 47.2505436
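Both steps above can be reproduced on plain vectors in base R. This is a sketch that copies the `stuff` and `action` columns from the first dput; it filters with `action == 0 & stuff > 0`, keeps the last three values with `tail()`, and subtracts their median from the `action == 1` value.

```r
# 'stuff' and 'action' columns of the first example dataset (copied from the dput above)
stuff  <- c(4565, 0, 0, 0, 567.0065222, 0, -1, 73.82897425, -1, 567.0065222, 614.2570658)
action <- c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L)

# Step 1: last three strictly positive 'stuff' values with action == 0
last_three  <- tail(stuff[action == 0 & stuff > 0], 3)  # 567.0065222, 73.82897425, 567.0065222
zero_median <- median(last_three)

# Step 2: subtract that median from the single 'stuff' value with action == 1
result <- stuff[action == 1] - zero_median
print(result, digits = 9)
```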

I do it like this:

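library(magrittr)  # provides %>%
# median of the last three positive 'stuff' values with action == 0
AwesomeData %>% {.[.$stuff > 0, ]} %>% {.[.$action == 0, ]} %>% tail(3) %>% {median(.$stuff)} -> OURMEDIANA
# subtract that median from the 'stuff' value with action == 1
AwesomeData %>% {.[.$action == 1, ]} %>% {.$stuff} - OURMEDIANA -> WHATWENEED
a = cbind(AwesomeData, WHATWENEED)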

But what if I have two groups, something like this:

df2 <- structure(list(SKU = c(13284L, 13284L, 13284L, 13284L, 13284L, 
13284L, 13284L, 13284L, 13284L, 13284L, 13284L, 13285L, 13285L, 
13285L, 13285L, 13285L, 13285L, 13285L, 13285L, 13285L, 13285L, 
13285L), stuff = c(4565, 0, 0, 0, 567.00652, 0, -1, 73.82897, 
-1, 567.00652, 614.25707, 4565, 0, 0, 0, 567.00652, 0, -1, 73.82897, 
-1, 567.00652, 614.25707), action = c(0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L
), acnumber = c(329L, 329L, 329L, 329L, 329L, 329L, 329L, 329L, 
329L, 329L, 329L, 330L, 330L, 330L, 330L, 330L, 330L, 330L, 330L, 
330L, 330L, 330L), year = c(2018L, 2018L, 2018L, 2018L, 2018L, 
2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2017L, 2017L, 2017L, 
2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L)), .Names = c("SKU", 
"stuff", "action", "acnumber", "year"), class = "data.frame", row.names = c(NA, 
-22L))

Now there is a new group, identified by

SKU +acnumber+year
13285+330+2017

How can I apply this code to every group in the dataset?

As output, I get:

     SKU    stuff action acnumber  year   new
   <int>    <dbl>  <int>    <int> <int> <dbl>
 1 13284  4565         0      329  2018  3998
 2 13284     0         0      329  2018 - 567
 3 13284     0         0      329  2018 - 567
 4 13284     0         0      329  2018 - 567
 5 13284   567         0      329  2018     0
 6 13284     0         0      329  2018 - 567
 7 13284 -   1.00      0      329  2018 - 568
 8 13284    73.8       0      329  2018 - 493
 9 13284 -   1.00      0      329  2018 - 568
10 13284   567         0      329  2018     0

But what I need to see is:

SKU    acnumber  year  result
13284  329       2018  47.25055
13285  330       2017  47.25055   (614.25707 minus 567.00652, the median of the last three positive action == 0 observations)

We can group by 'SKU', 'acnumber', and 'year', subset the last 3 observations of 'stuff' where 'action' is 0 and 'stuff' is positive, take their median, and subtract it from the last 'stuff' observation where 'action' is 1:

library(dplyr)
df2 %>% 
  group_by(SKU, acnumber, year) %>% 
  summarise(new = tail(stuff[action ==1], 1) -  
                   median(tail(stuff[action == 0 & stuff > 0], 3)))
# A tibble: 2 x 4
# Groups:   SKU, acnumber [?]
#    SKU acnumber  year   new
#  <int>    <int> <int> <dbl>
#1 13284      329  2018  47.3
#2 13285      330  2017  47.3
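For readers who prefer not to depend on dplyr, the same grouped calculation can be sketched in base R with `split()` and `lapply()`. The helper name `group_result` is hypothetical, and `df2` is rebuilt compactly from the second dput (same values) so the block runs on its own. As a design choice, the helper returns `NA` for a group that has no `action == 1` row or no positive `action == 0` values, where `tail()` would otherwise yield a zero-length value.

```r
# Hypothetical helper: one row of output for one SKU/acnumber/year group
group_result <- function(d) {
  zeros <- tail(d$stuff[d$action == 0 & d$stuff > 0], 3)  # last three positive action == 0 values
  ones  <- tail(d$stuff[d$action == 1], 1)                # last action == 1 value
  new <- if (length(zeros) == 0 || length(ones) == 0) NA_real_ else ones - median(zeros)
  data.frame(SKU = d$SKU[1], acnumber = d$acnumber[1], year = d$year[1], new = new)
}

# df2 rebuilt from the second dput: two groups with identical 'stuff' values
stuff_one <- c(4565, 0, 0, 0, 567.00652, 0, -1, 73.82897, -1, 567.00652, 614.25707)
df2 <- data.frame(
  SKU      = rep(c(13284L, 13285L), each = 11),
  stuff    = rep(stuff_one, 2),
  action   = rep(c(rep(0L, 10), 1L), 2),
  acnumber = rep(c(329L, 330L), each = 11),
  year     = rep(c(2018L, 2017L), each = 11)
)

# split by the three key columns, dropping key combinations that do not occur
groups <- split(df2, df2[c("SKU", "acnumber", "year")], drop = TRUE)
res <- do.call(rbind, lapply(groups, group_result))
res <- res[order(res$SKU, res$acnumber, res$year), ]  # stable, readable row order
rownames(res) <- NULL
res
```

If the result is also wanted on every original row (as in the `cbind` attempt in the question), `merge(df2, res)` joins it back on the shared `SKU`, `acnumber`, and `year` columns.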