How to subtract the median of three observations by group in R
Say I have this dataset:
AwesomeData <- structure(list(SKU = c(13284L, 13284L, 13284L, 13284L, 13284L,
13284L, 13284L, 13284L, 13284L, 13284L, 13284L), stuff = c(4565,
0, 0, 0, 567.0065222, 0, -1, 73.82897425, -1, 567.0065222, 614.2570658
), action = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L), acnumber = c(329L,
329L, 329L, 329L, 329L, 329L, 329L, 329L, 329L, 329L, 329L),
year = c(2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L,
2018L, 2018L, 2018L, 2018L)), .Names = c("SKU", "stuff",
"action", "acnumber", "year"), class = "data.frame", row.names = c(NA,
-11L))
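The action column has only two values, 0 and 1.
As we can see, action category 1 has 1 observation and action category 0 has 10 observations.
1. I must calculate the median of the last three observations of the stuff column in the action 0 category, ignoring all values that are less than or equal to zero. So I work with these three observations:
567.0065222
73.8289742
567.0065222
The median = 567.0065
Now I must take the single value from the action 1 category and subtract the calculated median from it:
614.2570658 - 567.0065222 = 47.2505436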
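The arithmetic above can be checked with a few lines of base R. This is only a sketch: the vector names `stuff`, `action`, `positive_zero`, `last_three`, `med`, and `result` are chosen here for illustration, and the values are copied from the dataset in the question.

```r
# Values copied from the question's `stuff` and `action` columns
stuff  <- c(4565, 0, 0, 0, 567.0065222, 0, -1, 73.82897425, -1,
            567.0065222, 614.2570658)
action <- c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L)

# Keep only the positive values in the action == 0 category
positive_zero <- stuff[action == 0 & stuff > 0]

# The last three of those values, and their median
last_three <- tail(positive_zero, 3)
med <- median(last_three)

# Subtract the median from the single action == 1 value
result <- stuff[action == 1] - med
print(result)
```

Filtering before calling `tail()` matters: taking the last three rows first and filtering afterwards would pick up the zero and negative values instead.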
This is what I do:
AwesomeData %>% {.[.$stuff>0,]} %>% {.[.$action==0,]} %>% tail(3) %>% {median(.$stuff)} -> OURMEDIANA
AwesomeData %>% {.[.$action==1,]} %>% {.$stuff}-OURMEDIANA -> WHATWENEED
a=cbind(AwesomeData,WHATWENEED)
But what if I have two groups, something like this:
df2 <- structure(list(SKU = c(13284L, 13284L, 13284L, 13284L, 13284L,
13284L, 13284L, 13284L, 13284L, 13284L, 13284L, 13285L, 13285L,
13285L, 13285L, 13285L, 13285L, 13285L, 13285L, 13285L, 13285L,
13285L), stuff = c(4565, 0, 0, 0, 567.00652, 0, -1, 73.82897,
-1, 567.00652, 614.25707, 4565, 0, 0, 0, 567.00652, 0, -1, 73.82897,
-1, 567.00652, 614.25707), action = c(0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L
), acnumber = c(329L, 329L, 329L, 329L, 329L, 329L, 329L, 329L,
329L, 329L, 329L, 330L, 330L, 330L, 330L, 330L, 330L, 330L, 330L,
330L, 330L, 330L), year = c(2018L, 2018L, 2018L, 2018L, 2018L,
2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2017L, 2017L, 2017L,
2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L)), .Names = c("SKU",
"stuff", "action", "acnumber", "year"), class = "data.frame", row.names = c(NA,
-22L))
Now we have a new group:
SKU + acnumber + year
13285 + 330 + 2017
How can I apply this code to every group in the dataset?
This is the output I get:
SKU stuff action acnumber year new
<int> <dbl> <int> <int> <int> <dbl>
1 13284 4565 0 329 2018 3998
2 13284 0 0 329 2018 - 567
3 13284 0 0 329 2018 - 567
4 13284 0 0 329 2018 - 567
5 13284 567 0 329 2018 0
6 13284 0 0 329 2018 - 567
7 13284 - 1.00 0 329 2018 - 568
8 13284 73.8 0 329 2018 - 493
9 13284 - 1.00 0 329 2018 - 568
10 13284 567 0 329 2018 0
But I need to see:
SKU acnumber year result
13284 329 2018 47.25055
13285 330 2017 47.25055
Here the result is 614.25707 minus the median of the last three positive observations of the action 0 category (567.00652).
We can group by 'SKU', 'acnumber', and 'year', subset the last 3 observations of 'stuff'
where 'action' is 0 and 'stuff' is positive, take the median,
and subtract it from the last 'stuff' observation where 'action' is 1.
library(dplyr)
df2 %>%
group_by(SKU, acnumber, year) %>%
summarise(new = tail(stuff[action ==1], 1) -
median(tail(stuff[action == 0 & stuff > 0], 3)))
# A tibble: 2 x 4
# Groups: SKU, acnumber [?]
# SKU acnumber year new
# <int> <int> <int> <dbl>
#1 13284 329 2018 47.3
#2 13285 330 2017 47.3
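For readers who prefer not to depend on dplyr, the same grouping logic can be sketched in base R with `split()` and `sapply()`. This is an illustrative alternative, not the method from the answer above. The helper name `diff_from_median` is hypothetical, `df2` is rebuilt here with `rep()` from the values in the question, and returning `NA` for a group with no `action == 1` row is an assumption about the desired behaviour.

```r
# Hypothetical helper: last action == 1 value minus the median of the last
# n positive action == 0 values. Returns NA if the group has no action == 1 row.
diff_from_median <- function(stuff, action, n = 3) {
  target <- tail(stuff[action == 1], 1)
  if (length(target) == 0) return(NA_real_)
  target - median(tail(stuff[action == 0 & stuff > 0], n))
}

# Rebuild the two-group data from the question in compact form
one_group <- c(4565, 0, 0, 0, 567.00652, 0, -1, 73.82897, -1,
               567.00652, 614.25707)
df2 <- data.frame(
  SKU      = rep(c(13284L, 13285L), each = 11),
  stuff    = rep(one_group, times = 2),
  action   = rep(c(rep(0L, 10), 1L), times = 2),
  acnumber = rep(c(329L, 330L), each = 11),
  year     = rep(c(2018L, 2017L), each = 11)
)

# Split on the three grouping columns; drop = TRUE discards combinations
# of SKU, acnumber, and year that never occur in the data
groups <- split(df2, list(df2$SKU, df2$acnumber, df2$year), drop = TRUE)

# One value per group
res <- sapply(groups, function(d) diff_from_median(d$stuff, d$action))
print(res)
```

Both groups come out at about 47.25055, which agrees with the rounded 47.3 shown in the tibble output above.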