Multiply a difference of medians by the per-group count in R
I have this dataset:
df=structure(list(SKU = c(11202L, 11202L, 11202L, 11202L, 11202L,
11202L, 11202L, 11202L, 11202L, 11202L, 11202L, 11202L, 11202L,
11202L, 11202L, 11202L, 11202L, 11202L, 11202L, 11202L, 11202L
), stuff = c(8.85947691, 9.450108704, 10.0407405, 10.0407405,
10.63137229, 11.22200409, 11.22200409, 11.81263588, 12.40326767,
12.40326767, 12.40326767, 12.99389947, 13.58453126, 14.17516306,
14.76579485, 15.94705844, 17.12832203, 17.71895382, 21.26274458,
25.98779894, 63.19760196), action = c(0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L),
acnumber = c(137L, 137L, 137L, 137L, 137L, 137L, 137L, 137L,
137L, 137L, 137L, 137L, 137L, 137L, 137L, 137L, 137L, 137L,
137L, 137L, 137L), year = c(2018L, 2018L, 2018L, 2018L, 2018L,
2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L,
2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L)), .Names = c("SKU",
"stuff", "action", "acnumber", "year"), class = "data.frame", row.names = c(NA,
-21L))
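The action column takes only two values, 0 and 1.
As we can see, stuff has 3 observations with action 1 and 18 observations with action 0.
I need to:
- calculate the median of stuff for category 1 only, ignoring the zero rows (here it equals 25.98779894).
As we can see, there are zeros between the ones; those rows need to be dropped, as do negative values if any exist.
That is, as if the dataset looked like this: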
structure(list(SKU = c(11202L, 11202L, 11202L, 11202L, 11202L,
11202L, 11202L, 11202L, 11202L, 11202L, 11202L, 11202L, 11202L,
11202L, 11202L, 11202L, 11202L, 11202L, 11202L, 11202L, 11202L
), stuff = c(8.85947691, 9.450108704, 10.0407405, 10.0407405,
10.63137229, 11.22200409, 11.22200409, 11.81263588, 12.40326767,
12.40326767, 12.40326767, 12.99389947, 13.58453126, 14.17516306,
14.76579485, 15.94705844, 17.12832203, 17.71895382, 21.26274458,
25.98779894, 63.19760196), action = c(0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 1L, NA, NA, NA, NA, NA, NA, NA, NA, 1L, 1L),
acnumber = c(137L, 137L, 137L, 137L, 137L, 137L, 137L, 137L,
137L, 137L, 137L, 137L, 137L, 137L, 137L, 137L, 137L, 137L,
137L, 137L, 137L), year = c(2018L, 2018L, 2018L, 2018L, 2018L,
2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L,
2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L)), .Names = c("SKU",
"stuff", "action", "acnumber", "year"), class = "data.frame", row.names = c(NA,
-21L))
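I also need the median of stuff over the last three category 0 observations that come before the first 1;
in our example it is 12.40326767.
Then I subtract the category 0 median from the category 1 median and multiply by the number of ones, which is 3 in this case:
(25.98779894 - 12.40326767) * 3 = 40.75359381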
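The arithmetic above can be checked in base R. This is only a sketch: the two vectors are copied from the question's single group, and the names n_ones, first_one, med_one and med_zero are made up for illustration.

```r
# Values copied from the question's group (SKU 11202, acnumber 137, year 2018).
stuff <- c(8.85947691, 9.450108704, 10.0407405, 10.0407405, 10.63137229,
           11.22200409, 11.22200409, 11.81263588, 12.40326767, 12.40326767,
           12.40326767, 12.99389947, 13.58453126, 14.17516306, 14.76579485,
           15.94705844, 17.12832203, 17.71895382, 21.26274458, 25.98779894,
           63.19760196)
action <- c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1)

n_ones    <- sum(action == 1)            # number of rows with action == 1
first_one <- match(1, action)            # position of the first 1
med_one   <- median(stuff[action == 1])  # median over the action == 1 rows only
# median over the three rows immediately before the first 1
med_zero  <- median(stuff[(first_one - 3):(first_one - 1)])

value <- n_ones * (med_one - med_zero)
print(c(n_ones = n_ones, first_one = first_one,
        med_one = med_one, med_zero = med_zero, value = value))
```

Here the first 1 sits in row 11, so the three preceding rows are 8 to 10, and value matches the hand calculation up to floating-point rounding.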
This solution works for the example:
library(dplyr)

df %>%
  group_by(SKU, acnumber, year) %>%
  summarize(value = 3*(median(stuff[action==1]) - median(stuff[match(1,action)-3:1])),
            stuff = first(stuff),
            action = sum(action)) %>%
  select(SKU, stuff, action, acnumber, year, value)
Moody_Mudskipper helped me with it.
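But in this example the number of ones in action is 3, so we multiply by 3,
while in general the count can be larger or smaller than 3.
How can I multiply by the actual count?
For example, if a group has 2 ones in action, then it should be
summarize(value = 2*(median(stuff[action==1]) - median(stuff[match(1,action)-3:1])),
and I don't want to type the number by hand each time.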
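The obvious approach,
sum(df$action == 1)
does not work inside
summarize(value = sum(df$action == 1)*(median(stuff[action==1]) - median(stuff[match(1,action)-3:1])),
because it counts the ones across the whole dataset, so the multiplication is wrong.
In my full data the total count is 692, and multiplying by that number,
summarize(value = 692*(median(stuff[action==1]) - median(stuff[match(1,action)-3:1])),
gives the wrong result.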
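To illustrate why the count comes out wrong, here is a small sketch on hypothetical toy data (the data frame toy and the column grp are not from the question). Inside summarise(), a bare column name refers to the current group's rows, whereas toy$action reaches back to the full, ungrouped column.

```r
library(dplyr)

# Hypothetical toy data: group "a" has 3 ones, group "b" has 2 ones.
toy <- data.frame(
  grp    = c("a", "a", "a", "a", "b", "b", "b"),
  action = c(0L, 1L, 1L, 1L, 0L, 1L, 1L)
)

counts <- toy %>%
  group_by(grp) %>%
  summarise(
    per_group = sum(action == 1),     # bare name: only the current group's rows
    whole_df  = sum(toy$action == 1)  # toy$action: the full column, grouping ignored
  ) %>%
  as.data.frame()

print(counts)
```

per_group gives 3 and 2, while whole_df gives 5 for both groups, which is the same effect as the 692 in the question.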
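The multiplication by the number of ones must be done separately for each group of SKU, acnumber, year:
111-23-2018 is the first group and has 3 ones
112-24-2018 is the second group and has 2 ones
and so on.
How do I do this correctly?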
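library(dplyr)

df %>%
  group_by(SKU, acnumber, year) %>%
  summarise(s = sum(action),            # number of ones in this group
            k = which(action == 1)[1],  # row of the first 1 within the group
            # window: the three rows just before the first 1, as the question asks
            l = s * (median(stuff[action == 1]) - median(stuff[(k - 3):(k - 1)]))) %>%
  data.frame()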
    SKU acnumber year s  k        l
1 11202      137 2018 3 11 40.75359
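The same per-group idea can be wrapped in a small function with guards for awkward groups. This is a sketch, not part of the original answer: median_gap, n_before, toy and grp are hypothetical names, the window is assumed to be the three rows immediately before the first 1 as the question describes, and returning NA for a group with no ones or too few preceding rows is an assumption about the desired behaviour.

```r
library(dplyr)

# Hypothetical helper; n_before is the number of rows taken before the first 1.
median_gap <- function(stuff, action, n_before = 3) {
  ones <- which(action == 1)
  if (length(ones) == 0) return(NA_real_)  # no ones in this group
  k <- ones[1]
  if (k <= n_before) return(NA_real_)      # too few rows before the first 1
  before <- stuff[(k - n_before):(k - 1)]
  length(ones) * (median(stuff[ones]) - median(before))
}

# Hypothetical toy data: "a" and "b" have 2 ones each, "c" has none.
toy <- data.frame(
  grp    = rep(c("a", "b", "c"), times = c(6, 5, 3)),
  stuff  = c(1, 2, 3, 10, 20, 30,  5, 5, 5, 9, 11,  1, 2, 3),
  action = c(0, 0, 0, 1, 0, 1,     0, 0, 0, 1, 1,   0, 0, 0)
)

res <- toy %>%
  group_by(grp) %>%
  summarise(value = median_gap(stuff, action)) %>%
  as.data.frame()

print(res)
```

For group "a" this is 2 * (20 - 2) = 36, for "b" it is 2 * (10 - 5) = 10, and "c" yields NA because it contains no ones.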