如何计算具体百分比
How to calculate specific percentages
我有一个数据 table,其中的数据在 4 个不同级别(级别 0、1、2 和 3)给出。我想计算第 3 级的供应如何在每个州的第 2 级中分配。 (我在table中保留了occ_code
,这样即使state_code
和level
相同,也是唯一的记录)
正在创建示例 Table:
library(data.table)
state_code = c(rep(1,14))
level = c(0,1,2,3,3,2,3,1,2,3,3,3,2,3)
occ_code = LETTERS[1:14]
supply = c(100,60,50,25,25,10,10,40,30,10,10,10,10,10)
DT = data.table(state_code,occ_code,level,supply)
期望的输出
perc = c(NA,NA,NA,0.5,0.5,NA,1,NA,NA,0.33,0.33,0.33,NA,1)
DT2 = data.table(DT,perc)
基本上,我想使用这些百分比来预测仅在第 2 级给出的另一个数据。
重构数据,只存储level 3的信息,其他信息可以这样计算:
library(data.table)
dt3 <- DT[level == 3, ]
dt3[, parent := c("2C", "2C", "2F", "2I", "2I", "2I", "2M")]
dt3[, perc := round(supply / sum(supply), 4), by = parent]
state_code occ_code level supply parent perc
1: 1 D 3 25 2C 0.5000
2: 1 E 3 25 2C 0.5000
3: 1 G 3 10 2F 1.0000
4: 2 J 3 10 2I 0.3333
5: 2 K 3 10 2I 0.3333
6: 2 L 3 10 2I 0.3333
7: 2 N 3 10 2M 1.0000
分别计算 level
0、1 和 2 的 supply
:
dt3[, sum(supply)]
dt3[, sum(supply), by = state_code]
dt3[, sum(supply), by = parent]
第二种方法:
DT[level == 2, parent := paste0(level, occ_code)]
DT[level > 1, parent := parent[1], by = .(cumsum(!is.na(parent)))]
DT[level == 3, perc := round(supply / sum(supply), 4), by = parent]
state_code occ_code level supply parent perc
1: 1 A 0 100 NA NA
2: 1 B 1 60 NA NA
3: 1 C 2 50 2C NA
4: 1 D 3 25 2C 0.5000
5: 1 E 3 25 2C 0.5000
6: 1 F 2 10 2F NA
7: 1 G 3 10 2F 1.0000
8: 2 H 1 40 NA NA
9: 2 I 2 30 2I NA
10: 2 J 3 10 2I 0.3333
11: 2 K 3 10 2I 0.3333
12: 2 L 3 10 2I 0.3333
13: 2 M 2 10 2M NA
14: 2 N 3 10 2M 1.0000
可能的解决方案:
DT[, rl := rleid(level), by = state_code
][level == 3, perc := supply/sum(supply), by = .(state_code, rl)
][, rl := NULL][]
给出:
> DT
state_code occ_code level supply perc
1: 1 A 0 100 NA
2: 1 B 1 60 NA
3: 1 C 2 50 NA
4: 1 D 3 25 0.5000000
5: 1 E 3 25 0.5000000
6: 1 F 2 10 NA
7: 1 G 3 10 1.0000000
8: 2 H 1 40 NA
9: 2 I 2 30 NA
10: 2 J 3 10 0.3333333
11: 2 K 3 10 0.3333333
12: 2 L 3 10 0.3333333
13: 2 M 2 10 NA
14: 2 N 3 10 1.0000000
我有一个数据 table,其中的数据在 4 个不同级别(级别 0、1、2 和 3)给出。我想计算第 3 级的供应如何在每个州的第 2 级中分配。 (我在table中保留了occ_code
,这样即使state_code
和level
相同,也是唯一的记录)
正在创建示例 Table:
library(data.table)
state_code = c(rep(1,14))
level = c(0,1,2,3,3,2,3,1,2,3,3,3,2,3)
occ_code = LETTERS[1:14]
supply = c(100,60,50,25,25,10,10,40,30,10,10,10,10,10)
DT = data.table(state_code,occ_code,level,supply)
期望的输出
perc = c(NA,NA,NA,0.5,0.5,NA,1,NA,NA,0.33,0.33,0.33,NA,1)
DT2 = data.table(DT,perc)
基本上,我想使用这些百分比来预测仅在第 2 级给出的另一个数据。
重构数据,只存储level 3的信息,其他信息可以这样计算:
library(data.table)
dt3 <- DT[level == 3, ]
dt3[, parent := c("2C", "2C", "2F", "2I", "2I", "2I", "2M")]
dt3[, perc := round(supply / sum(supply), 4), by = parent]
state_code occ_code level supply parent perc
1: 1 D 3 25 2C 0.5000
2: 1 E 3 25 2C 0.5000
3: 1 G 3 10 2F 1.0000
4: 2 J 3 10 2I 0.3333
5: 2 K 3 10 2I 0.3333
6: 2 L 3 10 2I 0.3333
7: 2 N 3 10 2M 1.0000
分别计算 level
0、1 和 2 的 supply
:
dt3[, sum(supply)]
dt3[, sum(supply), by = state_code]
dt3[, sum(supply), by = parent]
第二种方法:
DT[level == 2, parent := paste0(level, occ_code)]
DT[level > 1, parent := parent[1], by = .(cumsum(!is.na(parent)))]
DT[level == 3, perc := round(supply / sum(supply), 4), by = parent]
state_code occ_code level supply parent perc
1: 1 A 0 100 NA NA
2: 1 B 1 60 NA NA
3: 1 C 2 50 2C NA
4: 1 D 3 25 2C 0.5000
5: 1 E 3 25 2C 0.5000
6: 1 F 2 10 2F NA
7: 1 G 3 10 2F 1.0000
8: 2 H 1 40 NA NA
9: 2 I 2 30 2I NA
10: 2 J 3 10 2I 0.3333
11: 2 K 3 10 2I 0.3333
12: 2 L 3 10 2I 0.3333
13: 2 M 2 10 2M NA
14: 2 N 3 10 2M 1.0000
可能的解决方案:
DT[, rl := rleid(level), by = state_code
][level == 3, perc := supply/sum(supply), by = .(state_code, rl)
][, rl := NULL][]
给出:
> DT state_code occ_code level supply perc 1: 1 A 0 100 NA 2: 1 B 1 60 NA 3: 1 C 2 50 NA 4: 1 D 3 25 0.5000000 5: 1 E 3 25 0.5000000 6: 1 F 2 10 NA 7: 1 G 3 10 1.0000000 8: 2 H 1 40 NA 9: 2 I 2 30 NA 10: 2 J 3 10 0.3333333 11: 2 K 3 10 0.3333333 12: 2 L 3 10 0.3333333 13: 2 M 2 10 NA 14: 2 N 3 10 1.0000000