Finding Percentage Based on Two Columns

This is what the original data frame looks like:

         PLACEMENT      SIZE      COST
1        placement1     LARGE    1838128.00
58       placement1     MEDIUM   10962048.00
117      placement1     SMALL    2622851.00
175      placement1     UNKNOWN  443.00
2        placement2     LARGE    598.00
59       placement2     MEDIUM   24358.00
118      placement2     SMALL    571802.00
176      placement2     UNKNOWN  1706.00
3        placement3     LARGE    8.00
60       placement3     MEDIUM   22.00  
119      placement3     SMALL    502388.00
177      placement3     UNKNOWN  762.00

How can I create a column that shows each SIZE's share of COST within its PLACEMENT, as a percentage?

I would like the result to look like this:

         PLACEMENT      SIZE      COST           PERCENTAGE
1        placement1     LARGE    1838128.00         11.9
58       placement1     MEDIUM   10962048.00        71.1
117      placement1     SMALL    2622851.00         17.0
175      placement1     UNKNOWN  443.00              0.0 
2        placement2     LARGE    598.00              0.1
59       placement2     MEDIUM   24358.00           4.07
118      placement2     SMALL    571802.00         95.54
176      placement2     UNKNOWN  1706.00            0.29
3        placement3     LARGE    8.00                0.0
60       placement3     MEDIUM   22.00               0.0
119      placement3     SMALL    502388.00         99.84
177      placement3     UNKNOWN  762.00             0.16 

Any help would be great, thanks! I could not work this out with prop.table, even though I feel that is what I should be using.

You can do this quickly with dplyr:

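library(dplyr)
# Within each PLACEMENT group, divide every COST by the group's total COST.
# Note that R is case-sensitive: the function is sum(), not SUM().
df <- df %>% group_by(PLACEMENT) %>% mutate(PERCENTAGE = COST / sum(COST))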

Your desired output also appears to be rounded; if you want that, you can use the round() function.

Edit: if you would rather have the percentages on a 0 to 100 scale, you can of course write 100*COST/sum(COST) instead.
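Putting the two remarks above together, here is a minimal sketch that scales to 0-100 and rounds in one step. The data frame is rebuilt from the question's values (row names dropped), and rounding to 2 decimal places is my assumption, since the desired output in the question mixes one and two decimals.

```r
library(dplyr)

# Same COST values as in the question; row names omitted for brevity.
df <- data.frame(
  PLACEMENT = rep(c("placement1", "placement2", "placement3"), each = 4),
  SIZE      = rep(c("LARGE", "MEDIUM", "SMALL", "UNKNOWN"), times = 3),
  COST      = c(1838128, 10962048, 2622851, 443,
                598, 24358, 571802, 1706,
                8, 22, 502388, 762)
)

result <- df %>%
  group_by(PLACEMENT) %>%
  # sum(COST) is evaluated per group, so each row is divided by its own
  # placement's total; 2 decimal places is an assumed choice.
  mutate(PERCENTAGE = round(100 * COST / sum(COST), 2)) %>%
  ungroup()  # drop the grouping so later operations are not grouped

print(result)
```

The ungroup() call is optional; it just avoids surprises if you keep working with the result afterwards.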

Assuming your input data frame is DF, this will do it. No packages are needed.

transform(DF, PC = 100 * ave(COST, PLACEMENT, FUN = prop.table)) 

giving:

     PLACEMENT    SIZE     COST           PC
1   placement1   LARGE  1838128 11.917733169
58  placement1  MEDIUM 10962048 71.073811535
117 placement1   SMALL  2622851 17.005583050
175 placement1 UNKNOWN      443  0.002872246
2   placement2   LARGE      598  0.099922468
59  placement2  MEDIUM    24358  4.070086087
118 placement2   SMALL   571802 95.544928350
176 placement2 UNKNOWN     1706  0.285063095
3   placement3   LARGE        8  0.001589888
60  placement3  MEDIUM       22  0.004372193
119 placement3   SMALL   502388 99.842601057
177 placement3 UNKNOWN      762  0.151436862

Note: the input in reproducible form is:

Lines <- "PLACEMENT      SIZE      COST
1        placement1     LARGE    1838128.00
58       placement1     MEDIUM   10962048.00
117      placement1     SMALL    2622851.00
175      placement1     UNKNOWN  443.00
2        placement2     LARGE    598.00
59       placement2     MEDIUM   24358.00
118      placement2     SMALL    571802.00
176      placement2     UNKNOWN  1706.00
3        placement3     LARGE    8.00
60       placement3     MEDIUM   22.00  
119      placement3     SMALL    502388.00
177      placement3     UNKNOWN  762.00"

DF <- read.table(text = Lines, header = TRUE)
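To see what the ave() call in this answer is doing, here is a small base R sketch. For a plain numeric vector, prop.table(x) is the same as x / sum(x), and ave() applies that function separately within each PLACEMENT. The data frame is rebuilt inline so the snippet runs on its own; the names pct and out, and the rounding to 2 decimals, are my own choices for illustration.

```r
# Same COST values as in the question; row names omitted for brevity.
DF <- data.frame(
  PLACEMENT = rep(c("placement1", "placement2", "placement3"), each = 4),
  SIZE      = rep(c("LARGE", "MEDIUM", "SMALL", "UNKNOWN"), times = 3),
  COST      = c(1838128, 10962048, 2622851, 443,
                598, 24358, 571802, 1706,
                8, 22, 502388, 762)
)

# Spelled-out equivalent of ave(COST, PLACEMENT, FUN = prop.table):
# each COST is divided by the total COST of its own placement.
pct <- 100 * ave(DF$COST, DF$PLACEMENT, FUN = function(x) x / sum(x))

# Sanity check: the percentages within each placement should add up to 100.
print(tapply(pct, DF$PLACEMENT, sum))

# Optional rounding (2 decimal places is an assumed choice).
out <- transform(DF, PC = round(pct, 2))
print(out)
```

Checking the per-group sums before rounding is a cheap way to confirm the grouping variable is the one you intended.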

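Here is an option using data.table:

library(data.table)
# setDT() converts df to a data.table by reference; := then adds the column
# in place, computing COST / sum(COST) separately for each PLACEMENT.
setDT(df)[, PERCENTAGE := COST / sum(COST), by = PLACEMENT]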