Aggregating percentage for one column by categories in another column in R

I know this is basic, but I'm having trouble with it. I took the sample data from here:

Link to article containing sample data

companiesData <- data.frame(fy = c(2010,2011,2012,2010,2011,2012,2010,2011,2012),
                            company = c("Apple","Apple","Apple","Google","Google","Google",
                                        "Microsoft","Microsoft","Microsoft"),
                            revenue = c(65225,108249,156508,29321,37905,50175,
                                        62484,69943,73723), 
                            profit = c(14013,25922,41733,8505,9737,10737,
                                       18760,23150,16978))

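How do I find each company's profit percentage for each year? For example, add up all of Apple's profits, then give each Apple row its profit as a percentage of that sum. The end result should be a table with all the columns, but with the profit percentage aggregated by company only; the years stay as they are. The answer for Apple's first row is 17.16%, calculated as: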

(14013/81668)*100

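where 81668 is Apple's total profit and 17.16% is the profit percentage for Apple's first row (2010). I don't want to do this as a time series, because the variable is not necessarily time; it could be a location.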
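The arithmetic for Apple's first row can be checked directly in R. This is only a quick sanity check; the vector below simply copies Apple's three profit values from `companiesData`, and the variable names are illustrative.

```r
# Apple's profits for 2010, 2011, 2012 (copied from companiesData)
apple_profit <- c(14013, 25922, 41733)

# Apple's total profit across all years
apple_total <- sum(apple_profit)                 # 81668

# 2010 profit as a percentage of that total
apple_2010_pct <- apple_profit[1] / apple_total * 100

round(apple_2010_pct, 2)                         # 17.16
```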

Using base R:

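# Express each value as a percentage of its group's total, rounded to 2 decimals
fun <- function(x) paste0(round(x / sum(x) * 100, 2), "%")
# ave() applies fun within each company and returns a vector aligned with the rows
transform(companiesData, prec = ave(profit, company, FUN = fun))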
    fy   company revenue profit   prec
1 2010     Apple   65225  14013 17.16%
2 2011     Apple  108249  25922 31.74%
3 2012     Apple  156508  41733  51.1%
4 2010    Google   29321   8505 29.35%
5 2011    Google   37905   9737  33.6%
6 2012    Google   50175  10737 37.05%
7 2010 Microsoft   62484  18760 31.86%
8 2011 Microsoft   69943  23150 39.31%
9 2012 Microsoft   73723  16978 28.83%
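Because `fun` returns character strings, `prec` above can no longer be summed or sorted numerically. A possible variant, sketched under the assumption that a numeric column is preferred, passes base R's `prop.table` (which divides a vector by its sum) to `ave()` and formats the percentage only for display. The column name `share` is illustrative, not part of the original answer.

```r
companiesData <- data.frame(fy = c(2010,2011,2012,2010,2011,2012,2010,2011,2012),
                            company = c("Apple","Apple","Apple","Google","Google","Google",
                                        "Microsoft","Microsoft","Microsoft"),
                            revenue = c(65225,108249,156508,29321,37905,50175,
                                        62484,69943,73723),
                            profit = c(14013,25922,41733,8505,9737,10737,
                                       18760,23150,16978))

# prop.table(x) is x / sum(x); ave() applies it separately within each company
companiesData$share <- ave(companiesData$profit, companiesData$company,
                           FUN = prop.table)

# Sanity check: the shares within each company should add up to 1
rowsum(companiesData$share, companiesData$company)

# Turn the numeric share into a percentage string only when printing
transform(companiesData, prec = sprintf("%.2f%%", share * 100))
```

Since `ave()` only needs a grouping vector, the same call works unchanged if the grouping column is a location or any other category instead of a company.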


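library(data.table)
# setDT() converts companiesData to a data.table in place; := then adds prec by
# reference as each row's share of its company's total profit; the trailing [] prints it
setDT(companiesData)[, prec := profit / sum(profit) * 100, by = company][]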
     fy   company revenue profit     prec
1: 2010     Apple   65225  14013 17.15850
2: 2011     Apple  108249  25922 31.74071
3: 2012     Apple  156508  41733 51.10080
4: 2010    Google   29321   8505 29.34884
5: 2011    Google   37905   9737 33.60019
6: 2012    Google   50175  10737 37.05097
7: 2010 Microsoft   62484  18760 31.85708
8: 2011 Microsoft   69943  23150 39.31191
9: 2012 Microsoft   73723  16978 28.83100

A dplyr solution: group by company, sum all of that company's profits, then create a new variable holding each year's share of the total profit.

library(dplyr)

# (code reading in the data from the OP omitted)

companiesData %>%
    group_by(company) %>%
    mutate(total_profit = sum(profit)) %>%
    mutate(share_this_yr = profit / total_profit)
#> # A tibble: 9 x 6
#> # Groups:   company [3]
#>      fy company   revenue profit total_profit share_this_yr
#>   <dbl> <fct>       <dbl>  <dbl>        <dbl>         <dbl>
#> 1  2010 Apple       65225  14013        81668         0.172
#> 2  2011 Apple      108249  25922        81668         0.317
#> 3  2012 Apple      156508  41733        81668         0.511
#> 4  2010 Google      29321   8505        28979         0.293
#> 5  2011 Google      37905   9737        28979         0.336
#> 6  2012 Google      50175  10737        28979         0.371
#> 7  2010 Microsoft   62484  18760        58888         0.319
#> 8  2011 Microsoft   69943  23150        58888         0.393
#> 9  2012 Microsoft   73723  16978        58888         0.288

Created on 2018-05-01 by the reprex package (v0.2.0).
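The same group, sum, divide steps can also be written without dplyr. A minimal base R sketch, assuming only the `companiesData` data frame from the question: `aggregate()` computes each company's total profit, `merge()` attaches that total to every row, and the share is a plain division. The names `totals` and `withTotals` are illustrative; `total_profit` and `share_this_yr` mirror the dplyr answer.

```r
companiesData <- data.frame(fy = c(2010,2011,2012,2010,2011,2012,2010,2011,2012),
                            company = c("Apple","Apple","Apple","Google","Google","Google",
                                        "Microsoft","Microsoft","Microsoft"),
                            revenue = c(65225,108249,156508,29321,37905,50175,
                                        62484,69943,73723),
                            profit = c(14013,25922,41733,8505,9737,10737,
                                       18760,23150,16978))

# Step 1: total profit per company (one row per company)
totals <- aggregate(profit ~ company, data = companiesData, FUN = sum)
names(totals)[names(totals) == "profit"] <- "total_profit"

# Step 2: attach each company's total to all of its rows
withTotals <- merge(companiesData, totals, by = "company")

# Step 3: each year's profit as a share of the company total
withTotals$share_this_yr <- withTotals$profit / withTotals$total_profit

# merge() sorts by the key column; order explicitly by company and year
withTotals <- withTotals[order(withTotals$company, withTotals$fy), ]
withTotals
```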