Aggregating percentage for one column by categories in another column in R
I know this is basic, but I'm having trouble with it. I took the sample data from:
Link to article containing sample data
companiesData <- data.frame(fy      = c(2010, 2011, 2012, 2010, 2011, 2012, 2010, 2011, 2012),
                            company = c("Apple", "Apple", "Apple", "Google", "Google", "Google",
                                        "Microsoft", "Microsoft", "Microsoft"),
                            revenue = c(65225, 108249, 156508, 29321, 37905, 50175,
                                        62484, 69943, 73723),
                            profit  = c(14013, 25922, 41733, 8505, 9737, 10737,
                                        18760, 23150, 16978))
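How do I find the percentage of profit for each company for each year? For example: add up all of Apple's profits, then give each Apple row its percentage of that sum. The end result should be a table with all the columns, but with profit expressed as a percentage aggregated by company only. The years stay as they are.
For the first Apple row, the answer would be 17.16%, calculated as: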
(14013/81668)*100
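where 81668 is Apple's total profit and 17.16% is the profit percentage for Apple's first row (2010). I don't want to treat this as a time series, because the grouping variable isn't necessarily time; it could be a location, for example.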
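The arithmetic in the question can be checked directly in R. This is a minimal sketch that uses only Apple's three profit values copied from the sample data; the variable names `apple_profit`, `apple_total` and `apple_pct` are illustrative, not part of the question. The same pattern (divide each row by its group's total) is what every answer below applies per company.

```r
# Apple's profits for 2010, 2011 and 2012, copied from the sample data
apple_profit <- c(14013, 25922, 41733)

# The group total used as the denominator
apple_total <- sum(apple_profit)
print(apple_total)  # [1] 81668

# Each row's share of the group total, in percent
apple_pct <- apple_profit / apple_total * 100
print(round(apple_pct, 2))  # [1] 17.16 31.74 51.10
```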
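Using base R:
# Percentage of the group total, rounded to 2 decimals, as a string like "17.16%"
fun <- function(x) paste0(round(x / sum(x) * 100, 2), "%")
# ave() applies fun to profit separately within each company
transform(companiesData, prec = ave(profit, company, FUN = fun))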
fy company revenue profit prec
1 2010 Apple 65225 14013 17.16%
2 2011 Apple 108249 25922 31.74%
3 2012 Apple 156508 41733 51.1%
4 2010 Google 29321 8505 29.35%
5 2011 Google 37905 9737 33.6%
6 2012 Google 50175 10737 37.05%
7 2010 Microsoft 62484 18760 31.86%
8 2011 Microsoft 69943 23150 39.31%
9 2012 Microsoft 73723 16978 28.83%
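Because `fun` builds its result with `paste0()`, the `prec` column above is character ("51.1%"), so it can't be summed or sorted numerically. If you need a numeric column, one possible variant (a sketch, not part of the original answer) is to let `ave()` compute each company's total with `FUN = sum` and divide afterwards. The names `total_profit` and `perc` are assumptions chosen for illustration.

```r
# Sample data from the question
companiesData <- data.frame(
  fy      = c(2010, 2011, 2012, 2010, 2011, 2012, 2010, 2011, 2012),
  company = c("Apple", "Apple", "Apple", "Google", "Google", "Google",
              "Microsoft", "Microsoft", "Microsoft"),
  revenue = c(65225, 108249, 156508, 29321, 37905, 50175,
              62484, 69943, 73723),
  profit  = c(14013, 25922, 41733, 8505, 9737, 10737,
              18760, 23150, 16978)
)

# ave() with FUN = sum repeats each company's total profit on every row of that company
total_profit <- ave(companiesData$profit, companiesData$company, FUN = sum)

# Numeric percentage (not a formatted string), so it can still be sorted or summed
companiesData$perc <- companiesData$profit / total_profit * 100

# Sanity check: within each company the percentages should add up to 100
tapply(companiesData$perc, companiesData$company, sum)
```

Nothing here depends on `company` being a year-based grouping, so the same call works with a location column or any other categorical variable.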
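Using data.table:
library(data.table)
# setDT() converts companiesData to a data.table by reference; := then adds
# prec within each company group, and the trailing [] prints the result
setDT(companiesData)[, prec := profit / sum(profit) * 100, by = company][]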
fy company revenue profit prec
1: 2010 Apple 65225 14013 17.15850
2: 2011 Apple 108249 25922 31.74071
3: 2012 Apple 156508 41733 51.10080
4: 2010 Google 29321 8505 29.34884
5: 2011 Google 37905 9737 33.60019
6: 2012 Google 50175 10737 37.05097
7: 2010 Microsoft 62484 18760 31.85708
8: 2011 Microsoft 69943 23150 39.31191
9: 2012 Microsoft 73723 16978 28.83100
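Unlike the base R `fun`, the data.table `prec` column stays numeric. If you also want "%"-style labels for display, one option (a sketch, assuming you format only at display time) is `sprintf()`. The values below are the first three Apple rows copied from the output above; `prec_label` is an illustrative name. Note that `"%.2f"` keeps the trailing zero ("51.10%"), whereas `round()` plus `paste0()` gives "51.1%".

```r
# Numeric percentages as produced by the data.table answer (Apple rows)
prec <- c(17.15850, 31.74071, 51.10080)

# Format for display only; "%.2f%%" keeps two decimals and appends a literal percent sign
prec_label <- sprintf("%.2f%%", prec)
print(prec_label)  # [1] "17.16%" "31.74%" "51.10%"
```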
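Using dplyr:
The approach: group by company, sum all of that company's profits, then create a new variable holding each year's profit as a share of that total.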
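library(dplyr)
# companiesData is the data frame from the question (reading it in is omitted here)
companiesData %>%
  group_by(company) %>%
  # total profit of each company, repeated on each of its rows
  mutate(total_profit = sum(profit)) %>%
  # each year's profit as a fraction (0 to 1) of the company total
  mutate(share_this_yr = profit / total_profit)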
#> # A tibble: 9 x 6
#> # Groups: company [3]
#> fy company revenue profit total_profit share_this_yr
#> <dbl> <fct> <dbl> <dbl> <dbl> <dbl>
#> 1 2010 Apple 65225 14013 81668 0.172
#> 2 2011 Apple 108249 25922 81668 0.317
#> 3 2012 Apple 156508 41733 81668 0.511
#> 4 2010 Google 29321 8505 28979 0.293
#> 5 2011 Google 37905 9737 28979 0.336
#> 6 2012 Google 50175 10737 28979 0.371
#> 7 2010 Microsoft 62484 18760 58888 0.319
#> 8 2011 Microsoft 69943 23150 58888 0.393
#> 9 2012 Microsoft 73723 16978 58888 0.288
Created on 2018-05-01 by the reprex package (v0.2.0).