Function for referencing values associated with specific factor values

I have a fairly large list that looks like this, where the first two variables I store are factors:

Product  Vendor  Sales  Product sales share
a        x         100
b        y         200
a        y         250
c        y         700
a        z         150

Ideally, I would like to create a new column containing each vendor's share of the total sales for that product, e.g. Share_{p=a,v=x} = 100/(100+250+150) = 0.2

I think lapply() could do this, but I am not sure how to write the function.

> dput(list)
list(structure(list(Product = structure(c(1L, 2L, 1L, 3L, 1L), .Label = c("a", 
"b", "c"), class = "factor"), Vendor = structure(c(1L, 2L, 2L, 
2L, 3L), .Label = c("x", "y", "z"), class = "factor"), Sales = c(100, 
200, 250, 700, 150)), class = "data.frame", row.names = c(NA, 
-5L)))

With the dplyr package, you can compute the total sales for each product, then divide each vendor's sales by that total to get the vendor's share.

library(dplyr)

df %>%
  group_by(Product) %>%
  mutate(Total_Sales = sum(Sales),
         Vendor_Share = Sales/Total_Sales)

As an alternative, a base R approach can use prop.table within each product group:

df$Vendor_Share <- with(df, ave(Sales, Product, FUN = prop.table))

Output

  Product Vendor Sales Vendor_Share
1       a      x   100          0.2
2       b      y   200          1.0
3       a      y   250          0.5
4       c      y   700          1.0
5       a      z   150          0.3
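
Since the question stores the data frame inside a list and mentions lapply(), the base R one-liner can probably be wrapped in a small function and applied to every element of that list. The following is only a minimal sketch under assumptions: `sales_list` and `add_vendor_share` are illustrative names that do not appear in the original post, and every data frame in the list is assumed to have `Product` and `Sales` columns.

```r
# Hypothetical list holding one data frame, mirroring the dput() output in the question
sales_list <- list(
  data.frame(
    Product = factor(c("a", "b", "a", "c", "a")),
    Vendor  = factor(c("x", "y", "y", "y", "z")),
    Sales   = c(100, 200, 250, 700, 150)
  )
)

# Hypothetical helper: add each row's share of its product's total sales.
# ave() applies prop.table (x / sum(x)) within each Product group and
# returns the results in the original row order.
add_vendor_share <- function(d) {
  d$Vendor_Share <- ave(d$Sales, d$Product, FUN = prop.table)
  d
}

# Apply the helper to every data frame in the list
sales_list <- lapply(sales_list, add_vendor_share)

sales_list[[1]]
```

If the list really contains only one data frame, extracting it with `sales_list[[1]]` and using either approach above directly would likely be simpler than lapply().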

Data

df <- structure(list(Product = structure(c(1L, 2L, 1L, 3L, 1L), .Label = c("a", 
"b", "c"), class = "factor"), Vendor = structure(c(1L, 2L, 2L, 
2L, 3L), .Label = c("x", "y", "z"), class = "factor"), Sales = c(100, 
200, 250, 700, 150), Vendor_Share = c(0.2, 1, 0.5, 1, 0.3)), row.names = c(NA, 
-5L), class = "data.frame")