Function for referencing values associated with specific factor values
I have a fairly large list that looks like this, where the first two variables are stored as factors:
Product  Vendor  Sales  Product sales share
a        x         100
b        y         200
a        y         250
c        y         700
a        z         150
Ideally, I'd like to create a new column holding each vendor's share of that product's total sales, e.g. Share_{p=a, v=x} = 100 / (100 + 250 + 150) = 0.2.
I think lapply() could do this, but I'm not sure how to write the function.
> dput(list)
list(structure(list(Product = structure(c(1L, 2L, 1L, 3L, 1L), .Label = c("a",
"b", "c"), class = "factor"), Vendor = structure(c(1L, 2L, 2L,
2L, 3L), .Label = c("x", "y", "z"), class = "factor"), Sales = c(100,
200, 250, 700, 150)), class = "data.frame", row.names = c(NA,
-5L)))
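The share defined in the question is just a vendor's sales divided by the sum of all sales for the same product. A minimal hand check in base R, assuming the five rows from the example table (the names `sales`, `product` and `share_a_x` are illustrative only):

```r
# Sales and product labels copied from the example table
sales   <- c(100, 200, 250, 700, 150)
product <- c("a", "b", "a", "c", "a")

# Share of vendor x (row 1) in the total sales of product a: 100 / 500
share_a_x <- sales[1] / sum(sales[product == "a"])
share_a_x
#> [1] 0.2
```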
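Using the dplyr package, you can compute the total sales for each product and then divide each vendor's sales by that total to get the vendor share.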
library(dplyr)

# Within each product, add the product's total sales and each vendor's share of it
df %>%
  group_by(Product) %>%
  mutate(Total_Sales = sum(Sales),
         Vendor_Share = Sales / Total_Sales)
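The same two steps (total per product, then divide) can also be sketched without dplyr. This is a base R illustration using tapply(), not the answer's dplyr code; it rebuilds the example data so it runs on its own:

```r
# Example data from the question
df <- data.frame(
  Product = factor(c("a", "b", "a", "c", "a")),
  Vendor  = factor(c("x", "y", "y", "y", "z")),
  Sales   = c(100, 200, 250, 700, 150)
)

# Step 1: total sales per product, one value per factor level
totals <- tapply(df$Sales, df$Product, sum)

# Step 2: look up each row's product total by name, then divide
df$Total_Sales  <- as.vector(totals[as.character(df$Product)])
df$Vendor_Share <- df$Sales / df$Total_Sales
df
```

Looking totals up by level name (rather than relying on factor codes) keeps the mapping explicit.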
As an alternative, a base R approach can use prop.table:
df$Vendor_Share <- with(df, ave(Sales, Product, FUN = prop.table))
Output
Product Vendor Sales Vendor_Share
1 a x 100 0.2
2 b y 200 1.0
3 a y 250 0.5
4 c y 700 1.0
5 a z 150 0.3
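To see what the ave() one-liner does: prop.table() divides a vector by its sum, and ave() applies that function separately within each Product group, returning the results in the original row order. A small sketch on the example values:

```r
# prop.table() on a plain vector: each element divided by the vector's sum
prop.table(c(100, 250, 150))   # the three sales of product a
#> [1] 0.2 0.5 0.3

# ave() runs prop.table() within each product and keeps the row order
sales   <- c(100, 200, 250, 700, 150)
product <- factor(c("a", "b", "a", "c", "a"))
shares  <- ave(sales, product, FUN = prop.table)
shares
```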
Data
df <- structure(list(Product = structure(c(1L, 2L, 1L, 3L, 1L), .Label = c("a",
"b", "c"), class = "factor"), Vendor = structure(c(1L, 2L, 2L,
2L, 3L), .Label = c("x", "y", "z"), class = "factor"), Sales = c(100,
200, 250, 700, 150), Vendor_Share = c(0.2, 1, 0.5, 1, 0.3)), row.names = c(NA,
-5L), class = "data.frame")
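The answers above work on a single data frame `df`, whereas the dput() in the question shows a list of data frames. Since the question mentions lapply(), here is a minimal sketch that wraps the base R one-liner in a function and applies it to every element of the list. The names `df_list` and `add_vendor_share` are hypothetical; `df_list` stands in for the question's object called `list`, renamed only to avoid confusion with the base function list():

```r
# Stand-in for the question's list of data frames
df_list <- list(
  data.frame(
    Product = factor(c("a", "b", "a", "c", "a")),
    Vendor  = factor(c("x", "y", "y", "y", "z")),
    Sales   = c(100, 200, 250, 700, 150)
  )
)

# Hypothetical helper: adds the vendor-share column to one data frame
add_vendor_share <- function(d) {
  d$Vendor_Share <- ave(d$Sales, d$Product, FUN = prop.table)
  d
}

# Apply the helper to every data frame in the list
df_list <- lapply(df_list, add_vendor_share)
df_list[[1]]
```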