ddply transformation (percentage change) in R
I have a data.frame that looks like this:
Brand Year EUR
Brand1 2015 10
Brand1 2016 20
Brand2 2015 100
Brand2 2016 500
Brand3 2015 25
Brand4 2015 455
...
The code below generates sample data with the same structure:
library(plyr)
library(dplyr)
library(scales)
set.seed(1992)
n=68
Year <- sample(c("2015", "2016"), n, replace = TRUE, prob = NULL)
Brand <- sample("Brand", n, replace = TRUE, prob = NULL)
Brand <- paste0(Brand, sample(1:5, n, replace = TRUE, prob = NULL))
EUR <- abs(rnorm(n))*100000
df <- data.frame(Year, Brand, EUR)
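I need some additional data transformations (adding more columns) for my further analysis.
First, I compute the label positions (for a plot I will make later) and store them in a column named pos: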
df.summary = df %>% group_by(Brand, Year) %>%
  summarise(EUR = sum(EUR)) %>%
  mutate(pos = cumsum(EUR) - 0.5 * EUR)
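As a small worked illustration of the pos formula, the sketch below assumes the labels are meant for stacked segments (the question does not say which chart type is planned). The values in EUR are made up for the example.

```r
# Hypothetical totals for one brand (illustrative values only)
EUR <- c(10, 20, 30)

# cumsum() gives the upper edge of each stacked segment; subtracting half
# of the segment's own height moves the position to the segment's midpoint
pos <- cumsum(EUR) - 0.5 * EUR
print(pos)
```

With these values the midpoints come out as 5, 20 and 45.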
What I want to do is compute the percentage growth of each Brand from one Year to the next. So I added this line:
df.summary = ddply(df.summary, .(Brand), transform,
pChange = (sum(df.summary[df.summary$Year == "2016",]$EUR)/
sum(df.summary[df.summary$Year == "2015",]$EUR) )-1
)
However, what I get is a constant value: the growth over my whole data frame.
Can you help me compute the percentage change for each brand?
Thanks!
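The constant value in the ddply call above comes from the expression df.summary[...]. Inside transform, that name is looked up as the full data frame, not as the per-Brand piece that ddply passes in, so every group receives the overall ratio. A minimal sketch of a per-brand version follows, assuming it is enough to refer to the columns EUR and Year directly. The data frame toy is copied from the table in the question, and the names toy and res are illustrative only.

```r
# Toy data copied from the table in the question (Brand3 and Brand4 have only 2015)
toy <- data.frame(
  Brand = c("Brand1", "Brand1", "Brand2", "Brand2", "Brand3", "Brand4"),
  Year  = c("2015", "2016", "2015", "2016", "2015", "2015"),
  EUR   = c(10, 20, 100, 500, 25, 455)
)

# Bare column names are evaluated inside each Brand's piece,
# so the ratio is computed per brand rather than over all rows.
# Caveat: a brand with no 2016 rows gets 0 / x - 1 = -1,
# and a brand with no 2015 rows gets Inf.
res <- plyr::ddply(toy, "Brand", transform,
  pChange = sum(EUR[Year == "2016"]) / sum(EUR[Year == "2015"]) - 1
)
print(res)
```

Here pChange is a fraction (1 means +100%), matching the formula in the question, and it is repeated on every row of the brand.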
Also, it is easier if you use lag:
df.summary %>% group_by(Brand) %>%
mutate(pChange = (EUR - lag(EUR))/lag(EUR) * 100)
# Source: local data frame [10 x 5]
#Groups: Brand [5]
#
# Brand Year EUR pos pChange
# <fctr> <fctr> <dbl> <dbl> <dbl>
#1 Brand1 2015 637896.7 318948.3 NA
#2 Brand1 2016 721944.2 998868.8 13.17573
#3 Brand2 2015 708697.6 354348.8 NA
#4 Brand2 2016 300541.1 858968.2 -57.59248
#5 Brand3 2015 454890.1 227445.1 NA
#6 Brand3 2016 576095.6 742937.9 26.64500
#7 Brand4 2015 305712.0 152856.0 NA
#8 Brand4 2016 174073.3 392748.6 -43.05970
#9 Brand5 2015 589970.7 294985.3 NA
#10 Brand5 2016 518510.2 849225.8 -12.11254
As suggested by @r2evans, if Year is not already sorted, arrange by it first:
df.summary %>% group_by(Brand) %>% arrange(Year) %>%
mutate(pChange = (EUR - lag(EUR))/lag(EUR) * 100)
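To make the behaviour concrete, here is a small sketch of the same lag approach applied to the toy table from the question. The names toy and toy_change are illustrative, and arrange(..., .by_group = TRUE) is one way (an assumption, not part of the original answer) to sort by Year within each Brand.

```r
library(dplyr)  # provides %>%, group_by(), arrange(), mutate() and dplyr's lag()

# Toy data copied from the table in the question (Brand3 and Brand4 have only 2015)
toy <- data.frame(
  Brand = c("Brand1", "Brand1", "Brand2", "Brand2", "Brand3", "Brand4"),
  Year  = c("2015", "2016", "2015", "2016", "2015", "2015"),
  EUR   = c(10, 20, 100, 500, 25, 455)
)

toy_change <- toy %>%
  group_by(Brand) %>%
  arrange(Year, .by_group = TRUE) %>%                    # sort years within each brand
  mutate(pChange = (EUR - lag(EUR)) / lag(EUR) * 100) %>%  # percent change vs. previous year
  ungroup()

print(toy_change)
```

Brand1 goes from 10 to 20 (100%) and Brand2 from 100 to 500 (400%). The first year of every brand, and brands with a single year such as Brand3 and Brand4, get NA because there is no previous value to compare against.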