Create summed lagged variable by group in R
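Any help would be greatly appreciated!
Essentially, I need a variable that sums the values of the previous observations by group, taking a date variable into account.
For example:
My current data: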
ID <- c("A", "A", "A","A", "B", "B", "B")
YEAR <- c(1900, 1901, 1902, 1903, 1900, 1901, 1902)
CASH <- c(1, 2, 3, 1, 0, 1, 0)
DF <- data.frame(ID, YEAR, CASH)
print(DF)
What I would like my data to look like:
ID <- c("A", "A", "A","A", "B", "B", "B")
YEAR <- c(1900, 1901, 1902, 1903, 1900, 1901, 1902)
CASH <- c(1, 2, 3, 1, 0, 1, 0)
PREV_CASH <- c(NA, 1, 3, 6, NA, NA, 1)
DF2 <- data.frame(ID, YEAR, CASH, PREV_CASH)
print(DF2)
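I want to sum the cash amounts from the previous years within each group.
After grouping by 'ID', we can take the lag of the cumsum of 'CASH'. A previous total of 0 is then replaced with NA to match the desired output.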
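library(dplyr)
DF %>%
  group_by(ID) %>%
  # lag(cumsum(CASH)) is the running total of all earlier rows in the group
  mutate(PREV_CASH = lag(cumsum(CASH)),
         # a previous total of 0 is shown as NA in the desired output
         PREV_CASH = replace(PREV_CASH, PREV_CASH == 0, NA))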
#      ID  YEAR  CASH PREV_CASH
#  <fctr> <dbl> <dbl>     <dbl>
#1      A  1900     1        NA
#2      A  1901     2         1
#3      A  1902     3         3
#4      A  1903     1         6
#5      B  1900     0        NA
#6      B  1901     1        NA
#7      B  1902     0         1
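For anyone who would rather not load dplyr, the same idea can be sketched in base R with ave(). This is only a sketch under a few assumptions: the helper name prev_cumsum is made up for illustration, the rows are explicitly sorted by ID and YEAR first (the answer above relies on the data already being in year order), and a previous total of 0 is turned into NA as in the desired output.

```r
# Rebuild the example data from the question.
DF <- data.frame(
  ID   = c("A", "A", "A", "A", "B", "B", "B"),
  YEAR = c(1900, 1901, 1902, 1903, 1900, 1901, 1902),
  CASH = c(1, 2, 3, 1, 0, 1, 0)
)

# Sort by group and year so that "previous" really means earlier years.
DF <- DF[order(DF$ID, DF$YEAR), ]

# Hypothetical helper (not from the original answer):
# running total of all earlier values, with 0 totals replaced by NA.
prev_cumsum <- function(x) {
  out <- c(NA_real_, head(cumsum(x), -1))  # shift the cumulative sum down by one
  out[!is.na(out) & out == 0] <- NA        # mirror the replace(..., 0, NA) step
  out
}

# ave() applies the helper within each ID and returns a vector in row order.
DF$PREV_CASH <- ave(DF$CASH, DF$ID, FUN = prev_cumsum)
print(DF)
```

If a previous total of 0 should stay 0 rather than become NA, drop the second line of the helper.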