Create lag variable with conditions and group by id
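I am struggling to create a new variable that captures the stock price (prc) of a given company at the beginning of its fiscal year (fyear).
In the data, each fiscal year is defined by a start date and an end date, and is accompanied by monthly stock prices. The stock price is taken on the last trading day of the month, so it does not always fall on the last calendar day of the month.
For example: if the fiscal year starts on 2001-01-01, I want the stock price at the end of December 2000.
Here is a sample of the data: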
library(data.table)

dt <- data.table(id = rep(c(59328, 61241), each = 36), fyear = c(rep(2001,
each = 12), rep(2002, each = 12), rep(2003, each = 12), rep(2001,
each = 12), rep(2002, each = 12), rep(2003, each = 12)),
fyear_start = as.Date(c(rep("2001-01-01", each = 12), rep("2002-01-01",
each = 12), rep("2003-01-01", each = 12), rep("2000-07-01",
each = 12), rep("2001-07-01", each = 12), rep("2002-07-01",
each = 12))), fyear_end = as.Date(c(rep("2001-12-31",
each = 12), rep("2002-12-31", each = 12), rep("2003-12-31",
each = 12), rep("2001-06-30", each = 12), rep("2002-06-30",
each = 12), rep("2003-06-30", each = 12))), prc_month_end = as.Date(c("2001-01-31",
"2001-02-28", "2001-03-30", "2001-04-30", "2001-05-31",
"2001-06-29", "2001-07-31", "2001-08-31", "2001-09-28",
"2001-10-31", "2001-11-30", "2001-12-31", "2002-01-31",
"2002-02-28", "2002-03-28", "2002-04-30", "2002-05-31",
"2002-06-28", "2002-07-31", "2002-08-30", "2002-09-30",
"2002-10-31", "2002-11-29", "2002-12-31", "2003-01-31",
"2003-02-28", "2003-03-31", "2003-04-30", "2003-05-30",
"2003-06-30", "2003-07-31", "2003-08-29", "2003-09-30",
"2003-10-31", "2003-11-28", "2003-12-31", "2000-07-31",
"2000-08-31", "2000-09-29", "2000-10-31", "2000-11-30",
"2000-12-29", "2001-01-31", "2001-02-28", "2001-03-30",
"2001-04-30", "2001-05-31", "2001-06-29", "2001-07-31",
"2001-08-31", "2001-09-28", "2001-10-31", "2001-11-30",
"2001-12-31", "2002-01-31", "2002-02-28", "2002-03-28",
"2002-04-30", "2002-05-31", "2002-06-28", "2002-07-31",
"2002-08-30", "2002-09-30", "2002-10-31", "2002-11-29",
"2002-12-31", "2003-01-31", "2003-02-28", "2003-03-31",
"2003-04-30", "2003-05-30", "2003-06-30")), prc = c(37,
28.56, 26.31, 30.91, 27.01, 29.25, 29.81, 27.96, 20.44,
24.42, 32.66, 31.45, 35.04, 28.55, 30.41, 28.61, 27.62,
18.27, 18.79, 16.67, 13.89, 17.3, 20.88, 15.57, 15.7,
17.26, 16.28, 18.37, 20.82, 20.81, 24.89, 28.59, 27.52,
32.95, 33.54, 32.05, 24.6, 21.5, 26.54, 31, 28.25, 28.9,
18.26, 13.55, 8.15, 9.84, 13.56, 15.86, 16.05, 13.5,
14.71, 11.18, 11.43, 9.72, 8.03, 8.85, 5.34, 6.14, 9,
6.46, 5.24, 5.49, 6.18, 7.44, 7.28, 6.41, 7.3, 11.29,
11.11, 15.2, 17.97, 14.9))
The first three rows:
id fyear fyear_start fyear_end prc_month_end prc
1: 59328 2001 2001-01-01 2001-12-31 2001-01-31 37.00
2: 59328 2001 2001-01-01 2001-12-31 2001-02-28 28.56
3: 59328 2001 2001-01-01 2001-12-31 2001-03-30 26.31
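I have read the following posts for guidance, but did not get the expected result.
- Using thelatemail's solution, I can create a lagged variable of the stock price. However, it takes the previous month's stock price and does not take the fiscal year into account.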
vars <- c("prc")
rpv <- rep(1:2, each=length(vars))
dt_test <- dt[, paste(vars, "lag", rpv, sep="_") := Map(shift, .SD, rpv), by=id, .SDcols=vars]
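- Same as above: the lagged stock price is based on the previous month
- Create lead and lag variables in R
- Same as above: the lagged stock price is based on the previous month
I cannot use data.table's .SD[1]/.N statements, because they return the first/last month of the same fiscal year rather than the last month of the previous fiscal year.
Is there a way to return, for each fiscal year, the last monthly stock price of the previous fiscal year?
The desired result is as follows: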
output <- data.table(id = rep(c(59328, 61241), each = 3), fyear = c(2001,
2002, 2003, 2001, 2002, 2003), fyear_start = as.Date(c("2001-01-01",
"2002-01-01", "2003-01-01", "2000-07-01", "2001-07-01", "2002-07-01")),
fyear_end = as.Date(c("2001-12-31", "2002-12-31", "2003-12-31",
"2001-06-30", "2002-06-30", "2003-06-30")), begin_prc = c(NA,
31.45, 15.57, NA, 15.86, 6.46))
id fyear fyear_start fyear_end begin_prc
1: 59328 2001 2001-01-01 2001-12-31 NA
2: 59328 2002 2002-01-01 2002-12-31 31.45
3: 59328 2003 2003-01-01 2003-12-31 15.57
4: 61241 2001 2000-07-01 2001-06-30 NA
5: 61241 2002 2001-07-01 2002-06-30 15.86
6: 61241 2003 2002-07-01 2003-06-30 6.46
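Thank you very much for your help. Thanks in advance.
This works for your example, but you will want to double-check the logic; it feels a bit hacky to me. I'll revisit it later and think it through more carefully. Hope this gets you started!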
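# Flag the last price row of each fiscal year: the next row's fiscal year
# starts after this row's price date (the final row per id yields NA)
dt[, test := (shift(fyear_start, -1) - prc_month_end) > 0, by = id]
# Keep the flagged rows, then lag the year-end price by one fiscal year
out <- dt[test == TRUE | is.na(test)][, prc := shift(prc, 1), by = id]
out[, c("test", "prc_month_end") := NULL]
out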
id fyear fyear_start fyear_end prc
1: 59328 2001 2001-01-01 2001-12-31 NA
2: 59328 2002 2002-01-01 2002-12-31 31.45
3: 59328 2003 2003-01-01 2003-12-31 15.57
4: 61241 2001 2000-07-01 2001-06-30 NA
5: 61241 2002 2001-07-01 2002-06-30 15.86
6: 61241 2003 2002-07-01 2003-06-30 6.46
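One way to double-check the logic above is to make the two steps explicit: keep the last price row of each fiscal year, then lag it within each company. This is only a sketch on a small made-up table (the `toy` data and the `end_prc` and `begin_prc` column names are assumptions for illustration), and it assumes that every fiscal year of a company is present in the data with no gaps.

```r
library(data.table)

# Made-up data: one company, two fiscal years, three month-end prices each
toy <- data.table(
  id = 1L,
  fyear = rep(c(2001L, 2002L), each = 3),
  prc_month_end = as.Date(c("2001-10-31", "2001-11-30", "2001-12-31",
                            "2002-10-31", "2002-11-29", "2002-12-31")),
  prc = c(10, 11, 12, 20, 21, 22)
)

# Sort so that the last row of each (id, fyear) group is the latest month
setorder(toy, id, prc_month_end)

# Step 1: closing price of each fiscal year (last row per group)
year_end <- toy[, .(end_prc = prc[.N]), by = .(id, fyear)]

# Step 2: the opening price of a year is the previous year's closing price
year_end[, begin_prc := shift(end_prc, 1L), by = id]
year_end
```

If a fiscal year were missing for a company, the lag would silently pick up the closing price from an earlier year, so that is the case worth checking on real data.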
Is there a way to return for a fiscal year the last monthly stock price at previous fiscal year?
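# One row per company and fiscal year
out = unique(dt[, .(id, fyear, fyear_start, fyear_end)])
# Look up the price on the day before fyear_start, rolling to the last earlier date
out[, prc_end := {
  dt[.(id = .SD$id, prc_month_end = .SD$fyear_start - 1L), on=.(id, prc_month_end), roll=TRUE, x.prc]
}]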
id fyear fyear_start fyear_end prc_end
1: 59328 2001 2001-01-01 2001-12-31 NA
2: 59328 2002 2002-01-01 2002-12-31 31.45
3: 59328 2003 2003-01-01 2003-12-31 15.57
4: 61241 2001 2000-07-01 2001-06-30 NA
5: 61241 2002 2001-07-01 2002-06-30 15.86
6: 61241 2003 2002-07-01 2003-06-30 6.46
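This is a rolling update join: for the rows of table out,
- use .SD = out, the Subset of Data, to construct the lookup vectors .(id, fyear_start - 1)
- look up rows of dt, "rolling" the last vector, fyear_start - 1, to the nearest earlier date
- take the matching values of x.prc, the prc column of dt
The x.* notation comes from the x[i] join/lookup syntax. See ?data.table for details.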
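To see the rolling behaviour in isolation, here is a minimal sketch on made-up data (the `prices` and `lookup` tables and the 7-day limit are assumptions for illustration, not part of the question's data). It compares `roll = TRUE` with a numeric `roll`, which limits how many days back a price may be taken; that can help when a company has missing months.

```r
library(data.table)

# Made-up month-end prices with a gap: no observations for July and August
prices <- data.table(
  id = c(1L, 1L, 1L),
  prc_month_end = as.Date(c("2001-05-31", "2001-06-29", "2001-09-28")),
  prc = c(13.56, 15.86, 14.71)
)

# Dates to look up, each one day before a hypothetical fiscal year start
lookup <- data.table(
  id = c(1L, 1L, 1L),
  prc_month_end = as.Date(c("2001-07-01", "2000-07-01", "2001-09-01")) - 1L
)

# roll = TRUE carries the last earlier observation forward to the lookup date;
# a lookup date before the first observation gets NA
rolled <- prices[lookup, on = .(id, prc_month_end), roll = TRUE, x.prc]

# A numeric roll only accepts observations at most that many days earlier
bounded <- prices[lookup, on = .(id, prc_month_end), roll = 7, x.prc]

rolled
bounded
```

With `roll = TRUE`, the lookup for 2001-08-31 falls back to the price from 2001-06-29, about two months earlier; with `roll = 7` the same lookup returns NA instead. Which behaviour is preferable depends on how stale a price is acceptable.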