How to compute lagged investment in R
I have created df1:
year gvkey capex ppent
2004 1004 13.033 139.137
2005 1004 16.296 213.380
2006 1004 29.891 260.167
2007 1004 30.334 310.393
2008 1004 27.535 245.586
2009 1004 28.855 334.430
...
I have created df2:
year gvkey ROA
2005 1004 0.02796478
2006 1004 0.04665171
2007 1004 0.05976127
2008 1004 0.06255035
2009 1004 0.03549220
2005 1013 0.06882688
...
I want to create df3:
year gvkey ROA lag_investment
2005 1004 0.02796478 capex from 2004 / ppent from 2004
2006 1004 0.04665171 capex from 2005 / ppent from 2005
2007 1004 0.05976127 capex from 2006 / ppent from 2006
2008 1004 0.06255035 capex from 2007 / ppent from 2007
2009 1004 0.03549220 capex from 2008 / ppent from 2008
2005 1013 0.06882688 capex from 2004 / ppent from 2004
...
I have more than 2,000 firm-years. gvkey = firm id.
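What I basically want to do is:
1) Using df1, compute the previous year's investment (capex / ppent).
2) Create a column named "lag_investment" in df2.
3) Insert the value from step 1) into the current-year row of df2.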
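A minimal base R sketch of these three steps, assuming df1 and df2 look like the excerpts above (only the rows shown are typed in). Instead of shifting rows, it computes capex / ppent for every firm-year in df1, relabels that value with the following year (year + 1), and merges it into df2 on gvkey and year, so each df2 row picks up the prior-year ratio of the same firm:

```r
# Rows typed in from the df1 and df2 excerpts in the question
df1 <- data.frame(
  year  = 2004:2009,
  gvkey = 1004L,
  capex = c(13.033, 16.296, 29.891, 30.334, 27.535, 28.855),
  ppent = c(139.137, 213.380, 260.167, 310.393, 245.586, 334.430)
)
df2 <- data.frame(
  year  = c(2005:2009, 2005L),
  gvkey = c(rep(1004L, 5), 1013L),
  ROA   = c(0.02796478, 0.04665171, 0.05976127, 0.06255035, 0.03549220, 0.06882688)
)

# Step 1: investment ratio per firm-year, relabelled with the following year
inv <- data.frame(
  gvkey          = df1$gvkey,
  year           = df1$year + 1L,          # the 2004 ratio becomes the 2005 lag
  lag_investment = df1$capex / df1$ppent
)

# Steps 2 and 3: left join onto df2 by firm and year
df3 <- merge(df2, inv, by = c("gvkey", "year"), all.x = TRUE)
df3 <- df3[order(df3$gvkey, df3$year), c("year", "gvkey", "ROA", "lag_investment")]
df3
```

Firm 1013 gets NA here only because its df1 rows are not part of the excerpt. Because the join is on the year value rather than on row position, a firm with a missing year gets NA instead of a value from two years earlier, and the row order of df1 does not matter.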
Follow-up question:
What would the code look like if I wanted to do the following?
I have created df1:
year gvkey ROA ppent capex
1 2004 1004 0.01320911 139.137 13.033
2 2005 1004 0.03005708 213.380 16.296
3 2006 1004 0.05014214 260.167 29.891
4 2007 1004 0.06423255 310.393 30.334
5 2008 1004 0.06723031 245.586 27.535
6 2009 1004 0.03814769 334.430 28.855
...
I want to add a variable to df1:
year gvkey ROA ppent capex lag_investment
1 2004 1004 0.01320911 139.137 13.033
2 2005 1004 0.03005708 213.380 16.296 capex from 2004 / ppent from 2004
3 2006 1004 0.05014214 260.167 29.891 capex from 2005 / ppent from 2005
4 2007 1004 0.06423255 310.393 30.334 capex from 2006 / ppent from 2006
5 2008 1004 0.06723031 245.586 27.535 capex from 2007 / ppent from 2007
6 2009 1004 0.03814769 334.430 28.855 capex from 2008 / ppent from 2008
...
I want to compute lag_investment for every year except 2004.
Thank you very much!!!
I guess you can use lag in dplyr:
library(dplyr)
df1 %>% mutate(lag_investment = lag(capex)/lag(ppent))
# year gvkey ROA ppent capex lag_investment
#1 2004 1004 0.0132 139 13.0 NA
#2 2005 1004 0.0301 213 16.3 0.0937
#3 2006 1004 0.0501 260 29.9 0.0764
#4 2007 1004 0.0642 310 30.3 0.1149
#5 2008 1004 0.0672 246 27.5 0.0977
#6 2009 1004 0.0381 334 28.9 0.1121
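If the data frame is not sorted, sort it first with arrange. With several firms, sort by gvkey and year and group by gvkey, so that each firm's first year is NA instead of picking up the previous firm's last values:
df1 %>%
  arrange(gvkey, year) %>%
  group_by(gvkey) %>%
  mutate(lag_investment = lag(capex)/lag(ppent)) %>%
  ungroup()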
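With 2,000+ firm-years, the lag has to be taken within each gvkey after sorting by firm and year; otherwise the first year of one firm inherits the last year of the firm before it. A base R sketch of that using ave(), which applies a function within groups and returns a vector aligned with the input. Firm 9999 below is hypothetical (a copy of firm 1004's numbers) and exists only to show the firm boundary:

```r
# Shift a vector down by one position, padding the first element with NA
lag1 <- function(x) c(NA, head(x, -1))

panel <- data.frame(
  year  = rep(2004:2009, 2),
  gvkey = rep(c(1004L, 9999L), each = 6),  # 9999 is a hypothetical second firm
  capex = rep(c(13.033, 16.296, 29.891, 30.334, 27.535, 28.855), 2),
  ppent = rep(c(139.137, 213.380, 260.167, 310.393, 245.586, 334.430), 2)
)
set.seed(1)
panel <- panel[sample(nrow(panel)), ]      # simulate unsorted input

# Sort by firm, then year, so "previous row" means "previous year of the same firm"
panel <- panel[order(panel$gvkey, panel$year), ]

# Lag capex and ppent within each gvkey, then take the ratio
panel$lag_investment <- ave(panel$capex, panel$gvkey, FUN = lag1) /
  ave(panel$ppent, panel$gvkey, FUN = lag1)
panel
```

Both firms get NA in 2004; without the grouping, firm 9999's 2004 row would have received firm 1004's 2009 ratio.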
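Or use shift in data.table; sort by gvkey and year first, and by = gvkey keeps the lag within each firm:
library(data.table)
setDT(df1)
setorder(df1, gvkey, year)
df1[, lag_investment := shift(capex)/shift(ppent), by = gvkey]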
With data.table, we can also do:
library(data.table)
setDT(df1)[, lag_investment := Reduce(`/`, shift(.SD)), by = gvkey, .SDcols = c("capex", "ppent")]
df1
# year gvkey ROA ppent capex lag_investment
#1: 2004 1004 0.01320911 139.137 13.033 NA
#2: 2005 1004 0.03005708 213.380 16.296 0.09367027
#3: 2006 1004 0.05014214 260.167 29.891 0.07637079
#4: 2007 1004 0.06423255 310.393 30.334 0.11489159
#5: 2008 1004 0.06723031 245.586 27.535 0.09772772
#6: 2009 1004 0.03814769 334.430 28.855 0.11211958
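In the call above, shift(.SD) returns a list holding the lagged capex and the lagged ppent columns (in .SDcols order), and Reduce(`/`, ...) folds that list with division, which for two elements is simply lagged capex / lagged ppent. The same idea in base R, with a hand-written lag1() standing in for shift() (an illustrative assumption, not data.table's implementation):

```r
# Stand-in for shift(x): previous value, NA for the first row
lag1 <- function(x) c(NA, head(x, -1))

df1 <- data.frame(
  year  = 2004:2009,
  capex = c(13.033, 16.296, 29.891, 30.334, 27.535, 28.855),
  ppent = c(139.137, 213.38, 260.167, 310.393, 245.586, 334.43)
)

lagged <- lapply(df1[c("capex", "ppent")], lag1)  # like shift(.SD): list of lagged columns
df1$lag_investment <- Reduce(`/`, lagged)         # lagged capex / lagged ppent
df1
```

The column order in .SDcols matters: listing ppent first would give the reciprocal.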
Or in base R:
df1$lag_investment <- with(df1, c(NA, head(capex, -1)/head(ppent, -1)))
or it can be written as:
df1$lag_investment <- with(df1, c(NA, capex[-nrow(df1)]/ppent[-nrow(df1)]))
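Like lag() and shift(), these base R lines take the previous row, which equals the previous year only when each firm has every year. A sketch that instead looks up the row for the same gvkey and year - 1 with match(), so a gap yields NA and row order does not matter. Dropping 2006 from the data is purely to illustrate a gap:

```r
df1 <- data.frame(
  year  = 2004:2009,
  gvkey = 1004L,
  capex = c(13.033, 16.296, 29.891, 30.334, 27.535, 28.855),
  ppent = c(139.137, 213.38, 260.167, 310.393, 245.586, 334.43)
)
df_gap <- df1[df1$year != 2006, ]  # remove 2006 to create a gap

# Row index of the same firm's previous year (NA if that year is absent)
prev <- match(paste(df_gap$gvkey, df_gap$year - 1L),
              paste(df_gap$gvkey, df_gap$year))
df_gap$lag_investment <- df_gap$capex[prev] / df_gap$ppent[prev]
df_gap
```

Here 2007 gets NA because 2006 is missing, whereas the positional versions would have used the 2005 ratio.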
Data:
df1 <- structure(list(year = 2004:2009, gvkey = c(1004L, 1004L, 1004L,
1004L, 1004L, 1004L), ROA = c(0.01320911, 0.03005708, 0.05014214,
0.06423255, 0.06723031, 0.03814769), ppent = c(139.137, 213.38,
260.167, 310.393, 245.586, 334.43), capex = c(13.033, 16.296,
29.891, 30.334, 27.535, 28.855)), class = "data.frame",
row.names = c("1",
"2", "3", "4", "5", "6"))