R: Find slope for each subset of a data frame
I have data like this:
dat <- data.frame(ID=sample(1:10, 100, replace=TRUE),
                  Date=seq(as.Date("1982/01/01"), by="16 days", length.out = 100),
                  Value1=runif(100))
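I need to subset the data by year and ID, fit one line to the January to June data and another to the July to December data, and write out the two slope coefficients. I need to do this for every combination of year and ID.
Is there a way to do this other than a loop? The real data has 21,788,928 rows, and looping takes too long.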
This should be faster, but I'm not sure whether it will be fast enough for your needs. Let me know:
library(dplyr)
library(lubridate)
# Function to return the coefficients of the regression as a one-row data frame.
# The column named Value1 holds the slope of Value1 on Date (change per day).
coef.fcn = function(df) {
  coeffs = coef(lm(Value1 ~ Date, data=df))
  return(data.frame(Intercept=coeffs[1], Value1=coeffs[2]))
}

# Label each row with its half-year, then fit one regression per ID and half-year
lm_coefs = dat %>%
  mutate(my.cat = ifelse(month(Date) %in% 1:6,
                         paste("Jan-Jun", year(Date)), paste("Jul-Dec", year(Date)))) %>%
  group_by(ID, my.cat) %>%
  do(coef.fcn(.))
Here is part of the result for your sample data:
lm_coefs
   ID       my.cat    Intercept        Value1
1   1 Jan-Jun 1983   0.62824396            NA
2   1 Jan-Jun 1985   0.71865235            NA
3   1 Jul-Dec 1985  20.20901291 -0.0033972977
4   2 Jan-Jun 1983 -37.54324401  0.0078885381
...
45  8 Jan-Jun 1982 -30.39203349  0.0068229828
46  8 Jan-Jun 1984 -27.62517465  0.0054096259
47  8 Jan-Jun 1985  27.70049296 -0.0048539844
48  8 Jul-Dec 1982  12.90814643 -0.0025997511
49  8 Jul-Dec 1984 -16.84585961  0.0032229997
...
57 10 Jan-Jun 1982   0.63533344            NA
58 10 Jan-Jun 1983   0.35107513            NA
59 10 Jan-Jun 1984   0.59588750            NA
60 10 Jul-Dec 1982   0.05156481            NA
61 10 Jul-Dec 1983   0.54658810            NA
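The NA slopes above come from groups that contain a single observation, where no line can be fitted. If the do() version is still too slow on the full data, one possible alternative (a sketch, not something benchmarked here) is to skip lm() entirely and compute the simple-regression slope in closed form as cov(x, y) / var(x) inside summarise(). The column names Year, Half, x, n_obs and Slope, and the set.seed() call, are assumptions introduced for illustration only.

```r
library(dplyr)
library(lubridate)

set.seed(1)  # assumption: fixed seed only so the sketch is reproducible
dat <- data.frame(ID = sample(1:10, 100, replace = TRUE),
                  Date = seq(as.Date("1982/01/01"), by = "16 days", length.out = 100),
                  Value1 = runif(100))

slopes <- dat %>%
  mutate(Year = year(Date),
         Half = ifelse(month(Date) <= 6, "Jan-Jun", "Jul-Dec"),
         x    = as.numeric(Date)) %>%       # days since 1970-01-01, as lm() sees Date
  group_by(ID, Year, Half) %>%
  summarise(n_obs = n(),
            # closed-form OLS slope; undefined when the group has one row
            Slope = if (n_obs > 1) cov(x, Value1) / var(x) else NA_real_,
            # OLS intercept; falls back to the single value when Slope is NA
            Intercept = mean(Value1) - ifelse(is.na(Slope), 0, Slope) * mean(x),
            .groups = "drop")

head(slopes)
```

For a single predictor this gives the same coefficients as lm(Value1 ~ Date), but it avoids building a model object for every group, which is usually where most of the time goes. Whether it is fast enough for about 21.8 million rows would need to be measured on the real data.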