Get the fit of linear models between consecutive points in R
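I have a task I can't solve: I have a set of values measured on different dates, and I want to get the values between those dates by fitting linear models between consecutive points and extracting the fit. This is useful because I have another dataset that needs this value assigned by date. Once fitted, the values will be assigned with a rolling join (that part already works).
Here is an example and what I have tried: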
library(ggplot2)

dt1 <- read.table(text = "Date,Measure
2019-02-13 11:11:00,728.2172
2019-07-09 11:11:00,738.4000
2019-08-06 11:11:00,743.8530
2019-02-13 11:11:00,728.2100
2019-07-09 11:11:00,738.4000
2019-08-06 11:11:00,743.8500
2019-12-11 11:11:00,696.4650
2020-03-02 11:11:00,715.5200
2020-04-30 11:11:00,721.1650
2020-08-25 11:11:00,740.0000", header = TRUE, sep = ",")
str(dt1)
# Convert the Date column to POSIXct date-times
dt1$Date <- as.POSIXct(dt1$Date, origin = "1970-01-01", tz = "GMT")
p0 <- ggplot(data = dt1, aes(x = Date, y = Measure)) + geom_point() + geom_line() +
  labs(x = "Date", y = "Values") +
  scale_x_datetime(date_breaks = "3 month", date_labels = "%b %y")
p0
[Figure: plot of the data sample]
The closest answer I found is: Method to extract stat_smooth line fit
Following that suggestion, the first approach (using ggplot_build(p1)):
p1<-ggplot(data=dt1, aes(x = Date, y = Measure))+ geom_point()+
geom_smooth(method = "loess", span=0.4)+
labs(x="Date",y="Values")+
scale_x_datetime(date_breaks = "3 month", date_labels = "%b %y")
p1
ggplot_build(p1)
fitdt1<- ggplot_build(p1)$data[[2]]
fitdt1$x<-as.POSIXct(fitdt1$x,origin = "1970-01-01", tz = "GMT")
p2<-ggplot(data=fitdt1, aes(x = x, y = y))+
geom_point()
p2
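...gives only 80 points, which is not precise enough:
[Figure: plot of the ggplot_build fit]
So I built the model manually, which lets me decide how many points to generate (although it raises warnings with some datasets):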
dt1$Date<-as.numeric(dt1$Date)
modelSlope <- loess(Measure~Date, data= dt1,span=0.4)
xrangeSlope <- range(dt1$Date)
xseqSlope <- seq(from=xrangeSlope[1], to=xrangeSlope[2], length=100000)
predSlope <- predict(modelSlope, newdata = data.frame(Date = xseqSlope), se=TRUE)
ySlope = predSlope$fit
gam.DFslope <- data.frame(x = xseqSlope, ySlope)
gam.DFslope$x<-as.POSIXct(gam.DFslope$x,origin = "1970-01-01", tz = "GMT")
dt1$Date<-as.POSIXct(dt1$Date,origin = "1970-01-01", tz = "GMT")
p3<-ggplot()+
geom_point(data=gam.DFslope, aes(x = x, y = ySlope),color="green")+
geom_point(data=dt1, aes(x = Date, y = Measure),color="black")
p3
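[Figure: plot of the manually created smooth model]
But I want the same thing with values from linear models between the points (as you can see, the loess model does not fit well). The loess model also produces errors and does not seem to work for some other datasets (sample too small?).
Any suggestions? Is there a way to use ggplot_build(p1) with geom_line? Thanks for any help!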
You are looking for linear interpolation between known points. R has a built-in function for this, approx().
p0<-ggplot(data=dt1, aes(x = Date, y = Measure))+ geom_point() +geom_line()+
labs(x="Date",y="Values")+
scale_x_datetime(date_breaks = "3 month", date_labels = "%b %y")
p0
# Linearly interpolate 100 evenly spaced points between min(x) and max(x).
# Use the 'xout' argument instead of 'n' to specify the exact locations to interpolate at.
linearinter <- as.data.frame(approx(dt1$Date, dt1$Measure, n = 100))
linearinter$x <- as.POSIXct(linearinter$x, origin = "1970-01-01", tz = "GMT")
head(linearinter)
#>                     x        y
#> 1 2019-02-13 11:11:00 728.2136
#> 2 2019-02-19 02:41:54 728.6076
#> 3 2019-02-24 18:12:49 729.0015
#> 4 2019-03-02 09:43:43 729.3955
#> 5 2019-03-08 01:14:38 729.7894
#> 6 2019-03-13 16:45:32 730.1834
p0 + geom_line(aes(x, y), data = linearinter, col = "red")
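Since the goal is to assign a value to the dates of another dataset, it may be simpler to interpolate directly at those dates with 'xout' rather than generating a dense grid. The following is only a sketch: the data frames `known` and `other` are made up for illustration (the first uses four of the question's measurements with the duplicated timestamps dropped), and `ties = mean` just spells out the default handling of duplicated x values, which is why the first interpolated value above is the average of the two readings taken at the same time.

```r
# Known measurements (a subset of the question's data, duplicates dropped)
known <- data.frame(
  Date = as.POSIXct(c("2019-02-13 11:11:00", "2019-07-09 11:11:00",
                      "2019-08-06 11:11:00", "2019-12-11 11:11:00"), tz = "GMT"),
  Measure = c(728.2172, 738.4000, 743.8530, 696.4650)
)

# Hypothetical second dataset whose rows need an interpolated value
other <- data.frame(
  Date = as.POSIXct(c("2019-02-13 11:11:00", "2019-04-27 11:11:00",
                      "2019-07-23 11:11:00"), tz = "GMT")
)

# approx() works on numbers, so pass the date-times as seconds since the epoch.
# With the default rule = 1, dates outside the known range come back as NA.
other$Measure <- approx(x = as.numeric(known$Date),
                        y = known$Measure,
                        xout = as.numeric(other$Date),
                        ties = mean)$y
other
```

The second and third dates in `other` fall exactly halfway between two known measurements, so their interpolated values are the midpoints of the neighbouring readings.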
There is also the spline() function, which can add some curvature to the interpolation.
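As a rough sketch of the spline() alternative, assuming the same illustrative subset of the data as a made-up data frame `known`: spline() returns a list with x and y just like approx(), and splinefun() returns a function that can be evaluated at arbitrary times. The choice of method = "natural" is an assumption made here, not something taken from the question. Note that a cubic spline passes through the known points but can overshoot between them, so it is worth plotting the result before using it.

```r
# Known measurements (a subset of the question's data, duplicates dropped)
known <- data.frame(
  Date = as.POSIXct(c("2019-02-13 11:11:00", "2019-07-09 11:11:00",
                      "2019-08-06 11:11:00", "2019-12-11 11:11:00"), tz = "GMT"),
  Measure = c(728.2172, 738.4000, 743.8530, 696.4650)
)
x <- as.numeric(known$Date)

# 200 evenly spaced points on a natural cubic spline through the known points
curve_pts <- as.data.frame(spline(x, known$Measure, n = 200, method = "natural"))
curve_pts$x <- as.POSIXct(curve_pts$x, origin = "1970-01-01", tz = "GMT")
head(curve_pts)

# Alternatively, build an interpolating function and evaluate it at chosen times
f <- splinefun(x, known$Measure, method = "natural")
f(as.numeric(as.POSIXct("2019-04-27 11:11:00", tz = "GMT")))
```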