Is there a way to fit a function over two domains?

Question

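I have power-usage data that shows (graphically and intuitively) that once the temperature rises above a certain value, power usage starts to increase almost linearly. This makes sense, because that is when the air-conditioning systems kick in: a higher outside temperature means the systems run longer and use more power.

Is there a way to fit the data set over two domains, e.g.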

temp < T0: poweruse = C
temp > T0: poweruse = A * temp + B

and fit all of the parameters A, B, C and T0?

A solution in R is preferred, but I would be happy to use Python or JMP.

Answer 1

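If you know T0, then a linear model will do:

lm(poweruse ~ 1 + I(temp > T0) + I((temp > T0) * temp), ...)

I(temp > T0) evaluates to 0 (FALSE) if temp < T0 and to 1 (TRUE) otherwise, so the fit is a constant below T0 and a straight line above it. The slope term is written as I((temp > T0) * temp) so that it is a single numeric column that is zero below T0; R treats a logical variable as a factor, so an interaction such as I(temp > T0):temp without a main effect for temp would also estimate a slope below T0.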
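A minimal sketch of the known-breakpoint fit on simulated data. The data, the value T0 = 24.5, and the helper column `above` are assumptions made for illustration; the indicator is stored as a 0/1 numeric column so that the slope term is zero below T0.

```r
# Simulated data (hypothetical): flat at 50 below the breakpoint, slope 2 above it
set.seed(123)
temp <- 1:100
poweruse <- pmax(50, 1 + 2 * temp) + rnorm(100)

T0 <- 24.5                      # breakpoint, assumed known here
above <- as.numeric(temp > T0)  # 0 below the breakpoint, 1 above it

# Constant below T0; separate intercept shift and slope above T0
fit <- lm(poweruse ~ 1 + above + above:temp)
b <- coef(fit)

# Map the coefficients back to the notation used in the question
C_hat <- b[["(Intercept)"]]                  # constant level below T0
A_hat <- b[["above:temp"]]                   # slope above T0
B_hat <- b[["(Intercept)"]] + b[["above"]]   # intercept of the line above T0
c(C = C_hat, A = A_hat, B = B_hat)
```

Note that this model does not force the two pieces to meet at T0; the fitted line above the breakpoint is free to jump relative to the constant below it.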

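The segmented package will fit separate linear models above and below a breakpoint, and will estimate the breakpoint for you, but I don't think there is a way to constrain the linear model to be constant (slope = 0) below the breakpoint and non-constant above it ... As a slightly hacky solution, you could use segmented to estimate the breakpoint and then refit with the lm() approach above. Your breakpoint may not be optimal, and you may slightly underestimate the uncertainty in your estimates, because the first stage of this two-stage approach does not impose the "initially flat" constraint and the second stage assumes the breakpoint is known exactly.

For what it's worth, adding the "I need to estimate the breakpoint" part makes things a bit tricky, because the goodness-of-fit surface becomes discontinuous at the observed temperature values ...

This answer could probably be simplified to do what you want ...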
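The discontinuity can be seen by profiling the residual sum of squares over a grid of candidate breakpoints. This is only a sketch on simulated data: the data, the helper `rss_at`, and the grid limits are assumptions made for illustration, not part of the answer.

```r
# Simulated data (hypothetical), same shape as in the question
set.seed(123)
temp <- 1:100
poweruse <- pmax(50, 1 + 2 * temp) + rnorm(100)

# Residual sum of squares of the known-breakpoint linear model for a given T0
rss_at <- function(T0) {
  above <- as.numeric(temp > T0)
  deviance(lm(poweruse ~ 1 + above + above:temp))
}

# Candidate breakpoints, keeping at least five observations on each side
grid <- seq(5.5, 95.5, by = 0.25)
rss <- sapply(grid, rss_at)

# The RSS only changes when T0 crosses an observed temperature,
# so any two candidates between the same pair of observations fit identically
all.equal(rss_at(30.1), rss_at(30.9))

# Grid value with the smallest RSS (any T0 in the same gap is equally good)
best_T0 <- grid[which.min(rss)]
best_T0
```

Because the profile is a step function, gradient-based optimizers cannot be applied to T0 directly in this formulation; a grid search like the one above is one simple workaround.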

Answer 2

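If poweruse is a continuous function of temp that starts out as a horizontal leg at lower temperatures and then transitions to an increasing linear leg at higher temperatures, then we can represent it as the maximum of the two legs at each temperature.

With the temperatures in ascending order, we can assume that at least the first k points are in the horizontal leg and at least the last k are in the linear leg. We fit these portions of the two legs separately, using mean and lm respectively, to get starting values, and then use those values to get the overall fit with nls.

Note that T0 lies at the intersection of the two legs, so T0 is the solution of the equation C = A + T0 * B, i.e. T0 = (C-A)/B. (In this answer A is the intercept and B is the slope of the linear leg, the reverse of the notation in the question.)

No packages are used.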

# test data
set.seed(123)
temp <- 1:100
poweruse <- pmax(50, 1 + 2 * temp) + rnorm(100)

# starting values
k <- 10
fm0 <- lm(poweruse ~ temp, subset = seq(to = length(temp), length = k))
st <- list(C = mean(head(poweruse, k)), A = coef(fm0)[[1]], B = coef(fm0)[[2]])

fm <- nls(poweruse ~ pmax(C, A + B * temp), start = st)
fm
## Nonlinear regression model
##   model: poweruse ~ pmax(C, A + B * temp)
##    data: parent.frame()
##       C       A       B 
## 49.9913  0.8876  2.0037 
##  residual sum-of-squares: 81.67
##
## Number of iterations to convergence: 2 
## Achieved convergence tolerance: 2.874e-07

# calculate T0
with(as.list(coef(fm)), (C - A)/B)
## [1] 24.50596

plot(poweruse ~ temp, type = "l")
lines(fitted(fm) ~ temp, col = "red")
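Since C = A + B * T0, the same continuous model can be rewritten as C + B * pmax(0, temp - T0) when B > 0, which makes T0 a parameter of the fit instead of a derived quantity. The following is a sketch of that reparameterization, not part of the original answer; the names `fm2`, `C0`, `A0` and `B0` are introduced here for illustration.

```r
# Same simulated data as above
set.seed(123)
temp <- 1:100
poweruse <- pmax(50, 1 + 2 * temp) + rnorm(100)

# Starting values: mean of the first k points, line through the last k points
k <- 10
fm0 <- lm(poweruse ~ temp, subset = seq(to = length(temp), length.out = k))
C0 <- mean(head(poweruse, k))
A0 <- coef(fm0)[[1]]   # intercept of the linear leg
B0 <- coef(fm0)[[2]]   # slope of the linear leg

# Continuous two-leg model with the breakpoint T0 as an explicit parameter
# (equivalent to pmax(C, A + B * temp) only when the slope B is positive)
fm2 <- nls(poweruse ~ C + B * pmax(0, temp - T0),
           start = list(C = C0, B = B0, T0 = (C0 - A0) / B0))

coef(fm2)                   # estimates of C, B and T0
summary(fm2)$coefficients   # includes an approximate standard error for T0
```

The standard error reported for T0 relies on the usual nls linear approximation, so it should be read as a rough guide, particularly because the model has a kink at T0.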

r

linear-regression