Plotly: How to display regression errors with lines between the observations and the regression line?

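I generated the following Plotly chart in Python. I fitted a regression to a limited set of points and got the following chart:

I would like to draw a vertical line between each point and the fitted curve, as in the example below:

I am using Plotly (plotly.graph_objects) and pandas to generate these charts, but I can't figure out how to draw those lines. Here is the code I am using: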

import pandas as pd
import plotly.graph_objects as go

for point, curve in zip(points, curves):

    point_plot = go.Scatter(x=df['Duration'],
                            y=df[point],
                            name=point,
                            # text=df['Nome'],
                            mode='markers+text',
                            line_color=COLOR_CODE[point],
                            textposition='top center')

    line_plot = go.Scatter(x=df['Duration'],
                            y=df[curve],
                            name='', 
                            line_color=COLOR_CODE[point],
                            mode='lines')
    

    # XXX: this doesn't solve the problem, but it's what I could think of for now
    to_bar = df[points].diff(axis=1).copy()
    to_bar['Nome'] = df['Nome']
    bar_plot = go.Bar(x=to_bar['Nome'], y=to_bar[point], name='', marker_color=COLOR_CODE[point])

                            
    fig.add_trace(line_plot, row=1, col=1)
    fig.add_trace(point_plot, row=1, col=1)
    fig.add_trace(bar_plot, row=2, col=1)

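You haven't provided a working code snippet with a data sample, so I'll base my suggestion on one of my previous answers. If your figure consists of two series, as in your example, you can: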

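1. Retrieve the x-values from one of the series using:

xVals = fig.data[0]['x']

2. Organize all points for both the regression line and the observation markers in a dictionary:

errors = {}

3. Populate that dictionary, keyed by each trace's mode, using:

for d in fig.data:
    errors[d['mode']]=d['y']

4. Then add a line shape for the distance between each marker and the line (your errors) using: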

for i, x in enumerate(xVals):
    shapes.append(go.layout.Shape(type="line", [...])
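Note that the dictionary above is keyed by trace mode, so it relies on the figure holding exactly one 'markers' trace and one 'lines' trace. In the question's code the marker traces use mode='markers+text' and there are several series, so the keys would differ or overwrite each other. As a minimal sketch that sidesteps this, the shapes can be built directly from the data arrays. The helper name `residual_shapes` and the toy numbers are made up for illustration; the shape properties mirror the ones used in this answer.

```python
def residual_shapes(x, observed, fitted, color="black", width=1, opacity=0.5):
    """Build one vertical line-shape dict per observation.

    Each shape runs from the observed value to the fitted value at the same x.
    """
    if not (len(x) == len(observed) == len(fitted)):
        raise ValueError("x, observed and fitted must have the same length")
    return [
        dict(
            type="line",
            x0=xi, y0=obs,   # start at the observation
            x1=xi, y1=fit,   # end on the fitted line, same x
            line=dict(color=color, width=width),
            opacity=opacity,
            layer="above",
        )
        for xi, obs, fit in zip(x, observed, fitted)
    ]


# Toy numbers, made up for illustration only
x = [1, 2, 3]
observed = [2.0, 4.5, 5.0]
fitted = [2.5, 4.0, 5.5]

shapes = residual_shapes(x, observed, fitted)
print(len(shapes))  # one shape per observation
```

Plotly accepts plain dicts for layout shapes, so with an existing figure this should plug in as fig.update_layout(shapes=residual_shapes(df['X'], df['Y'], df['bestfit'])), assuming the column names from the complete code.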

Result:

Complete code:

import plotly.graph_objects as go
import statsmodels.api as sm
import pandas as pd
import numpy as np

# data
np.random.seed(123)
numdays=20

X = (np.random.randint(low=-20, high=20, size=numdays).cumsum()+100).tolist()
Y = (np.random.randint(low=-20, high=20, size=numdays).cumsum()+100).tolist()

df = pd.DataFrame({'X': X, 'Y':Y})

# regression
df['bestfit'] = sm.OLS(df['Y'],sm.add_constant(df['X'])).fit().fittedvalues

# plotly figure setup
fig=go.Figure()
fig.add_trace(go.Scatter(name='X vs Y', x=df['X'], y=df['Y'].values, mode='markers'))
fig.add_trace(go.Scatter(name='line of best fit', x=X, y=df['bestfit'], mode='lines'))


# plotly figure layout
fig.update_layout(xaxis_title = 'X', yaxis_title = 'Y')

# retrieve x-values from one of the series
xVals = fig.data[0]['x']

errors = {} # container for prediction errors

# organize data for errors in a dict
for d in fig.data:
    errors[d['mode']]=d['y']

shapes = [] # container for shapes

# make a line shape for each error == distance between each marker and line points
for i, x in enumerate(xVals):
    shapes.append(go.layout.Shape(type="line",
                                    x0=x,
                                    y0=errors['markers'][i],
                                    x1=x,
                                    y1=errors['lines'][i],
                                    line=dict(
                                        #color=np.random.choice(colors,1)[0],
                                        color = 'black',
                                        width=1),
                                    opacity=0.5,
                                    layer="above")
                 )

# include shapes in layout
fig.update_layout(shapes=shapes)
fig.show()
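
As a possible alternative to layout shapes, all error lines can be drawn as a single scatter trace whose segments are separated by None values, which Plotly renders as gaps in the line. This is a sketch rather than part of the approach above, and the helper name `residual_segments` and the toy numbers are hypothetical. It may suit the subplot layout in the question, since a trace can be placed with row and col like any other trace.

```python
def residual_segments(x, observed, fitted):
    """Return (xs, ys) describing one vertical segment per observation.

    A None entry after each segment leaves a gap, so a single 'lines'
    trace renders as many disconnected vertical lines.
    """
    xs, ys = [], []
    for xi, obs, fit in zip(x, observed, fitted):
        xs += [xi, xi, None]    # same x for both ends, then a gap
        ys += [obs, fit, None]  # from the observation to the fitted value
    return xs, ys


# Toy numbers, made up for illustration only
x = [1, 2]
observed = [2.0, 4.5]
fitted = [2.5, 4.0]

xs, ys = residual_segments(x, observed, fitted)

# With an existing Plotly figure `fig` (not created here), the segments
# could then be added as one trace, for example:
#   fig.add_scatter(x=xs, y=ys, mode="lines",
#                   line=dict(color="black", width=1), opacity=0.5,
#                   showlegend=False, hoverinfo="skip")
# On a make_subplots figure, row=1, col=1 can be passed as well.
```

One trace instead of one shape per point keeps the layout small for large datasets, at the cost of the error lines becoming part of the figure's data rather than its annotations.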