Plotly: How to plot a regression line using plotly and plotly express?
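I have a dataframe, df, with the columns pm1 and pm25. I want to show a graph (with Plotly) of how correlated these two signals are. So far I have managed to show the scatter plot, but I haven't managed to draw the line of best fit for the correlation between the signals. So far, I have tried this: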
denominator=df.pm1**2-df.pm1.mean()*df.pm1.sum()
print('denominator',denominator)
m=(df.pm1.dot(df.pm25)-df.pm25.mean()*df.pm1.sum())/denominator
b=(df.pm25.mean()*df.pm1.dot(df.pm1)-df.pm1.mean()*df.pm1.dot(df.pm25))/denominator
y_pred=m*df.pm1+b
lineOfBestFit = go.Scattergl(
    x=df.pm1,
    y=y_pred,
    name='Line of best fit',
    line=dict(
        color='red',
    )
)
data = [dataPoints, lineOfBestFit]
figure = go.Figure(data=data)
figure.show()
How do I plot lineOfBestFit correctly?
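Update 1:
Now that plotly express handles data of both long and wide format (the latter in your case) with ease, the only thing you need to plot a regression line is the following (trendline requires statsmodels to be installed):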
fig = px.scatter(df, x='X', y='Y', trendline="ols")
A complete code snippet for wide data follows further down in this answer.
If you'd like the regression line to stand out, you can specify trendline_color_override in px.scatter():
fig = px.scatter([...], trendline_color_override = 'red')
Or edit the line color after building the figure:
fig.data[1].line.color = 'red'
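With trendline="ols", you can access regression parameters such as alpha (the intercept, params[0]) and beta (the slope, params[1]) through: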
model = px.get_trendline_results(fig)
alpha = model.iloc[0]["px_fit_results"].params[0]
beta = model.iloc[0]["px_fit_results"].params[1]
You can even request a non-linear fit:
fig = px.scatter(df, x='X', y='Y', trendline="lowess")
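What about long format? That's where plotly express reveals its real power. Taking the built-in dataset px.data.gapminder as an example, you can trigger individual lines for a group of countries by specifying color="continent":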
Complete snippet for long format
import plotly.express as px
df = px.data.gapminder().query("year == 2007")
fig = px.scatter(df, x="gdpPercap", y="lifeExp", color="continent", trendline="lowess")
fig.show()
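If you'd like more flexibility with regard to model choice and output, you can always resort to my original answer below. But first, here is a complete snippet for the examples at the start of my answer:
Complete snippet for wide data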
import plotly.express as px
import statsmodels.api as sm  # not called directly, but px needs statsmodels for trendline="ols"/"lowess"
import pandas as pd
import numpy as np
# data
np.random.seed(123)
numdays=20
X = (np.random.randint(low=-20, high=20, size=numdays).cumsum()+100).tolist()
Y = (np.random.randint(low=-20, high=20, size=numdays).cumsum()+100).tolist()
df = pd.DataFrame({'X': X, 'Y':Y})
# figure with regression
# fig = px.scatter(df, x='X', y='Y', trendline="ols")
fig = px.scatter(df, x='X', y='Y', trendline="lowess")
# make the regression line stand out
fig.data[1].line.color = 'red'
# plotly figure layout
fig.update_layout(xaxis_title = 'X', yaxis_title = 'Y')
fig.show()
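Original answer:
For regression analysis I like to use statsmodels.api or sklearn.linear_model. I also like to organize both the data and the regression results in a pandas dataframe. Here's one way to do what you're looking for in a clean and organized way: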
[Figure: plot produced with sklearn or statsmodels]
Code using sklearn:
from sklearn.linear_model import LinearRegression
import plotly.graph_objects as go
import pandas as pd
import numpy as np
# data
np.random.seed(123)
numdays=20
X = (np.random.randint(low=-20, high=20, size=numdays).cumsum()+100).tolist()
Y = (np.random.randint(low=-20, high=20, size=numdays).cumsum()+100).tolist()
df = pd.DataFrame({'X': X, 'Y':Y})
# regression (sklearn expects a 2D feature matrix, hence df[['X']])
reg = LinearRegression().fit(df[['X']], df['Y'])
df['bestfit'] = reg.predict(df[['X']])
# plotly figure setup
fig=go.Figure()
fig.add_trace(go.Scatter(name='X vs Y', x=df['X'], y=df['Y'], mode='markers'))
fig.add_trace(go.Scatter(name='line of best fit', x=df['X'], y=df['bestfit'], mode='lines'))
# plotly figure layout
fig.update_layout(xaxis_title = 'X', yaxis_title = 'Y')
fig.show()
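If you also need the regression parameters from the sklearn route, the fitted LinearRegression exposes them as coef_ and intercept_, and score() returns R². A sketch on the same synthetic data; the small summary dataframe is just one way of keeping the results next to the data:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Same synthetic data as the snippet above
np.random.seed(123)
numdays = 20
X = (np.random.randint(low=-20, high=20, size=numdays).cumsum() + 100).tolist()
Y = (np.random.randint(low=-20, high=20, size=numdays).cumsum() + 100).tolist()
df = pd.DataFrame({"X": X, "Y": Y})

reg = LinearRegression().fit(df[["X"]], df["Y"])
beta = reg.coef_[0]      # slope
alpha = reg.intercept_   # intercept
r_squared = reg.score(df[["X"]], df["Y"])  # coefficient of determination R^2

# Keep the fit parameters in a dataframe as well
summary = pd.DataFrame({"alpha": [alpha], "beta": [beta], "r_squared": [r_squared]})
print(summary)
```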
Code using statsmodels:
import plotly.graph_objects as go
import statsmodels.api as sm
import pandas as pd
import numpy as np
# data
np.random.seed(123)
numdays=20
X = (np.random.randint(low=-20, high=20, size=numdays).cumsum()+100).tolist()
Y = (np.random.randint(low=-20, high=20, size=numdays).cumsum()+100).tolist()
df = pd.DataFrame({'X': X, 'Y':Y})
# regression (add_constant adds the intercept column)
df['bestfit'] = sm.OLS(df['Y'],sm.add_constant(df['X'])).fit().fittedvalues
# plotly figure setup
fig=go.Figure()
fig.add_trace(go.Scatter(name='X vs Y', x=df['X'], y=df['Y'], mode='markers'))
fig.add_trace(go.Scatter(name='line of best fit', x=df['X'], y=df['bestfit'], mode='lines'))
# plotly figure layout
fig.update_layout(xaxis_title = 'X', yaxis_title = 'Y')
fig.show()
Plotly also comes with a native statsmodels wrapper for plotting (non-)linear fits.
Quoting their documentation: https://plotly.com/python/linear-fits/
import plotly.express as px
df = px.data.tips()
fig = px.scatter(df, x="total_bill", y="tip", trendline="ols")
fig.show()