How to add a line defined by an arbitrary function to a seaborn scatterplot
I would like to create the following plot in Python (taken from https://en.wikipedia.org/wiki/Logistic_regression#Logistic_model).
The data is:
hours = [
0.50,
0.75,
1.00,
1.25,
1.50,
1.75,
1.75,
2.00,
2.25,
2.50,
2.75,
3.00,
3.25,
3.50,
4.00,
4.25,
4.50,
4.75,
5.00,
5.50,
]
passed = [0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1]
df = pd.DataFrame({"hours_study": hours, "passed": passed})
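The scatterplot is easy to create with the following (x and y are passed as keywords, since recent seaborn releases no longer accept them positionally):
sns.scatterplot(x="hours_study", y="passed", data=df)
This gives the scatterplot, but I am not sure how to add a line (in this case the logistic curve) to the figure.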
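Seaborn draws onto ordinary matplotlib axes, so matplotlib's plot can draw a curve on top of any existing plot. To draw the logistic function, just plot 1 / (1 + exp(-beta0 - beta1 * x)), where beta0 and beta1 are the result of fitting the logistic function to the given data. Scikit-learn's LogisticRegression is an estimator class that can fit such a function and expose the fitted parameters: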
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
import seaborn as sns
import pandas as pd
import numpy as np

def draw_logistic_regression_curve(beta0, beta1, x, **kwargs):
    # Evaluate the logistic function at every x and draw it on the current axes.
    y = 1 / (1 + np.exp(-beta0 - beta1 * x))
    plt.plot(x, y, '-', **kwargs)

hours = np.array([0.50, 0.75, 1.00, 1.25, 1.50, 1.75, 1.75, 2.00, 2.25, 2.50, 2.75,
                  3.00, 3.25, 3.50, 4.00, 4.25, 4.50, 4.75, 5.00, 5.50])
passed = np.array([0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1])
df = pd.DataFrame({"hours_study": hours, "passed": passed})
# Pass x and y as keywords; recent seaborn releases reject positional x and y.
sns.scatterplot(x="hours_study", y="passed", data=df)
# scikit-learn expects a 2-D feature array, hence the reshape.
clf = LogisticRegression().fit(hours.reshape(-1, 1), passed)
beta0 = clf.intercept_[0]  # -3.13952411
beta1 = clf.coef_[0, 0]  # 1.14860386
# Evaluate the curve on a dense grid slightly wider than the data range.
x = np.linspace(min(hours) - 0.5, max(hours) + 0.5, 500)
draw_logistic_regression_curve(beta0, beta1, x, color='crimson', label="Sklearn's default estimate")
draw_logistic_regression_curve(-4.0777, 1.5046, x, color='limegreen', label="Wikipedia's estimate")
plt.legend(loc='center right')
plt.show()
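The two curves above differ because LogisticRegression applies an L2 penalty by default (C=1.0), which shrinks the coefficients. The Wikipedia values look like a plain maximum-likelihood fit, although that is an assumption about how they were obtained. The following is a minimal sketch under that assumption: it sets C to a very large value (1e9, chosen arbitrarily) so the penalty becomes negligible, and it draws the curve with predict_proba on the Axes object that sns.scatterplot returns instead of using the hand-written formula. The names weak_penalty_clf and curve are illustrative only.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from sklearn.linear_model import LogisticRegression

hours = np.array([0.50, 0.75, 1.00, 1.25, 1.50, 1.75, 1.75, 2.00, 2.25, 2.50, 2.75,
                  3.00, 3.25, 3.50, 4.00, 4.25, 4.50, 4.75, 5.00, 5.50])
passed = np.array([0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1])
df = pd.DataFrame({"hours_study": hours, "passed": passed})

# C is the inverse regularization strength, so a very large C makes the
# default L2 penalty negligible. 1e9 is an arbitrary illustrative value;
# the tight tol and high max_iter just let the solver converge fully.
weak_penalty_clf = LogisticRegression(C=1e9, tol=1e-8, max_iter=10_000)
weak_penalty_clf.fit(hours.reshape(-1, 1), passed)
beta0 = weak_penalty_clf.intercept_[0]
beta1 = weak_penalty_clf.coef_[0, 0]
print(f"beta0 = {beta0:.4f}, beta1 = {beta1:.4f}")

# scatterplot returns the matplotlib Axes, so any curve can be drawn on it.
ax = sns.scatterplot(data=df, x="hours_study", y="passed")
x = np.linspace(hours.min() - 0.5, hours.max() + 0.5, 500)
# Column 1 of predict_proba holds the estimated P(passed = 1).
curve = weak_penalty_clf.predict_proba(x.reshape(-1, 1))[:, 1]
ax.plot(x, curve, color="limegreen", label="Nearly unpenalized fit (C=1e9)")
ax.legend(loc="center right")
```

Call plt.show() afterwards to display the figure. With the penalty effectively switched off, the printed coefficients should land close to the Wikipedia values used above (-4.0777 and 1.5046). The same ax.plot call works for any function that can be evaluated on x, not just a logistic curve.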