Finding the features used in a lasso model
I am using the diabetes dataset from sklearn.
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
diabetes = load_diabetes()
X_train, X_test, y_train, y_test = train_test_split(diabetes['data'], diabetes['target'], random_state=263)
from sklearn.linear_model import Lasso
lasso = Lasso().fit(X_train, y_train)
import numpy as np
np.sum(lasso.coef_ != 0)
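I split the dataset and then trained my Lasso model on the training data. The last line returns how many features the model used. How can I determine the names of these features in sklearn/Python?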
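You can use diabetes['feature_names']
to get the feature names of the diabetes dataset. You can then extract the names of the selected features (i.e. the features whose estimated coefficient is nonzero) as follows: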
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split
diabetes = load_diabetes()
X_train, X_test, y_train, y_test = train_test_split(diabetes['data'], diabetes['target'], random_state=263)
lasso = Lasso().fit(X_train, y_train)
names = diabetes['feature_names']
print(names)
# ['age', 'sex', 'bmi', 'bp', 's1', 's2', 's3', 's4', 's5', 's6']
print(np.sum(lasso.coef_ != 0))
# 2
print([names[i] for i in range(len(names)) if lasso.coef_[i] != 0])
# ['bmi', 's5']
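If the coefficient values are also of interest, the same nonzero test can be written as a boolean mask and applied to both the names and the coefficients. This is a minimal sketch rather than part of the original answer; the variable names mask and selected are introduced here for illustration only.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split

diabetes = load_diabetes()
X_train, X_test, y_train, y_test = train_test_split(
    diabetes["data"], diabetes["target"], random_state=263
)
lasso = Lasso().fit(X_train, y_train)

# Convert the name list to an array so it can be indexed with a boolean mask.
names = np.array(diabetes["feature_names"])

# True for every feature whose estimated coefficient is nonzero.
mask = lasso.coef_ != 0

# Map each selected feature name to its coefficient.
selected = dict(zip(names[mask].tolist(), lasso.coef_[mask].tolist()))
print(selected)
```

The keys of selected are the features the model kept, and the values show the sign and size of each effect.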
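You can use:
lasso.feature_names_in_
This is a fairly new attribute (added in scikit-learn 1.0), so check that your sklearn library is up to date. Note that it is only defined when the model was fitted on data with string feature names, such as a pandas DataFrame, and it lists every input feature, not only those with a nonzero coefficient.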
You can check the installed version like this:
import sklearn
sklearn.__version__