Using scikit-learn to train an NLP log-linear model for NER
I would like to know how to use sklearn.linear_model.LogisticRegression to train an NLP log-linear model for named entity recognition (NER).

A typical log-linear model defines the conditional probability as follows:

p(y | x; v) = exp(v · f(x, y)) / Σ_{y'} exp(v · f(x, y'))

with:
- x: the current word
- y: the class being considered for that word
- f: a feature vector function, which maps a word x and a class y to a vector of scalars
- v: the feature weight vector

Can sklearn.linear_model.LogisticRegression train such a model? The problem is that the features depend on the class.
In scikit-learn 0.16 and later, you can use the multinomial option of sklearn.linear_model.LogisticRegression to train a log-linear model (a.k.a. MaxEnt classifier, multiclass logistic regression). Currently, the multinomial option is supported only by the 'lbfgs' and 'newton-cg' solvers.

Example with the iris dataset (4 features, 3 classes, 150 samples):
#!/usr/bin/python
# -*- coding: utf-8 -*-
from __future__ import print_function
from __future__ import division
import numpy as np
import matplotlib.pyplot as plt
from sklearn import linear_model, datasets
from sklearn.metrics import confusion_matrix
from sklearn.metrics import classification_report
# Import data
iris = datasets.load_iris()
X = iris.data # features
y_true = iris.target # labels
# Look at the size of the feature matrix and the label vector:
print('iris.data.shape: {0}'.format(iris.data.shape))
print('iris.target.shape: {0}\n'.format(iris.target.shape))
# Instantiate a MaxEnt model
logreg = linear_model.LogisticRegression(C=1e5, multi_class='multinomial', solver='lbfgs')
# Train the model
logreg.fit(X, y_true)
print('logreg.coef_: \n{0}\n'.format(logreg.coef_))
print('logreg.intercept_: \n{0}'.format(logreg.intercept_))
# Use the model to make predictions
y_pred = logreg.predict(X)
print('\ny_pred: \n{0}'.format(y_pred))
# Assess the quality of the predictions
print('\nconfusion_matrix(y_true, y_pred):\n{0}\n'.format(confusion_matrix(y_true, y_pred)))
print('classification_report(y_true, y_pred): \n{0}'.format(classification_report(y_true, y_pred)))
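To relate this to the log-linear definition in the question: with multi_class='multinomial', each class y gets its own row of logreg.coef_ plus an intercept, which is equivalent to a single weight vector v applied to class-dependent features f(x, y) = (x paired with a one-hot encoding of y). The conditional probability is the softmax of the per-class scores; a quick sanity check, appended to the script above (it assumes the script has already been run):

# Check that predict_proba is the softmax of the per-class linear scores,
# i.e. the log-linear conditional probability p(y | x; v)
scores = logreg.decision_function(X)  # shape (n_samples, n_classes): one score per class
probs = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
print(np.allclose(probs, logreg.predict_proba(X)))  # expected to print True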
The multinomial option of sklearn.linear_model.LogisticRegression was introduced in version 0.16:

- Add multi_class="multinomial" option in :class:linear_model.LogisticRegression to implement a Logistic Regression solver that minimizes the cross-entropy or multinomial loss instead of the default One-vs-Rest setting. Supports lbfgs and newton-cg solvers. By Lars Buitinck and Manoj Kumar. Solver option newton-cg by Simon Wu.
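To bring this back to the NER setting of the question: a common approach is to extract features from the word and its context only, and let the multinomial model learn one weight vector per class, which is equivalent to class-dependent features of the form f(x, y) = phi(x) combined with a one-hot encoding of y. A minimal sketch along those lines, using DictVectorizer to turn per-token feature dicts into a feature matrix (the feature names and the toy sentence are illustrative only):

from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

# Toy NER data: one feature dict per token (feature names are made up for illustration)
tokens = [
    {'word': 'John',  'is_capitalized': True,  'prev_word': '<s>'},
    {'word': 'lives', 'is_capitalized': False, 'prev_word': 'John'},
    {'word': 'in',    'is_capitalized': False, 'prev_word': 'lives'},
    {'word': 'Paris', 'is_capitalized': True,  'prev_word': 'in'},
]
labels = ['PER', 'O', 'O', 'LOC']  # one NER tag per token

# DictVectorizer one-hot encodes string-valued features into a sparse matrix
vec = DictVectorizer()
X_ner = vec.fit_transform(tokens)

# One weight vector per class is learned, mirroring v . f(x, y) in the question
clf = LogisticRegression(multi_class='multinomial', solver='lbfgs')
clf.fit(X_ner, labels)

# Predict the tag of an unseen token described by the same kind of features
test = vec.transform([{'word': 'Mary', 'is_capitalized': True, 'prev_word': '<s>'}])
print(clf.predict(test))

Note that in recent scikit-learn versions the multinomial loss is already the default for multiclass problems with the 'lbfgs' solver, so the explicit multi_class argument may not be needed.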