OCSVM in scikit: distance of outlier is always negative
I am using the one-class SVM classifier OneClassSVM from scikit-learn to detect outliers in a dataset. The dataset has 30,000 samples with 1,024 variables, and I use 10% of it as training data.
from sklearn import svm

clf = svm.OneClassSVM(nu=0.001, kernel="rbf", gamma=1e-5)
clf.fit(trset)
dist2hptr = clf.decision_function(trset)  # signed distance of each sample to the decision boundary
tr_y = clf.predict(trset)  # +1 for inliers, -1 for outliers
As shown above, I use decision_function(x) to compute each sample's distance to the decision boundary. When I compare the predictions with the distances, samples labeled +1 in the prediction output always have positive distance values, and samples labeled -1 always have negative ones.
I thought a distance should be unsigned, since it has no direction. I would like to understand how this distance is computed in scikit-learn's OneClassSVM classifier. Does the sign simply indicate that a sample lies outside the decision hyperplane computed by the SVM?
Please help.
As the scikit-learn documentation explains, sklearn's OneClassSVM implements the method from the following paper:
Bernhard Schölkopf, John C. Platt, John C. Shawe-Taylor, Alex J. Smola, and Robert C. Williamson. 2001. Estimating the Support of a High-Dimensional Distribution. Neural Comput. 13, 7 (July 2001), 1443-1471. DOI: https://doi.org/10.1162/089976601750264965
Let's look at that paper's abstract:
Suppose you are given some data set drawn from an underlying probability distribution P and you want to estimate a “simple” subset S of input space such that the probability that a test point drawn from P lies outside of S equals some a priori specified value between 0 and 1.
We propose a method to approach this problem by trying to estimate a function f that is positive on S and negative on the complement.
So the abstract defines the function f that OneClassSVM estimates, and sklearn follows this definition: f is positive on S (inliers) and negative on its complement (outliers), which is exactly why decision_function returns signed values whose sign matches the output of predict.
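A minimal sketch of this sign convention on toy data (the data and the nu/gamma values here are illustrative, not tuned): points far from the training distribution get negative decision_function values and a -1 prediction, while points inside the estimated region get positive values and +1.

```python
import numpy as np
from sklearn import svm

rng = np.random.RandomState(0)

# Training data: 200 samples from a standard normal distribution.
X_train = rng.randn(200, 2)

# Test data: 20 similar points plus 5 obvious outliers far from the training cloud.
X_test = np.vstack([rng.randn(20, 2),
                    rng.uniform(low=4.0, high=6.0, size=(5, 2))])

clf = svm.OneClassSVM(nu=0.05, kernel="rbf", gamma=0.1)
clf.fit(X_train)

scores = clf.decision_function(X_test)  # signed: positive inside S, negative outside
labels = clf.predict(X_test)            # +1 inlier, -1 outlier

# The far-away points are flagged as outliers with negative scores.
print(labels[-5:])  # all -1
print(np.all(scores[-5:] < 0))
```

So the sign is not a mistake: it encodes which side of the learned boundary a sample falls on, and the magnitude indicates how far from that boundary it is.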