OneR WEKA - wrong prediction?

Question

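I am trying to rank attributes by their predictive power by running OneR iteratively in WEKA. After each run, I remove the chosen attribute to see which one is next best.

I have done this for all my attributes, and some (three out of ten) are 'ranked' higher than others even though they have a lower percentage of correct predictions, a smaller average ROC area, and less compact rules.

As I understand it, OneR just looks at the frequency tables of the attributes it has and then at the class values, so it should not care whether I take attributes out or not... but I am probably missing something.

Does anyone have an idea?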

Answer 1

The OneR classifier looks a bit like nearest-neighbor. Given that, the following applies: in the source code of the OneR classifier, it says:

    // if this attribute is the best so far, replace the rule
    if (noRule || r.m_correct > m_rule.m_correct) {
      m_rule = r;
    }

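Thus, it should be possible (either in 1-R generally or in this implementation) for one attribute to block another and then be removed later in your process.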
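To make the selection step in the quoted snippet concrete, here is a minimal self-contained sketch of 1R on nominal data together with the remove-and-rerun ranking from the question. It is not WEKA's code: the class `OneRSketch`, its methods, and the toy data are hypothetical, and it assumes that `m_correct` corresponds to the number of training instances a rule classifies correctly. Numeric attributes, which WEKA discretizes first, are left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not WEKA's OneR implementation.
class OneRSketch {

    // Toy data (assumed): three nominal attributes followed by the class.
    static final String[][] TOY = {
        {"x", "p", "m", "yes"},
        {"x", "p", "m", "yes"},
        {"x", "q", "n", "yes"},
        {"y", "q", "n", "no"},
        {"y", "q", "n", "no"},
        {"y", "p", "n", "no"},
    };

    /** Training instances a one-attribute rule gets right: each attribute
     *  value predicts its majority class, and the majority counts are summed. */
    static int correctCount(String[][] rows, int attr, int classIdx) {
        Map<String, Map<String, Integer>> table = new HashMap<>();
        for (String[] row : rows) {
            table.computeIfAbsent(row[attr], k -> new HashMap<>())
                 .merge(row[classIdx], 1, Integer::sum);
        }
        int correct = 0;
        for (Map<String, Integer> classCounts : table.values()) {
            int max = 0;
            for (int count : classCounts.values()) {
                max = Math.max(max, count);
            }
            correct += max;
        }
        return correct;
    }

    /** Picks the best candidate with a strict '>' comparison, mirroring the
     *  quoted snippet, so a tie keeps the earlier attribute. */
    static int best(String[][] rows, List<Integer> candidates, int classIdx) {
        int bestAttr = -1;
        int bestCorrect = -1;
        for (int attr : candidates) {
            int correct = correctCount(rows, attr, classIdx);
            if (bestAttr < 0 || correct > bestCorrect) {
                bestAttr = attr;
                bestCorrect = correct;
            }
        }
        return bestAttr;
    }

    /** The procedure from the question: pick the best attribute, remove it, repeat. */
    static List<Integer> rankByRemoval(String[][] rows, int numAttrs, int classIdx) {
        List<Integer> remaining = new ArrayList<>();
        for (int a = 0; a < numAttrs; a++) {
            remaining.add(a);
        }
        List<Integer> ranking = new ArrayList<>();
        while (!remaining.isEmpty()) {
            int chosen = best(rows, remaining, classIdx);
            ranking.add(chosen);
            remaining.remove(Integer.valueOf(chosen));
        }
        return ranking;
    }

    public static void main(String[] args) {
        System.out.println(rankByRemoval(TOY, 3, 3));
    }
}
```

On the toy data this prints `[0, 2, 1]`: the training counts are 6, 4 and 5 for attributes 0, 1 and 2. In this sketch the ranking is driven purely by counts on the training data, so if the percentages being compared come from a separate evaluation (for example cross-validation), they need not follow the same order.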

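Say you have attributes 1, 2, and 3 with the distribution 1: 50%, 2: 30%, 3: 20%. In all cases where attribute 1 is best, attribute 3 is second best.

Thus, when attribute 1 is left out, attribute 3 wins with 70%, even though attribute 2 was ranked "better" than attribute 3 in the comparison of all three.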
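The arithmetic of this example can be checked with a small sketch. It reads the percentages as shares of cases in which each attribute is the best one, which is an interpretive assumption rather than a description of how WEKA scores attributes. The class `RemovalVoteSketch` and its data are hypothetical, and the lower preferences of the 30% and 20% groups are made up, since they do not affect the result.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of the 50/30/20 example; not OneR itself.
class RemovalVoteSketch {

    // Number of cases in each group (read as percentages of all cases).
    static final int[] CASES = {50, 30, 20};

    // Attributes ordered best-first for each group. Only the first row is
    // given in the example; the other two orderings are assumptions.
    static final int[][] PREFS = {
        {1, 3, 2},
        {2, 3, 1},
        {3, 2, 1},
    };

    /** For each attribute still in play, sums the cases in which it is the
     *  best attribute among those remaining. */
    static Map<Integer, Integer> tally(int[] cases, int[][] prefs, List<Integer> inPlay) {
        Map<Integer, Integer> wins = new LinkedHashMap<>();
        for (int attr : inPlay) {
            wins.put(attr, 0);
        }
        for (int g = 0; g < cases.length; g++) {
            for (int attr : prefs[g]) {
                if (wins.containsKey(attr)) {
                    wins.merge(attr, cases[g], Integer::sum);
                    break; // only the best remaining attribute gets the cases
                }
            }
        }
        return wins;
    }

    public static void main(String[] args) {
        System.out.println(tally(CASES, PREFS, Arrays.asList(1, 2, 3)));
        System.out.println(tally(CASES, PREFS, Arrays.asList(2, 3)));
    }
}
```

With all three attributes in play the tally is `{1=50, 2=30, 3=20}`; once attribute 1 is removed it becomes `{2=30, 3=70}`, which is the 70% quoted above.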

Answer 2

As an alternative, you can use the OneR package (available on CRAN; more information here: OneR - Establishing a New Baseline for Machine Learning Classification Models).

With the option verbose = TRUE you get the accuracy of all attributes, e.g.:

> library(OneR)
> example(OneR)

OneR> data <- optbin(iris)

OneR> model <- OneR(data, verbose = TRUE)

    Attribute    Accuracy
1 * Petal.Width  96%     
2   Petal.Length 95.33%  
3   Sepal.Length 74.67%  
4   Sepal.Width  55.33%  
---
Chosen attribute due to accuracy
and ties method (if applicable): '*'


OneR> summary(model)

Rules:
If Petal.Width = (0.0976,0.791] then Species = setosa
If Petal.Width = (0.791,1.63]   then Species = versicolor
If Petal.Width = (1.63,2.5]     then Species = virginica

Accuracy:
144 of 150 instances classified correctly (96%)

Contingency table:
            Petal.Width
Species      (0.0976,0.791] (0.791,1.63] (1.63,2.5] Sum
  setosa               * 50            0          0  50
  versicolor              0         * 48          2  50
  virginica               0            4       * 46  50
  Sum                    50           52         48 150
---
Maximum in each column: '*'

Pearson's Chi-squared test:
X-squared = 266.35, df = 4, p-value < 2.2e-16

(Full disclosure: I am the author of this package, and I would be very interested in the results you get.)

