R 中的 AODE 机器学习

Question

我想知道 AODE 是否真的像描述中所说的那样比朴素贝叶斯更好：

https://cran.r-project.org/web/packages/AnDE/AnDE.pdf

--> "AODE achieves highly accurate classification by averaging over all of a small space."

https://www.quora.com/What-is-the-difference-between-a-Naive-Bayes-classifier-and-AODE

--> "AODE is a weird way of relaxing naive bayes' independence assumptions. It is no longer a generative model, but it relaxes the independence assumptions in a slightly different (and less principled) way than logistic regression does. It replaces the convex optimization problem used in training a logistic regression classifier by a quadratic (on the number of features) dependency on both training and test times."

但是我实验的时候发现预测结果好像不对，我是用这些代码实现的：

library(gmodels)
library(AnDE)
AODE_Model = aode(iris)
predict_aode = predict(AODE_Model, iris)
CrossTable(as.numeric(iris$Species), predict_aode)

有人能给我解释一下吗？或者是否有任何好的实用解决方案来实施 AODE？提前谢谢你

Answer 1

如果您查看函数的插图：

train: data.frame : training data. It should be a data frame. AODE works only discretized data. It would be better to discreetize the data frame before passing it to this function.However, aode discretizes the data if not done before hand. It uses an R package called discretization for the purpose. It uses the well known MDL discretization technique.(It might fail sometimes)

默认情况下，arules的离散化函数将其切割成3，这对于iris来说可能不够用。所以我首先重现你用arules离散化的结果：

library(arules)
library(gmodels)
library(AnDE)
set.seed(111)
trn = sample(1:nrow(indata),100)
test = setdiff(1:nrow(indata),trn)

indata <- data.frame(lapply(iris[,1:4],discretize,breaks=3),Species=iris$Species)
AODE_Model = aode(indata[trn,])
predict_aode = predict(AODE_Model, indata[test,])
CrossTable(as.numeric(indata$Species)[test], predict_aode)

                                 | predict_aode 
as.numeric(indata$Species)[test] |         1 |         3 | Row Total | 
---------------------------------|-----------|-----------|-----------|
                               1 |        15 |         5 |        20 | 
                                 |     0.500 |     4.500 |           | 
                                 |     0.750 |     0.250 |     0.400 | 
                                 |     0.333 |     1.000 |           | 
                                 |     0.300 |     0.100 |           | 
---------------------------------|-----------|-----------|-----------|
                               2 |        11 |         0 |        11 | 
                                 |     0.122 |     1.100 |           | 
                                 |     1.000 |     0.000 |     0.220 | 
                                 |     0.244 |     0.000 |           | 
                                 |     0.220 |     0.000 |           | 
---------------------------------|-----------|-----------|-----------|
                               3 |        19 |         0 |        19 | 
                                 |     0.211 |     1.900 |           | 
                                 |     1.000 |     0.000 |     0.380 | 
                                 |     0.422 |     0.000 |           | 
                                 |     0.380 |     0.000 |           | 
---------------------------------|-----------|-----------|-----------|
                    Column Total |        45 |         5 |        50 | 
                                 |     0.900 |     0.100 |           | 
---------------------------------|-----------|-----------|-----------|

您可以看到预测中缺少类之一。让我们把它增加到 4:

indata <- data.frame(lapply(iris[,1:4],discretize,breaks=4),Species=iris$Species)
AODE_Model = aode(indata[trn,])
predict_aode = predict(AODE_Model, indata[test,])
CrossTable(as.numeric(indata$Species)[test], predict_aode)

                                 | predict_aode 
as.numeric(indata$Species)[test] |         1 |         2 |         3 | Row Total | 
---------------------------------|-----------|-----------|-----------|-----------|
                               1 |        20 |         0 |         0 |        20 | 
                                 |    18.000 |     4.800 |     7.200 |           | 
                                 |     1.000 |     0.000 |     0.000 |     0.400 | 
                                 |     1.000 |     0.000 |     0.000 |           | 
                                 |     0.400 |     0.000 |     0.000 |           | 
---------------------------------|-----------|-----------|-----------|-----------|
                               2 |         0 |        10 |         1 |        11 | 
                                 |     4.400 |    20.519 |     2.213 |           | 
                                 |     0.000 |     0.909 |     0.091 |     0.220 | 
                                 |     0.000 |     0.833 |     0.056 |           | 
                                 |     0.000 |     0.200 |     0.020 |           | 
---------------------------------|-----------|-----------|-----------|-----------|
                               3 |         0 |         2 |        17 |        19 | 
                                 |     7.600 |     1.437 |    15.091 |           | 
                                 |     0.000 |     0.105 |     0.895 |     0.380 | 
                                 |     0.000 |     0.167 |     0.944 |           | 
                                 |     0.000 |     0.040 |     0.340 |           | 
---------------------------------|-----------|-----------|-----------|-----------|
                    Column Total |        20 |        12 |        18 |        50 | 
                                 |     0.400 |     0.240 |     0.360 |           | 
---------------------------------|-----------|-----------|-----------|-----------|

只错了3个。对我来说，这是在不过度拟合的情况下进行离散化的问题，这可能很棘手..

R 中的 AODE 机器学习

AODE Machine Learning in R

r

machine-learning

probability