How to return the index of the nearest neighbor in knngow
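I want to use knngow from the dprep package. Besides returning the appropriate label for the test data, I would also like to return the row index of the nearest neighbor (in the training data). Is there a function in this package that does this? My data is as follows: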
df1<-data.frame(c("a","b","c"),c(1,2,3),c("T","F","T"))
df2<-data.frame(c("a","d","f"),c(4,1,3),c("F","F","T"))
mylist1<-list()
mylist1[[1]]<-df1
mylist1[[2]]<-df2
tst1<-data.frame(c("f"),c(2))
library(dprep)
for(i in 1:length(mylist1)){
knn_model<-knngow(mylist1[[i]],tst1,1)}
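In addition to returning the label, I would like it to show, for example, that the nearest neighbor is in row 3 of mylist1[[2]].
Updated based on your comment
I don't see any function in the dprep package that returns the indices of the nearest neighbors in the training data (hopefully I haven't missed anything).
However, you can first compute a distance matrix using the Gower distance (FD package) and then pass this matrix to a k-nearest-neighbors function (the KernelKnn package accepts a distance matrix as input). If you decide to use the KernelKnn package, first install the latest version using devtools::install_github('mlampros/KernelKnn').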
# train-data [ "col3" is the response variable, 'stringsAsFactors' by default ]
df1 <- data.frame(col1 = c("a","d","f"), col2 = c(1,3,2), col3 = c("T","F","T"), stringsAsFactors = T)
# test-data
tst1 <- data.frame(col1 = c("f"), col2 = c(2), stringsAsFactors = T)
# rbind train and test data (remove the response variable from df1)
df_all = rbind(df1[, -3], tst1)
# calculate distance matrix
dist_gower = as.matrix(FD::gowdis(df_all))
# use the dist_gower distance matrix as input to the 'distMat.knn.index.dist' function
# additionally specify which row-index of the previously built 'df_all' data.frame is the test-data observation, using the 'TEST_indices' parameter
idxs = KernelKnn::distMat.knn.index.dist(dist_gower, TEST_indices = c(4), k = 2, threads = 1, minimize = T)
idxs$test_knn_idx returns the k-nearest-neighbors (row indices in the training data) of the test-data observation:
print(idxs)
$test_knn_idx
     [,1] [,2]
[1,]    3    1

$test_knn_dist
     [,1] [,2]
[1,]    0 0.75
If you also want the probabilities of the class labels, first convert the labels to numeric and then use the distMat.KernelKnn function:
y_numeric = as.numeric(df1$col3)
labels = KernelKnn::distMat.KernelKnn(dist_gower, TEST_indices = c(4), y = y_numeric, k = 2, regression = F, threads = 1, Levels = sort(unique(y_numeric)), minimize = T)
print(labels)
     class_1 class_2
[1,]       0       1
# class_2 corresponds to "T" from col3 (df1 data.frame)
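Alternatively, you can look at the source of dprep::knngow, especially the second part of the function, which is the part you are interested in: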
> print(dprep::knngow)
....
    else {
        for (i in 1:ntest) {
            tempo = order(StatMatch::gower.dist(test[i, -p], train[, -p]))[1:k]
            classes[i] = moda(train[tempo, p])[1]
        }
    }
.....
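In the loop above, tempo already holds the positions of the k closest training rows, so those positions are the neighbor indices you are after. As a rough illustration of that idea, here is a minimal base-R sketch. The helpers gower_to_train and nearest_train_rows are hypothetical names made up for this example and are not part of dprep, FD, StatMatch or KernelKnn. The distance is a simplified Gower distance (range-scaled absolute difference for numeric columns, 0/1 mismatch for categorical columns, no handling of missing values), so treat it as an assumption rather than a drop-in replacement for the package implementations.

```r
# Hypothetical helper: simplified Gower distance between one test row and
# every training row (predictor columns only, no NA handling).
gower_to_train <- function(train_x, test_row) {
  stopifnot(ncol(train_x) == ncol(test_row), nrow(test_row) == 1)
  per_column <- sapply(seq_along(train_x), function(j) {
    tr <- train_x[[j]]
    te <- test_row[[j]]
    if (is.numeric(tr)) {
      # numeric column: absolute difference scaled by the range over train + test
      rng <- diff(range(c(tr, te)))
      if (rng == 0) rep(0, length(tr)) else abs(tr - te) / rng
    } else {
      # categorical column: 0 when the values match, 1 otherwise
      as.numeric(as.character(tr) != as.character(te))
    }
  })
  # force a (rows x columns) matrix even when there is a single training row
  per_column <- matrix(per_column, nrow = nrow(train_x))
  rowMeans(per_column)
}

# Hypothetical helper: row indices (and distances) of the k closest training rows.
nearest_train_rows <- function(train_x, test_row, k = 1) {
  d <- gower_to_train(train_x, test_row)
  idx <- order(d)[seq_len(min(k, length(d)))]
  list(index = idx, distance = d[idx])
}

# Same toy data as in the answer; col3 is the response and is dropped.
train <- data.frame(col1 = c("a", "d", "f"), col2 = c(1, 3, 2), col3 = c("T", "F", "T"))
test  <- data.frame(col1 = "f", col2 = 2)

res <- nearest_train_rows(train[, -3], test, k = 2)
print(res$index)
print(res$distance)
```

On this toy data the sketch picks training rows 3 and 1 with distances 0 and 0.75, which agrees with the KernelKnn output shown earlier. To cover a list of training sets such as mylist1, you could call nearest_train_rows once per list element and keep the element number together with the returned row index.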