OpenCV: vocabulary matrix obtained from BOWKMeansTrainer

I am trying to implement BoF by following this code. In particular, this snippet:

//featuresUnclustered contains all the feature descriptors of all images
//Construct BOWKMeansTrainer
//the number of visual words (clusters) in the dictionary
int dictionarySize=200;
//define Term Criteria
TermCriteria tc(CV_TERMCRIT_ITER,100,0.001);
//retries number
int retries=1;
//necessary flags
int flags=KMEANS_PP_CENTERS;
//Create the BoW (or BoF) trainer
BOWKMeansTrainer bowTrainer(dictionarySize,tc,retries,flags);
//cluster the feature vectors
cout<<"starting k-means..."<<endl;
Mat dictionary=bowTrainer.cluster(featuresUnclustered);    
//store the vocabulary
FileStorage fs("dictionary.yml", FileStorage::WRITE);
fs << "vocabulary" << dictionary;
fs.release();

I get a dictionary.yml file in this format:

%YAML:1.0
vocabulary: !!opencv-matrix
   rows: 200
   cols: 128
   dt: f
   data: [ 8.19999981e+00, 1.20000005e+00, 1., 24., 5.82000008e+01,
   ...
   ]

Now, my question is: each row represents a centroid (we have 200 centroids, given by dictionarySize), and since SIFT's descriptor size is 128 bit, each centroid has the same dimension. Is that correct?

Yes, that's correct.

Well, a SIFT descriptor has 128 values (not bits). In OpenCV each value is a float, i.e. 32 bits. But yes, each centroid has 128 values.


The K in k-means (dictionarySize) is the number of centroids. Each centroid has the same dimension N as the features you use, so N is 128 for SIFT.

The dictionary will be a K x N matrix, in this case 200 x 128.
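To make the K x N layout concrete, here is a minimal sketch in plain C++ (no OpenCV) that treats the `data` list from dictionary.yml as a flat row-major buffer and pulls out one centroid. The helper names `centroidAt` and `dictionaryBytes` are made up for illustration and are not OpenCV functions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of OpenCV): copy centroid `k` out of a
// row-major K x N buffer, i.e. row k occupies values [k*N, (k+1)*N).
std::vector<float> centroidAt(const std::vector<float>& data,
                              std::size_t rows, std::size_t cols,
                              std::size_t k) {
    assert(data.size() == rows * cols);  // K * N values in total
    assert(k < rows);                    // valid centroid index
    auto first = data.begin() + static_cast<std::ptrdiff_t>(k * cols);
    return std::vector<float>(first, first + static_cast<std::ptrdiff_t>(cols));
}

// Hypothetical helper: size in bytes of the raw centroid values (K * N floats).
std::size_t dictionaryBytes(std::size_t rows, std::size_t cols) {
    return rows * cols * sizeof(float);
}
```

With rows = 200 and cols = 128 that is 25,600 floats, which comes to 102,400 bytes on platforms where a float is 4 bytes.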


Keep in mind that the BoW histogram (the global descriptor computed using the dictionary) will have K values.
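As a rough illustration of why the histogram has K values, the sketch below assigns each descriptor to its nearest centroid and counts the hits per centroid. It is plain C++ with a brute-force nearest-neighbour search and a made-up helper name (`bowHistogram`). In OpenCV this step is what BOWImgDescriptorExtractor is for, and this sketch is not its implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

using Descriptor = std::vector<float>;

// Squared Euclidean distance between two vectors of equal length.
float squaredDistance(const Descriptor& a, const Descriptor& b) {
    assert(a.size() == b.size());
    float sum = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const float d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
}

// Hypothetical helper: for each of the K centroids, count how many
// descriptors have it as their nearest centroid. The result has K entries.
std::vector<int> bowHistogram(const std::vector<Descriptor>& descriptors,
                              const std::vector<Descriptor>& centroids) {
    std::vector<int> histogram(centroids.size(), 0);
    if (centroids.empty()) return histogram;
    for (const Descriptor& d : descriptors) {
        std::size_t best = 0;
        float bestDist = std::numeric_limits<float>::max();
        for (std::size_t k = 0; k < centroids.size(); ++k) {
            const float dist = squaredDistance(d, centroids[k]);
            if (dist < bestDist) {
                bestDist = dist;
                best = k;
            }
        }
        ++histogram[best];  // one vote for the closest visual word
    }
    return histogram;
}
```

The output length depends only on the number of centroids, not on how many descriptors the image has or on the descriptor dimension N, so a 200-word dictionary yields a 200-bin histogram for every image.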