Clustering with word2vec and Kmeans
I am trying to do clustering with word2vec and Kmeans, but it is not working.
Here is a sample of my data:
demain fera chaud à paris pas marseille
mauvais exemple ce n est pas un cliché mais il faut comprendre pourquoi aussi
il y a plus de travail à Paris c est d ailleurs pour cette raison qu autant de gens",
mais s il y a plus de travail, il y a aussi plus de concurrence
s agglutinent autour de la capitale
Script:
import nltk
import pandas
import pprint
import numpy as np
import pandas as pd
from sklearn import cluster
from sklearn import metrics
from gensim.models import Word2Vec
from nltk.cluster import KMeansClusterer
from sklearn.metrics import adjusted_rand_score
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.decomposition import NMF
dataset = pandas.read_csv('text.csv', encoding = 'utf-8')
comments = dataset['comments']
no_duplicate = comments.drop_duplicates()  # assumed: deduplicate comments (definition missing from the paste)
verbatim_list = no_duplicate.values.tolist()
min_count = 2
size = 50
window = 4
model = Word2Vec(verbatim_list, min_count=min_count, size=size, window=window)
X = model[model.vocab]
clusters_number = 28
kclusterer = KMeansClusterer(clusters_number, distance=nltk.cluster.util.cosine_distance, repeats=25)
assigned_clusters = kclusterer.cluster(X, assign_clusters=True)
words = list(model.vocab)
for i, word in enumerate(words):
    print(word + ":" + str(assigned_clusters[i]))
kmeans = cluster.KMeans(n_clusters = clusters_number)
kmeans.fit(X)
labels = kmeans.labels_
centroids = kmeans.cluster_centers_
clusters = {}
for commentaires, label in zip(verbatim_list, labels):
    try:
        clusters[str(label)].append(commentaires)
    except KeyError:
        clusters[str(label)] = [commentaires]
pprint.pprint(clusters)
Output:
Traceback (most recent call last):
File "kmwv.py", line 37, in
X = model[model.vocab]
AttributeError: 'Word2Vec' object has no attribute 'vocab'
I need clustering that works with word2vec, but I hit this error every time I try. Is there any way to do clustering with word2vec?
As Davide said, try this:
X = model[model.wv.vocab]
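For context, here is a minimal end-to-end sketch of the corrected flow, assuming gensim 3.x (where model[model.wv.vocab] is valid) and assuming the comments still need to be split into token lists; the file name text.csv and the comments column come from the question:
import pandas as pd
from gensim.models import Word2Vec
from nltk.cluster import KMeansClusterer
from nltk.cluster.util import cosine_distance
from sklearn.cluster import KMeans

# Word2Vec expects a list of token lists; a raw string would be
# iterated character by character.
dataset = pd.read_csv('text.csv', encoding='utf-8')
sentences = [str(c).split() for c in dataset['comments'].drop_duplicates()]

model = Word2Vec(sentences, min_count=2, size=50, window=4)

words = list(model.wv.vocab)   # vocabulary, in the same order as X
X = model[model.wv.vocab]      # one 50-dimensional vector per word

# NLTK k-means with cosine distance, as in the question
kclusterer = KMeansClusterer(28, distance=cosine_distance, repeats=25)
assigned_clusters = kclusterer.cluster(X, assign_clusters=True)
for word, label in zip(words, assigned_clusters):
    print(word + ":" + str(label))

# Equivalent scikit-learn k-means (Euclidean distance)
kmeans = KMeans(n_clusters=28).fit(X)
Note that in gensim 4.x model.wv.vocab was removed as well: use model.wv.index_to_key for the word list and model.wv.vectors for X, and pass vector_size=50 instead of size=50 to Word2Vec.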