Iterate efficiently over a list of strings to get a matrix of pairwise WMD distances
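I am trying to generate a pairwise distance matrix from a list of strings (newspaper articles).
WMD distance is not implemented in scipy.spatial.distance.pdist, so I hooked this implementation: https://github.com/src-d/wmd-relax into SpaCy. However, I can't figure out how to iterate over my list to generate the distance matrix.
According to the documentation: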
import spacy
import wmd
import numpy as np

nlp = spacy.load('en_core_web_md')
# Register the WMD hook as the last pipeline component
nlp.add_pipe(wmd.WMD.SpacySimilarityHook(nlp), last=True)

# articles is assumed to be a list of strings
docs = [nlp(article) for article in articles]

# Build the matrix as a list of rows, one row per document
m = []
for doc1 in docs:
    row = []
    for doc2 in docs:
        # With the hook installed, similarity() is expected to return the WMD distance
        row.append(doc1.similarity(doc2))
    m.append(row)

# np.array is used instead of np.matrix, which NumPy no longer recommends
result = np.array(m)
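
The nested loop above calls the distance function n * n times, including each pair twice and each document against itself. If the distance can be assumed symmetric and zero between a document and itself (an assumption here, not something the wmd-relax documentation is quoted as guaranteeing), each unordered pair only needs to be evaluated once. Below is a minimal sketch of that idea; `pairwise_matrix` and the toy length-based distance are hypothetical names introduced for illustration, so the block runs without SpaCy installed.

```python
from itertools import combinations

import numpy as np


def pairwise_matrix(items, dist):
    """Build a symmetric n x n matrix, calling dist once per unordered pair."""
    n = len(items)
    # The diagonal stays 0.0, i.e. self-distance is assumed to be zero.
    out = np.zeros((n, n), dtype=float)
    for i, j in combinations(range(n), 2):
        d = dist(items[i], items[j])
        out[i, j] = d
        out[j, i] = d  # mirror the value, relying on the symmetry assumption
    return out


# Toy stand-in for WMD: absolute difference in string length.
words = ["a", "abc", "abcdef"]
toy = pairwise_matrix(words, lambda a, b: abs(len(a) - len(b)))
```

With the `docs` list from the question, the call would presumably look like `pairwise_matrix(docs, lambda a, b: a.similarity(b))`, which makes n * (n - 1) / 2 distance computations instead of n * n.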