How to apply a sentence-level LDA model using Gensim?

Is it possible to apply a sentence-level LDA model using Gensim, as proposed in Bao and Datta (2014)? The paper is here.

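Its distinguishing feature is that it makes the "one topic per sentence assumption" (p. 1376). This differs from other sentence-level approaches, which typically allow each sentence to contain multiple topics: "The most straightforward method is to treat each sentence as a document and apply the LDA model on the collection of sentences rather than documents." (p. 1376). However, I think it is more reasonable to assume that a sentence deals with a single topic.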

Thanks!

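You can run what Brody & Elhadad (2010) call local-LDA: just feed your text data into LDA sentence by sentence, which is easy once you have split your documents into sentences. However, LDA will still give you multiple topics per sentence (by definition you get a value for every topic, although gensim's minimum_probability defaults to 0.01), which is of course not the same as the method proposed by Bao and Datta.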

However, the supplemental material to the article by Bao & Datta (2014) contains a C or C++ .exe (I assume; the readme doesn't say) plus usage instructions. You could just run that from the command line, or write a wrapper for Python (output in gensim format would be the icing on the cake). If you do, please share your code; it may help others.
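A Python wrapper around such an executable could start from something like the sketch below. It is deliberately generic: the executable's file name, its command-line arguments, and its output format are not given here, so the path and arguments in the usage comment are hypothetical placeholders that would have to be replaced with whatever the readme in the supplemental material specifies.

```python
import subprocess


def run_external_tool(exe_path, args, timeout=600):
    """Run a command-line program and return what it wrote to stdout.

    exe_path and args are supplied by the caller; nothing here is specific
    to the Bao & Datta executable.
    """
    result = subprocess.run(
        [exe_path, *args],
        capture_output=True,  # collect stdout and stderr instead of printing them
        text=True,            # decode output as str rather than bytes
        timeout=timeout,
        check=True,           # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout


# Hypothetical usage; the path and arguments are placeholders, not the real interface:
# output = run_external_tool(r"C:\path\to\tool.exe", ["sentences.txt"])
```

Parsing the returned text into gensim-style (topic_id, probability) lists would be the next step, once the tool's actual output format is known.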

python

nlp

lda

gensim