Is it possible to access the vocabulary list from the nltk vectorizer in an NLP ML pipeline?

Question

My pipeline looks like this:

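from sklearn.ensemble import AdaBoostClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multioutput import MultiOutputClassifier
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

model = make_pipeline(
    TfidfVectorizer(tokenizer=tokenize, min_df=5),  # tokenize: user-defined function
    MultiOutputClassifier(
        estimator=AdaBoostClassifier(
            base_estimator=DecisionTreeClassifier(max_depth=2),  # renamed to `estimator` in scikit-learn 1.2
            n_estimators=10, learning_rate=1)))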

I would like to get the vocabulary dictionary that TfidfVectorizer builds. Is this possible?

Answer 1

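Once you have fitted your model, the vocabulary_ attribute becomes available. You can access it with

model['tfidfvectorizer'].vocabulary_

which returns a dictionary mapping each token to its feature (column) index in the TF-IDF matrix, not to a count.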
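A minimal sketch of the lookup, under some assumptions: the toy `docs` and `labels` are made up for illustration, and a plain LogisticRegression stands in for the question's MultiOutputClassifier to keep the example short. The step name 'tfidfvectorizer' comes from make_pipeline, which names each step after its lowercased class name.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy corpus and labels, invented for this example.
docs = ["the cat sat", "the dog barked", "the cat purred", "a dog ran"]
labels = [0, 1, 0, 1]

# Stand-in classifier; any estimator works as the final step.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(docs, labels)

# vocabulary_ exists only after fit: dict of token -> column index.
vocab = model["tfidfvectorizer"].vocabulary_

# Recover the tokens as a list ordered by column index.
terms = sorted(vocab, key=vocab.get)

print(vocab)
print(terms)
```

The same attribute should also be reachable through model.named_steps['tfidfvectorizer'].vocabulary_. Accessing it before calling fit raises an AttributeError, because the vocabulary is learned during fitting.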

nlp

pipeline

nltk

data-science