Can someone explain the syntax of BigramAssocMeasures.chi_sq?

Question

I am using NLTK's BigramAssocMeasures.chi_sq to find out how informative words are across different classes, but I don't know how to supply data to this function.

The NLTK definition reads: """Scores bigrams using chi-square, i.e. phi-sq multiplied by the number of bigrams, as in Manning and Schutze 5.3.3.""" return n_xx * cls.phi_sq(n_ii, (n_ix, n_xi), n_xx)

What do n_ii, (n_ix, n_xi), and n_xx stand for?

Answer 1

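I found the following sources that explain this:

The first source explains the topic and its application to sentiment analysis, along with Python code. The second source provides more code examples. The third source contains the explanation you are looking for: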

The arguments constitute the marginals of a contingency table, counting the occurrences of particular events in a corpus. The letter i in the suffix refers to the appearance of the word w in question, while x indicates the appearance of any word. Thus, for example::
n_ii counts (w1, w2), i.e. the bigram being scored
n_ix counts (w1, *)
n_xi counts (*, w2)
n_xx counts (*, *), i.e. any bigram
This may be shown with respect to a contingency table::
        w1    ~w1
     ------ ------
 w2 | n_ii | n_oi | = n_xi
     ------ ------
~w2 | n_io | n_oo |
     ------ ------
     = n_ix        TOTAL = n_xx
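
To make the table concrete, here is a minimal sketch in plain Python that rebuilds the four cells from the marginals and computes chi-square as n_xx times phi-sq, keeping the same argument order as the call quoted in the question. The function is a hand-written stand-in rather than NLTK's own code, and the (word, class) pairs are hypothetical toy data. It assumes the use case from the question, where w1 is read as the word and w2 as the class label.

```python
from collections import Counter


def chi_sq(n_ii, n_ix_xi, n_xx):
    """Chi-square from marginals, in the order (n_ii, (n_ix, n_xi), n_xx)."""
    n_ix, n_xi = n_ix_xi
    # Recover the remaining cells of the 2x2 contingency table.
    n_oi = n_xi - n_ii                 # (~w1, w2)
    n_io = n_ix - n_ii                 # (w1, ~w2)
    n_oo = n_xx - n_ii - n_oi - n_io   # (~w1, ~w2)
    # chi-square = n_xx * phi-sq; assumes no row or column total is zero.
    numerator = n_xx * (n_ii * n_oo - n_io * n_oi) ** 2
    denominator = (n_ii + n_io) * (n_ii + n_oi) * (n_io + n_oo) * (n_oi + n_oo)
    return numerator / denominator


# Hypothetical toy data: each item is (word, class_label).
labelled_words = [
    ("good", "pos"), ("good", "pos"), ("good", "pos"), ("fine", "pos"), ("bad", "pos"),
    ("bad", "neg"), ("bad", "neg"), ("bad", "neg"), ("good", "neg"), ("fine", "neg"),
]

pair_counts = Counter(labelled_words)                   # n_ii for each (word, class)
word_counts = Counter(w for w, _ in labelled_words)     # n_ix: word in any class
class_counts = Counter(c for _, c in labelled_words)    # n_xi: any word in the class
total = len(labelled_words)                             # n_xx: all observations

score_good_pos = chi_sq(
    pair_counts[("good", "pos")],                  # n_ii = 3
    (word_counts["good"], class_counts["pos"]),    # (n_ix, n_xi) = (4, 5)
    total,                                         # n_xx = 10
)
print(score_good_pos)
```

The same four numbers can be passed to NLTK as BigramAssocMeasures.chi_sq(n_ii, (n_ix, n_xi), n_xx). A score of 0 means the word and the class co-occur exactly as often as independence would predict; larger scores indicate a stronger association.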

I hope this research helps.

python

nltk

chi-squared