Can someone explain the syntax of BigramAssocMeasures.chi_sq?
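I am using NLTK's BigramAssocMeasures.chi_sq to find out how much information words carry about different classes, but I don't know how to feed data to this function.
The NLTK definition says:
"""Scores bigrams using chi-square, i.e. phi-sq multiplied by the number of bigrams, as in Manning and Schutze 5.3.3.
"""
return n_xx * cls.phi_sq(n_ii, (n_ix, n_xi), n_xx)
What do n_ii, (n_ix, n_xi), and n_xx stand for?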
I found the following sources that explain it:
- text classification for sentiment analysis
- python code search - nullege - samples for chi_sq
- python code search - nullege - explanation of BigramAssocMeasures
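The first source explains the topic and its application to sentiment analysis, with Python code. The second source provides more code examples. The third source contains the explanation you are looking for: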
The arguments constitute the marginals of a contingency table,
counting the occurrences of particular events in a corpus. The letter
i in the suffix refers to the appearance of the word w in question,
while x indicates the appearance of any word. Thus, for example::
n_ii counts (w1, w2), i.e. the bigram being scored
n_ix counts (w1, *)
n_xi counts (*, w2)
n_xx counts (*, *), i.e. any bigram
This may be shown with respect to a contingency table::
        w1    ~w1
     ------ ------
 w2 | n_ii | n_oi | = n_xi
     ------ ------
~w2 | n_io | n_oo |
     ------ ------
     = n_ix        TOTAL = n_xx
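To make the marginals concrete, here is a minimal pure-Python sketch that rebuilds the four cells of the table from n_ii, (n_ix, n_xi), n_xx and then computes chi-square as n_xx times phi-squared. The helper names (contingency_cells, chi_sq_from_marginals) and all counts are hypothetical and only for illustration; this is not NLTK's source, just the standard 2x2 formula written out under the argument convention quoted above. The word/class mapping in the second call is an assumption about how the measure could be applied to classification, not something the quoted docstring states.

```python
def contingency_cells(n_ii, n_ix, n_xi, n_xx):
    """Recover the four cells of the 2x2 table from the marginals."""
    n_oi = n_xi - n_ii                  # (~w1, w2): w2 preceded by some other word
    n_io = n_ix - n_ii                  # (w1, ~w2): w1 followed by some other word
    n_oo = n_xx - n_ii - n_oi - n_io    # (~w1, ~w2): everything else
    return n_ii, n_io, n_oi, n_oo


def chi_sq_from_marginals(n_ii, n_ix_xi, n_xx):
    """Chi-square for a 2x2 table, taking arguments in the same order
    as the quoted signature: n_ii, (n_ix, n_xi), n_xx."""
    n_ix, n_xi = n_ix_xi
    n_ii, n_io, n_oi, n_oo = contingency_cells(n_ii, n_ix, n_xi, n_xx)
    phi_sq = (n_ii * n_oo - n_io * n_oi) ** 2 / (
        (n_ii + n_io) * (n_ii + n_oi) * (n_io + n_oo) * (n_oi + n_oo)
    )
    return n_xx * phi_sq


# Bigram case (hypothetical counts): the pair (w1, w2) occurs 8 times,
# w1 starts 15828 bigrams, w2 ends 4675 bigrams, 14307668 bigrams in total.
bigram_score = chi_sq_from_marginals(8, (15828, 4675), 14307668)

# Word/class case (assumed mapping): treat "w1" as the word and "w2" as the class.
# The word occurs 30 times in the class and 40 times overall; the class holds
# 1000 word tokens out of 2000 tokens in all classes.
class_score = chi_sq_from_marginals(30, (40, 1000), 2000)

print(round(bigram_score, 2), round(class_score, 2))
```

With NLTK installed, the same argument tuples should be accepted by BigramAssocMeasures.chi_sq(n_ii, (n_ix, n_xi), n_xx), which is the signature shown in the question.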
I hope this research helps.