Logistic regression and bag of words

As I understand it, X1 would be the occurrence of a word, while beta1 would be the weight of that word. My question is: how are the weights calculated? Based on what?

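It's a bit hard to answer, because I don't know exactly what you are trying to do. But in general, you have data that gives you X, plus an outcome. The outcome should be Bernoulli distributed, which means only two outcomes are possible. You then compute a probability based on X. For example, say you want to know whether a text is about Tom Hanks. If the word "Tom" appears in the text, x1 is 1. Alternatively, x1 can describe how often "Tom" appears in the text. You try to select a beta1 so that the sigmoid function of beta1*x1 returns the right probability that the text is about "Tom Hanks" when the word "Tom" is present in the text. To calculate beta, normally some machine learning algorithm is used, such as gradient descent: the betas are chosen to maximize the likelihood of the outcomes observed in your training data (equivalently, to minimize the log loss). I simplified it a little bit to get the idea across. I think this explains it well. In the end, you get a model from your data that predicts the outcome for new data where you only know X.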
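A minimal sketch of the idea above in plain Python (no libraries), assuming a made-up six-sentence corpus. The corpus, the helper names `bag_of_words`, `predict_proba` and `fit`, and the learning rate and epoch count are all hypothetical choices for illustration. Each text becomes a bag-of-words count vector X, the model is P(about Tom Hanks | X) = sigmoid(beta0 + beta1*x1 + ... + betan*xn), and the betas are found by batch gradient descent on the average log loss, whose gradient with respect to beta_j is the average of (p - y) * x_j over the training texts.

```python
import math

# Hypothetical toy corpus: label 1 means the text is about "Tom Hanks".
docs = [
    ("tom hanks stars in a new film", 1),
    ("tom hanks wins an award", 1),
    ("the film was shot in paris", 0),
    ("tom and jerry is a cartoon", 0),
    ("hanks gives an interview", 1),
    ("the weather in paris is nice", 0),
]

# Bag of words: one feature per vocabulary word, value = word count in the text.
vocab = sorted({w for text, _ in docs for w in text.split()})

def bag_of_words(text):
    words = text.split()
    return [words.count(w) for w in vocab]

X = [bag_of_words(text) for text, _ in docs]
y = [label for _, label in docs]

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict_proba(x, beta0, beta):
    # P(y = 1 | x) = sigmoid(beta0 + beta1*x1 + ... + betan*xn)
    return sigmoid(beta0 + sum(b * xi for b, xi in zip(beta, x)))

def fit(X, y, lr=0.1, epochs=2000):
    # Batch gradient descent on the average log loss (negative log-likelihood).
    n_features = len(X[0])
    m = len(X)
    beta0, beta = 0.0, [0.0] * n_features
    for _ in range(epochs):
        grad0, grad = 0.0, [0.0] * n_features
        for x, target in zip(X, y):
            error = predict_proba(x, beta0, beta) - target  # p - y
            grad0 += error
            for j, xj in enumerate(x):
                grad[j] += error * xj
        beta0 -= lr * grad0 / m
        beta = [b - lr * g / m for b, g in zip(beta, grad)]
    return beta0, beta

beta0, beta = fit(X, y)
weights = dict(zip(vocab, beta))
print(f"weight of 'hanks': {weights['hanks']:.2f}")
print(f"weight of 'paris': {weights['paris']:.2f}")
p_new = predict_proba(bag_of_words("tom hanks in paris"), beta0, beta)
print(f"P(about Tom Hanks | 'tom hanks in paris') = {p_new:.2f}")
```

Words that only occur in positive texts (such as "hanks") end up with positive weights, words that only occur in negative texts (such as "paris") end up with negative weights, and "tom" lands in between because it also appears in the cartoon sentence. Because this toy data is perfectly separable, the weights would keep growing with more epochs; regularization is the usual remedy. In practice you would normally use a library such as scikit-learn (`CountVectorizer` plus `LogisticRegression`), which fits the same kind of model with a better optimizer and L2 regularization by default.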

python

statistics

nlp

machine-learning

data-science