Apply chi-square to a dataset that contains categorical variables
My dataset contains the following columns:
Voted? Political Category
Yes Right
No Left
Not Answered Center
Yes Right
Yes Right
No Right
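I need to compute chi-square to see which category is most associated with the people who voted. Both columns contain strings. How can I give each value a numeric representation so that I can apply chi-square?
You can use pd.factorize to encode your categorical variables: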
df['nVoted?'] = pd.factorize(df['Voted?'])[0]
df['nCategory'] = pd.factorize(df['Political Category'])[0]
print(df)
# Output
Voted? Political Category nVoted? nCategory
0 Yes Right 0 0
1 No Left 1 1
2 Not Answered Center 2 2
3 Yes Right 0 0
4 Yes Right 0 0
5 No Right 1 0
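After that, you can use scipy.stats.chisquare. Note that chisquare is a one-way goodness-of-fit test that expects observed frequencies, not label codes; to test whether the two columns are associated, scipy.stats.chi2_contingency applied to a table of counts is the usual choice.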
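A minimal sketch of a test of independence on this data, assuming the six sample rows from the question. It uses pd.crosstab and scipy.stats.chi2_contingency in place of chisquare (a swap, not the function named in the answer) and works on the string labels directly, so no numeric encoding is needed. The variable names are illustrative.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Sample rows copied from the question.
df = pd.DataFrame({
    "Voted?": ["Yes", "No", "Not Answered", "Yes", "Yes", "No"],
    "Political Category": ["Right", "Left", "Center", "Right", "Right", "Right"],
})

# Contingency table of observed counts: one row per "Voted?" value,
# one column per "Political Category" value.
observed = pd.crosstab(df["Voted?"], df["Political Category"])

# Chi-square test of independence between the two columns.
chi2, p_value, dof, expected = chi2_contingency(observed)

print(observed)
print(f"chi2={chi2:.3f}, p={p_value:.3f}, dof={dof}")
```

With only six rows, every expected count is below 5, so the chi-square approximation is unreliable here; the sketch only shows the mechanics, and a real dataset would need far more observations per cell.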