Python

Question

I'm very new to Python. Here is a sample of my data:

Category    May  June  July
Product1    32   41    43
Product2    74   65    65
Product3    17   15    18
Product4    14   13    14

I have many groups of data, and I want to compute the chi-square statistic for each group. My code is below:

import scipy.stats

Product1 = [32,41,43]
chi2, p = scipy.stats.chisquare(Product1)
print('Product1')
if p > 0.05:
    print('Same')
else:
    print('Different')

Product2 = [74,65,65]
chi2, p = scipy.stats.chisquare(Product2)
print('Product2')
if p > 0.05:
    print('Same')
else:
    print('Different')

Product3 = [17,15,18]
chi2, p = scipy.stats.chisquare(Product3)
print('Product3')
if p > 0.05:
    print('Same')
else:
    print('Different')

Product4 = [14,13,14]
chi2, p = scipy.stats.chisquare(Product4)
print('Product4')
if p > 0.05:
    print('Same')
else:
    print('Different')

I loaded the data table with "df = pd.read_excel". It comes with its own index, and I don't know how to access each row for the calculation.

How can I shorten this repetitive code by using a loop and pulling the data from the table? Thank you very much for your help.

Answer 1

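You could repeat the steps above in a loop, but you can also take advantage of the fact that scipy works on pandas DataFrames! Passing axis=1 applies the chisquare test to every row of the DataFrame. For example: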

import numpy as np
from scipy.stats import chisquare

df['p'] = chisquare(df[['May', 'June', 'July']], axis=1)[1]

df['same_diff'] = np.where(df['p'] > 0.05, 'same', 'different')

>>> df
   Category  May  June  July         p same_diff
0  Product1   32    41    43  0.411506      same
1  Product2   74    65    65  0.672294      same
2  Product3   17    15    18  0.869358      same
3  Product4   14    13    14  0.975905      same

Your DataFrame now has the p-values as one column and whether each row is "same" or "different" as another.
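For readers without the Excel file, the same row-wise approach can be tried on a DataFrame typed in by hand from the sample data in the question. This is only a sketch: the column names are assumed to match the table in the question, the extra chi2 column is an addition for illustration, and the 0.05 threshold is the one the question uses.

```python
import numpy as np
import pandas as pd
from scipy.stats import chisquare

# Sample data from the question, entered by hand instead of read from Excel.
df = pd.DataFrame({
    "Category": ["Product1", "Product2", "Product3", "Product4"],
    "May": [32, 74, 17, 14],
    "June": [41, 65, 15, 13],
    "July": [43, 65, 18, 14],
})

months = ["May", "June", "July"]

# axis=1 runs one goodness-of-fit test per row. With no expected frequencies
# given, each row is compared against equal counts across the three months.
chi2, p = chisquare(df[months].to_numpy(), axis=1)

df["chi2"] = chi2
df["p"] = p
df["same_diff"] = np.where(df["p"] > 0.05, "same", "different")

print(df)
```

Selecting the month columns by name keeps the Category strings out of the test, so the call still works if further non-numeric columns are added later.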

Answer 2

Starting from the point where the data has been loaded into a pandas DataFrame, you can do this:

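import scipy.stats

for _, row in df.iterrows():
    product = row.iloc[0]  # first column holds the product name
    # remaining columns are the monthly counts; cast from object dtype to float
    chi, p = scipy.stats.chisquare(row.iloc[1:].astype(float))
    print(product, ":", "same" if p > 0.05 else "different")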

This prints:

Product1 : same
Product2 : same
Product3 : same
Product4 : same
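If the labels are needed later rather than just printed, a loop like this can collect them in a dictionary instead. The sketch below is one possible variant, not part of the original answer: label_rows and its alpha parameter are hypothetical names, and the DataFrame is typed in by hand from the question's sample data.

```python
import pandas as pd
from scipy.stats import chisquare


def label_rows(df, alpha=0.05):
    """Hypothetical helper: map each Category to 'same' or 'different'."""
    labels = {}
    # With Category as the index, every remaining column is a numeric count,
    # so each row can be passed to chisquare directly.
    for name, counts in df.set_index("Category").iterrows():
        _, p = chisquare(counts.to_numpy())
        labels[name] = "same" if p > alpha else "different"
    return labels


# Sample data from the question, entered by hand instead of read from Excel.
df = pd.DataFrame({
    "Category": ["Product1", "Product2", "Product3", "Product4"],
    "May": [32, 74, 17, 14],
    "June": [41, 65, 15, 13],
    "July": [43, 65, 18, 14],
})

labels = label_rows(df)
print(labels)
```

Moving Category into the index first means the loop variable already carries the product name, so no positional lookup into the row is needed.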

Python - How can I make this repetitive code shorter by using loop?

scipy

chi-squared