Python, seaborn: statistical analysis using statannot doesn't look right

Question

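I used statannot to run statistical tests on some basic data, but the results don't look correct. For example, some of my comparisons give "P_val=0.000e+00 U_stat=0.000e+00", which I believe is impossible. Is something wrong with my dataframe and/or my code?

Here is the dataframe I'm using:

Here is my code: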

import pandas as pd
from matplotlib import pyplot as plt
import seaborn as sns
from statannot import add_stat_annotation
import scipy.stats as sp

data = pd.read_excel('Z:/DMF/GROUPS/gr_Veening/Users/Vik/scRNA-seq/FACSAria/Adherence-invasion assays/adherence_invasion_assay_a549-RFP 4-6-21.xlsx',sheet_name="Sheet2", header = 0)

sns.set_theme(style="darkgrid")
ax1 = sns.boxplot(x="Strain", y="adherence_counts", data=data)
x = "Strain"
y = "adherence_counts"
order = ["D39", "D39 Δcps", "19F", "19F ΔcomCDE"]
ax1 = sns.boxplot(data=data, x=x, y=y, order=order)
plt.title("Adherence Assay")
plt.ylabel('CFU/ml')
plt.xlabel('')
ax1.set(xticklabels=["D39", "D39 Δ$\it{cps}$", "19F", "19F Δ$\it{comCDE}$"])
add_stat_annotation(ax1, data=data, x=x, y=y, order=order,
                    box_pairs=[("D39", "19F"), ("D39", "D39 Δcps"), ("D39 Δcps", "19F"), ("19F", "19F ΔcomCDE")],
                    test='Mann-Whitney', text_format='star', loc='inside', verbose=2)

Finally, here is the output of the statistical test:

D39 v.s. D39 Δcps: Mann-Whitney-Wilcoxon test two-sided with Bonferroni correction, P_val=0.000e+00 U_stat=0.000e+00
D39 Δcps v.s. 19F: Mann-Whitney-Wilcoxon test two-sided with Bonferroni correction, P_val=1.000e+00 U_stat=2.000e+00
19F v.s. 19F ΔcomCDE: Mann-Whitney-Wilcoxon test two-sided with Bonferroni correction, P_val=7.617e-01 U_stat=8.000e+00
D39 v.s. 19F: Mann-Whitney-Wilcoxon test two-sided with Bonferroni correction, P_val=0.000e+00 U_stat=0.000e+00
C:\Users\Vik\anaconda3\lib\site-packages\scipy\stats\stats.py:7171: RuntimeWarning: divide by zero encountered in double_scalars
  z = (bigu - meanrank) / sd

Any help would be greatly appreciated, thanks!

Answer 1

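Your problem has two parts:

Statistically, in some of your comparisons (e.g. "D39" vs. "19F"), every value in one group is larger (or smaller) than every value in the other, hence the U statistic of 0 and the extreme p-value. Results like these are entirely possible. They follow from the test looking only at the ranks of the values provided (that is what this test does), which has both advantages and limitations. Note also that the Mann-Whitney test is not well suited to such small sample sizes, especially since scipy's implementation assumes equal variances.
Now, the failure on the line z = (bigu - meanrank) / sd means that np.sqrt(T * n1 * n2 * (n1+n2+1) / 12.0) = 0, so in this case n1 and/or n2 is 0 (they are len(x) and len(y); see the source in scipy). So:
- There is a bug in statannot when order and box_pairs both refer to a label that does not exist in the dataframe. I will fix this in statannotations, so thank you for that.
- However, I cannot reproduce your warning with my copy of your dataframe. If this were the only error, you should see a box missing from your plot at the position you showed us.
  If not, is it possible that you updated some code but did not copy the latest output here? Otherwise, there may be more to find, so please let us know.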
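
Both points can be checked outside statannot. The following is a minimal sketch, not the code statannot runs: the values in `high` and `low` are made up for illustration, and `u_sd` is a hypothetical helper that only re-implements the standard deviation formula quoted above (with T = 1, i.e. no ties).

```python
import numpy as np
from scipy.stats import mannwhitneyu

# Hypothetical, completely separated groups:
# every value in `high` is larger than every value in `low`.
high = [120, 135, 150]
low = [10, 20, 30]
n1, n2 = len(high), len(low)

res = mannwhitneyu(high, low, alternative="two-sided")

# Depending on the SciPy version, the reported statistic is either the
# smaller of the two U values or the U value of the first sample, so take
# the smaller one explicitly to compare across versions.
u_min = min(res.statistic, n1 * n2 - res.statistic)
p_value = res.pvalue
print(f"smaller U = {u_min}, p = {p_value:.3g}")


def u_sd(n1, n2, T=1.0):
    """Standard deviation of U used by the normal approximation (hypothetical helper)."""
    return np.sqrt(T * n1 * n2 * (n1 + n2 + 1) / 12.0)


print(u_sd(3, 3))  # positive when both groups contain values
print(u_sd(0, 3))  # 0.0 when one group is empty, so dividing by it fails
```

With complete separation the smaller U value is 0, which is a legitimate result. The standard deviation only becomes 0 when one of the two groups is empty, which is what triggers the divide-by-zero warning.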

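Edit: as discovered in the discussion, a second problem can occur in statannot if there is a mismatch between the labels in order, box_pairs, and the dataset. This has been patched in statannotations, a fork of statannot.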
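
Before plotting, such a mismatch can be caught with a quick check of the labels. This is only a sketch under assumptions: `find_missing_labels` is a hypothetical helper (it is not part of statannot or statannotations), and the small dataframe, including the trailing space in one label, is invented to show one way a mismatch could arise.

```python
import pandas as pd


def find_missing_labels(data, x, order, box_pairs):
    """Return labels used in `order` or `box_pairs` that never occur in data[x]."""
    present = set(data[x].dropna().unique())
    requested = set(order) | {label for pair in box_pairs for label in pair}
    return sorted(requested - present)


# Hypothetical data: one label carries a trailing space, so it does not
# match the label requested in `order` and `box_pairs`.
data = pd.DataFrame({
    "Strain": ["D39", "D39", "19F", "19F", "D39 Δcps ", "D39 Δcps "],
    "adherence_counts": [100, 120, 10, 12, 15, 18],
})
order = ["D39", "D39 Δcps", "19F"]
box_pairs = [("D39", "19F"), ("D39", "D39 Δcps")]

missing = find_missing_labels(data, "Strain", order, box_pairs)
print(missing)

# Group sizes are also worth a look, since an empty group gives n = 0.
print(data.groupby("Strain")["adherence_counts"].count())
```

Any label reported as missing corresponds to an empty group in the test, so it is worth comparing the exact strings (whitespace, special characters such as Δ) in the spreadsheet against those in the code.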
