ValueError in scipy ttest_ind

I have the following CSV file:

SRA ID  ERR169499            ERR169498           ERR169497
Label   1                    0                   1
TaxID   PRJEB3251_ERR169499  PRJEB3251_ERR169499 PRJEB3251_ERR169499
333046  0.05                 0.99                99.61
1049    0.03                 2.34                34.33
337090  0.01                 9.78                23.22
99007   22.33                2.90                0.00

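I have 92 columns for the cases (label 0) and 95 columns for the controls (label 1). I need to perform a two-sample independent t-test and a rank-sum test. So far I have: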

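import pandas as pd
from scipy.stats import ttest_ind

# header=[1, 2]: the Label and TaxID rows form a two-level column index
df = pd.read_csv('final_out_transposed.csv', header=[1,2], index_col=[0])
case = df.xs('0', axis=1, level=0).dropna()
ctrl = df.xs('1', axis=1, level=0).dropna()
(tt_val, p_ttest) = ttest_ind(case, ctrl, equal_var=False)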

For this I get the error: ValueError: operands could not be broadcast together with shapes (92,) (95,)

The traceback is:

File "<ipython-input-152-d58634e75106>", line 1, in <module>
runfile('C:/IBD Bioproject/New folder/temp_3251.py', wdir='C:/IBD Bioproject/New folder')

File "C:\Users\ksingh1\AppData\Local\Continuum\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 880, in runfile
execfile(filename, namespace)

File "C:\Users\ksingh1\AppData\Local\Continuum\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 102, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)

File "C:/IBD Bioproject/New folder/temp_3251.py", line 106, in <module>
tt_val, p_ttest = ttest_ind(case, ctrl, equal_var=False)

File "C:\Users\ksingh1\AppData\Local\Continuum\Anaconda3\lib\site-packages\scipy\stats\stats.py", line 4068, in ttest_ind
df, denom = _unequal_var_ttest_denom(v1, n1, v2, n2)

File "C:\Users\ksingh1\AppData\Local\Continuum\Anaconda3\lib\site-packages\scipy\stats\stats.py", line 3872, in _unequal_var_ttest_denom
df = (vn1 + vn2)**2 / (vn1**2 / (n1 - 1) + vn2**2 / (n2 - 1))

ValueError: operands could not be broadcast together with shapes (92,) (95,)

I have read a few posts, but it is still unclear. I also read up on NumPy broadcasting.

Thanks in advance.

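Apparently, the objects created by the xs method of a Pandas DataFrame look like two-dimensional arrays. With 2-D inputs and the default axis=0, ttest_ind computes one statistic per column, so the 92 case columns and the 95 control columns cannot be broadcast against each other. The objects must be flattened to look like one-dimensional arrays when they are passed to ttest_ind.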
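To see where the two shapes in the error message come from, here is a minimal sketch on synthetic data. The frames below are random stand-ins, not the real file; the only assumption carried over from the question is the layout (taxa as rows, 92 case columns, 95 control columns). The exact wording of the exception may differ between SciPy versions.

```python
import numpy as np
import pandas as pd
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)

# Synthetic stand-ins for the question's data: 4 taxa (rows),
# 92 case samples and 95 control samples (columns).
case = pd.DataFrame(rng.random((4, 92)))
ctrl = pd.DataFrame(rng.random((4, 95)))

print(case.shape, ctrl.shape)  # (4, 92) (4, 95)

# With the default axis=0 the test runs down the rows, so each input
# yields one variance per column: 92 values versus 95 values.
try:
    ttest_ind(case, ctrl, equal_var=False)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```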

Try this:

ttest_ind(case.values.ravel(), ctrl.values.ravel(), equal_var=False)

The values attribute of a Pandas object gives a NumPy array, and the ravel() method flattens that array to one dimension.
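As a rough sketch of the flattening approach, the example below again uses random stand-in frames rather than the real file. It also runs scipy.stats.ranksums for the rank-sum test mentioned in the question. Note that flattening pools the values of all taxa into one sample per group. If the goal is instead one test per taxon (an assumption about the intent, not something the question states), passing axis=1 is a possible alternative, shown at the end.

```python
import numpy as np
import pandas as pd
from scipy.stats import ttest_ind, ranksums

rng = np.random.default_rng(0)

# Synthetic stand-ins (assumed shapes): 4 taxa x 92 cases, 4 taxa x 95 controls.
case = pd.DataFrame(rng.random((4, 92)))
ctrl = pd.DataFrame(rng.random((4, 95)))

# Flatten each group to 1-D. This pools all taxa together.
case_flat = case.values.ravel()
ctrl_flat = ctrl.values.ravel()
print(case_flat.shape, ctrl_flat.shape)  # (368,) (380,)

# Welch's t-test (unequal variances) on the pooled values.
tt_val, p_ttest = ttest_ind(case_flat, ctrl_flat, equal_var=False)

# Wilcoxon rank-sum test on the same pooled values.
rs_val, p_ranksum = ranksums(case_flat, ctrl_flat)

# Alternative (assumes a per-taxon comparison is wanted): one Welch test
# per row, comparing the samples along the columns with axis=1.
t_per_taxon, p_per_taxon = ttest_ind(case.values, ctrl.values, axis=1, equal_var=False)
print(t_per_taxon.shape)  # (4,)
```

The group sizes may differ along the axis being tested, which is why 92 versus 95 columns is not a problem once the inputs are 1-D or the test runs along axis=1.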