Python: np.nanpercentile, which datatype does my dataframe need to have?

Question

I have a pandas DataFrame whose columns all have dtype object.

df.dtypes

Out:
data        object
stimulus    object
trial       object
dtype: object

df.head()

Out:
    data    stimulus    trial
0   2      -2           1
1   2      -2           2
2   2      -2           3
3   2      -2           4
4   2      -2           5

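I want to compute specific percentiles of the data set. When I use the code below, I get NaN in the output, probably because the data set itself contains NaN values, which I assume are treated like infinity and therefore break the calculation of the higher percentiles.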

df.groupby('stimulus').data.apply(lambda x: np.percentile(x, q=66))

Out:
stimulus
-2.00     2.0
-1.75     2.9
-1.00     1.0
-0.75     1.0
-0.50     0.0
 0.50     7.8
 1.00     9.9
 1.25    11.9
 1.75    13.9
 2.50     NaN

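I have already found out that I need to use np.nanpercentile() instead, but when I do, I get the error below. I read elsewhere that np.nanpercentile() checks the data type of the input array and complains if it is not suitable. Do you know how I need to change my data, and into which format?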

df.groupby('stimulus').data.apply(lambda x: np.nanpercentile(x, q=66))

Out:
TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
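
For context, the message in this traceback comes from np.isnan, which has no implementation for object arrays. Below is a minimal sketch that triggers the same error with a made-up array (not the data above). It calls np.isnan directly, because whether np.nanpercentile itself reaches np.isnan for object input may depend on the NumPy version.

```python
import numpy as np

# Hypothetical values stored as generic Python objects (dtype=object),
# similar to what an object-typed DataFrame column holds.
values = np.array([2.0, 2.9, np.nan, 13.9], dtype=object)

try:
    np.isnan(values)
    raised = False
except TypeError as exc:
    # np.isnan is only defined for numeric dtypes, so object input is rejected.
    raised = True
    print(type(exc).__name__, exc)

# The same values as a float array are accepted without complaint.
mask = np.isnan(values.astype(float))
```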

Answer 1

This is what finally did the job for me:

df = df.astype(float)
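
A minimal end-to-end sketch of this conversion followed by the grouped percentile from the question. The sample values are invented for illustration and are not the asker's data. It assumes every cell can be converted to a float.

```python
import numpy as np
import pandas as pd

# Hypothetical sample frame with object columns and one missing value.
df = pd.DataFrame(
    {
        "data": [2, 3, 1, np.nan, 12, 14],
        "stimulus": [-2, -2, -2, 2.5, 2.5, 2.5],
        "trial": [1, 2, 3, 1, 2, 3],
    },
    dtype=object,
)

# Convert every column from object to float64 so NumPy can test for NaN.
df = df.astype(float)

# np.nanpercentile ignores the NaN entries within each stimulus group.
result = df.groupby("stimulus")["data"].apply(lambda x: np.nanpercentile(x, q=66))
print(result)
```

If some cells held strings that are not numbers, astype(float) would raise an error. In that case pd.to_numeric(df['data'], errors='coerce') is one alternative: it turns unparseable entries into NaN.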

