NumPy repeat converts `nan` to `str`

Question

This NumPy behavior seems a bit strange.

>>> type(np.array([1, np.nan]).repeat(2)[2])
<class 'numpy.float64'>

But when I make the first element a string:

>>> type(np.array(["a", np.nan]).repeat(2)[2])
<class 'numpy.str_'>

How can I fix this?

Answer 1

From the numpy.array documentation:

dtype : data-type, optional

The desired data-type for the array. If not given, then the type will be determined as the minimum type required to hold the objects in the sequence. This argument can only be used to ‘upcast’ the array. For downcasting, use the .astype(t) method.

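In your first example, both 1 and numpy.nan can be converted to numpy.float64. In the second, the only type that can hold both elements is str, so str(numpy.nan) = 'nan' is what ends up in your array.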
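The inference described above can be checked directly. This is a minimal sketch; the variable names are illustrative, and only the dtype kind is inspected because the exact string width (for example `<U3` versus `<U32`) may differ between NumPy versions.

```python
import numpy as np

# All elements are numeric, so NumPy settles on a floating-point dtype.
numeric = np.array([1, np.nan])

# One element is a string, so every element is stored as a string.
mixed = np.array(["a", np.nan])

print(numeric.dtype.kind)   # 'f' means floating point
print(mixed.dtype.kind)     # 'U' means unicode string
print(mixed[1] == "nan")    # the NaN is now the three-character text 'nan'
```

Note that the conversion happens when the array is constructed, before `repeat` is ever called.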

Answer 2

Maybe looking at the arrays this way will make the difference clearer:

In the first case, np.nan is a float, so all elements are floats:

In [310]: np.array([1, np.nan]).repeat(2)                                            
Out[310]: array([ 1.,  1., nan, nan])
In [311]: _.dtype                                                                    
Out[311]: dtype('float64')

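In the second case, there is a string that cannot be converted to a float, so the dtype of the whole array is string. That includes np.nan, which is now 'nan':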

In [312]: np.array(["a", np.nan]).repeat(2)                                          
Out[312]: array(['a', 'a', 'nan', 'nan'], dtype='<U3')
In [313]: _.dtype                                                                    
Out[313]: dtype('<U3')

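repeat has nothing to do with this. It is simply how np.array constructs an array from a list: it picks a common dtype that can hold every element. Forcing a float dtype fails because 'a' cannot be converted: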

In [321]: np.array(["a", np.nan],dtype=float)                                        
--------------------------------------------------------------------------- 
ValueError: could not convert string to float: 'a'
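One possible workaround, which neither answer spells out, is to ask for `dtype=object` so that NumPy stores the original Python objects instead of converting them to a common type. This is a sketch under that assumption; the names `arr` and `repeated` are illustrative.

```python
import numpy as np

# dtype=object keeps references to the original Python objects,
# so "a" stays a str and np.nan stays a float.
arr = np.array(["a", np.nan], dtype=object)
repeated = arr.repeat(2)

print(repeated)
print(type(repeated[0]))       # the string element is still a str
print(type(repeated[2]))       # the NaN element is still a float
print(np.isnan(repeated[2]))   # and it still behaves as NaN
```

The trade-off is that object arrays give up most of NumPy's vectorized numeric operations, so this fits best when the array really is meant to hold mixed values.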

python

numpy

pandas

numpy-ndarray