Why is pd.to_numeric `errors=''` equivalent to `errors='coerce'`?

Question

I'm on Python 3.7 and pandas 0.24.2.

Setup:

import pandas as pd
s = pd.Series(['10', '12', '15', '20', 'A', '31', 'C', 'D'])

In [36]: s
Out[36]:
0    10
1    12
2    15
3    20
4     A
5    31
6     C
7     D
dtype: object

to_numeric with errors='coerce'

pd.to_numeric(s, errors='coerce')

Out[37]:
0    10.0
1    12.0
2    15.0
3    20.0
4     NaN
5    31.0
6     NaN
7     NaN
dtype: float64

to_numeric with errors='' (empty string)

pd.to_numeric(s, errors='')

Out[38]:
0    10.0
1    12.0
2    15.0
3    20.0
4     NaN
5    31.0
6     NaN
7     NaN
dtype: float64

to_numeric with errors='ljljalklag', i.e. a random string

pd.to_numeric(s, errors='ljljalklag')

Out[39]:
0    10.0
1    12.0
2    15.0
3    20.0
4     NaN
5    31.0
6     NaN
7     NaN
dtype: float64

In other words, passing any string other than 'raise' or 'ignore' as the errors parameter of pd.to_numeric is equivalent to errors='coerce'.

Is this a feature or a bug?

Answer 1

AFAIK, this is the intended behavior, judging by the source code:

# pandas/core/tools/numeric.py
... 
coerce_numeric = errors not in ("ignore", "raise") # line 147
...

So it only checks whether errors is 'raise' or 'ignore'; any other value falls through to the coerce behavior.
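To make that fall-through concrete, here is a toy model of the check in plain Python. It is only a sketch: `legacy_to_numeric` is a hypothetical name, and the body is a simplification that reuses the quoted condition, not the actual pandas implementation.

```python
import math


def legacy_to_numeric(values, errors="raise"):
    """Toy model of the pre-0.25.0 dispatch (hypothetical, not pandas code)."""
    # Same test as the quoted source line: anything that is not
    # "ignore" or "raise" is treated as a request to coerce.
    coerce_numeric = errors not in ("ignore", "raise")
    result = []
    for value in values:
        try:
            result.append(float(value))
        except (TypeError, ValueError):
            if coerce_numeric:
                # Unparseable entries become NaN, as with errors='coerce'.
                result.append(math.nan)
            elif errors == "raise":
                raise
            else:
                # errors == "ignore": hand the input back unchanged.
                return list(values)
    return result
```

With this model, `legacy_to_numeric(['10', 'A'], errors='')` and `legacy_to_numeric(['10', 'A'], errors='ljljalklag')` both take the coerce branch, because neither string is in the `("ignore", "raise")` tuple.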

Answer 2

This was fixed in version 0.25.0, which now validates the errors keyword (see #26394).

New behavior in 0.25.0:

In [1]: import pandas as pd; pd.__version__
Out[1]: '0.25.0'

In [2]: pd.to_numeric([1, 'a', 2.2], errors='foo')
---------------------------------------------------------------------------
ValueError: invalid error value specified

Previous behavior in 0.24.2:

In [1]: import pandas as pd; pd.__version__
Out[1]: '0.24.2'

In [2]: pd.to_numeric([1, 'a', 2.2], errors='foo')
Out[2]: array([1. , nan, 2.2])
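If you are stuck on a version before 0.25.0, one possible workaround is to validate the keyword yourself before calling pd.to_numeric. The following is a minimal sketch; `validate_errors` and `VALID_ERRORS` are hypothetical names introduced here, and the message text is only modeled on the one shown above.

```python
# The three option strings discussed in this thread.
VALID_ERRORS = ("raise", "coerce", "ignore")


def validate_errors(errors):
    """Return errors unchanged if it is a recognised option, else raise ValueError."""
    if errors not in VALID_ERRORS:
        raise ValueError("invalid error value specified: {!r}".format(errors))
    return errors
```

It could then be used as `pd.to_numeric(s, errors=validate_errors(choice))`, so that a typo such as `errors=''` fails loudly instead of silently coercing.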

python

pandas

python-3.6