Integer format specification 'd' produces ValueError when applied row by row to numpy.int column in pandas DataFrame

Question

Suppose I create a pandas DataFrame containing both an int column and a float column:

>>> import pandas as pd
>>> df = pd.DataFrame([[1, 1.3], [2, 2.4]], columns=['a', 'b'])
>>> df
   a    b
0  1  1.3
1  2  2.4

Column 'a' clearly consists of numpy.int64 values:

>>> df.a.dtype
dtype('int64')
>>> df.a[0]
1
>>> type(df.a[0])
<class 'numpy.int64'>

...and I can format these column 'a' values with the d format specifier:

>>> "{a:d}".format(a=df.a[0])
'1'

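However, if I try to apply the same formatting row by row, I get this error, which claims that the values in column 'a' are floats rather than integers: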

>>> df.apply(lambda s: "{a:d}{b:f}".format(**s), axis=1)
Traceback (most recent call last):
...
ValueError: ("Unknown format code 'd' for object of type 'float'", 'occurred at index 0')

What is happening here?

Answer 1

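With axis=1, apply passes each row to the function as a Series, and a Series has a single dtype. When a row contains both int and float values, the int values are therefore upcast to float64: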

df.apply(lambda x: ( type(x['a']),type(x['b']) ),axis=1)
0    (<class 'numpy.float64'>, <class 'numpy.float6...
1    (<class 'numpy.float64'>, <class 'numpy.float6...
dtype: object
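
The upcast can also be seen without apply at all. The following is a minimal sketch, assuming the same df as in the question; the variable names are only for illustration. It compares the per-column dtypes with the dtype of a single row pulled out as a Series:

```python
import pandas as pd

df = pd.DataFrame([[1, 1.3], [2, 2.4]], columns=['a', 'b'])

# Each column keeps its own dtype inside the DataFrame.
column_dtypes = df.dtypes.astype(str).to_dict()

# A single row comes back as one Series with one dtype for all values,
# so the integer from column 'a' is upcast to float64 next to the float.
row = df.iloc[0]
row_dtype = str(row.dtype)
row_a_type = type(row['a']).__name__

print(column_dtypes)
print(row_dtype, row_a_type)
```

This is the same kind of row Series that the lambda receives under axis=1, which is why the 'd' format code sees a float.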

To avoid this, you can use DataFrame.astype to convert the DataFrame to object dtype, so that each value in a row keeps its own type:

df.astype(object).apply(lambda s: "{a:d}{b:f}".format(**s), axis=1)
0    11.300000
1    22.400000
dtype: object

df.astype(object).apply(lambda x: ( type(x['a']),type(x['b']) ),axis=1)
0    (<class 'int'>, <class 'float'>)
1    (<class 'int'>, <class 'float'>)
dtype: object
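
If the goal is only to build the formatted strings, another option is to skip the row-wise Series altogether. This is a sketch rather than the approach from the answer above: it assumes the same df and walks the two columns in parallel with zip, so each value is formatted with the type its own column gives it. The name formatted is just illustrative.

```python
import pandas as pd

df = pd.DataFrame([[1, 1.3], [2, 2.4]], columns=['a', 'b'])

# Iterating column by column never builds a mixed-type row Series,
# so values from 'a' stay integers and the 'd' format code is accepted.
formatted = pd.Series(
    ["{a:d}{b:f}".format(a=a, b=b) for a, b in zip(df['a'], df['b'])],
    index=df.index,
)

print(formatted)
```

The result is indexed like df, so it can be assigned back as a new column if needed.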

Answer 2

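Let's fix it by formatting column 'a' with '.0f' instead of 'd', which works on the float that the row actually holds: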

df.apply(lambda s: "{a:.0f}{b:f}".format(**s), axis=1)
0    11.300000
1    22.400000
dtype: object
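
One caveat with the '.0f' route is that the value has already passed through float64 by the time it is formatted. For small integers like the ones in the question that makes no difference, but very large integers may not survive the round trip. A small plain-Python sketch of the effect (the variable names are illustrative):

```python
# 2**53 + 1 is the smallest positive integer a 64-bit float cannot hold exactly.
big = 2**53 + 1

as_int = "{:d}".format(big)             # formatted as an integer, exact
as_float = "{:.0f}".format(float(big))  # formatted after conversion to float

small = "{:.0f}".format(1.0)            # small values round-trip fine

print(as_int)
print(as_float)
print(small)
```

Here as_int ends in ...993 while as_float ends in ...992, so for columns that may hold integers of that size, keeping the integer type is the safer choice.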


python

formatting

number-formatting

pandas