Rounding down values in a Pandas DataFrame column with NaNs

Question

I have a Pandas DataFrame that contains a column of float64 values:

import numpy as np
import pandas as pd
tempDF = pd.DataFrame({ 'id': [12,12,12,12,45,45,45,51,51,51,51,51,51,76,76,76,91,91,91,91],
                        'measure': [3.2,4.2,6.8,5.6,3.1,4.8,8.8,3.0,1.9,2.1,2.4,3.5,4.2,5.2,4.3,3.6,5.2,7.1,6.5,7.3]})

I want to create a new column containing only the integer part. My first thought was to use .astype(int):

tempDF['int_measure'] = tempDF['measure'].astype(int)

This works fine, but as an added complication, my column contains a missing value:

tempDF.loc[10, 'measure'] = np.nan

This missing value causes the .astype(int) method to fail with:

ValueError: Cannot convert NA to integer

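I figured I could round the floats in the column down instead. However, the .round(0) function rounds to the nearest integer (up or down) rather than always rounding down. I can't find an equivalent of ".floor()" that operates on a column of a Pandas DataFrame.

Any suggestions?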

Answer 1

You could apply numpy.floor:

import numpy as np

tempDF['int_measure'] = tempDF['measure'].apply(np.floor)

    id  measure  int_measure
0   12      3.2            3
1   12      4.2            4
2   12      6.8            6
...
9   51      2.1            2
10  51      NaN          NaN
11  51      3.5            3
...
19  91      7.3            7
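
Because np.floor is a NumPy ufunc, it can also be called on the whole column directly instead of going through .apply. The following is a minimal sketch on a small stand-in Series (the name `measure` and its values are illustrative, not the data from the question). The result stays float64, since NaN cannot be stored in a plain integer column.

```python
import numpy as np
import pandas as pd

# Illustrative stand-in for the 'measure' column, including one missing value
measure = pd.Series([3.2, 4.2, np.nan, 6.8, -1.5])

# Vectorized floor: NaN is passed through unchanged
floored = np.floor(measure)
print(floored)

# For negative values, floor rounds toward negative infinity,
# while np.trunc drops the fractional part (like .astype(int) would)
truncated = np.trunc(measure)
print(truncated)
```

If the column can contain negative numbers and "integer part" means dropping the decimals, np.trunc may be closer to what .astype(int) does; for non-negative data the two agree.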

Answer 2

You could also try:

df.apply(lambda s: s // 1)

However, using np.floor is faster.
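
To check both claims on your own machine, a rough sketch like the one below compares the two approaches. The Series `s` and the repetition counts are arbitrary choices for illustration, and the timings will vary with hardware and pandas version, so no particular numbers are implied.

```python
import timeit

import numpy as np
import pandas as pd

# Illustrative data: random floats with a missing value inserted
rng = np.random.default_rng(0)
s = pd.Series(rng.uniform(0, 10, size=10_000))
s.iloc[10] = np.nan

floor_div = s // 1        # floor division, as in this answer
floor_np = np.floor(s)    # NumPy floor, as in Answer 1

# Both give the same values, with NaN left in place
print(floor_div.equals(floor_np))

# Rough timing comparison
t_div = timeit.timeit(lambda: s // 1, number=20)
t_np = timeit.timeit(lambda: np.floor(s), number=20)
print(f"s // 1: {t_div:.4f}s, np.floor(s): {t_np:.4f}s")
```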

Answer 3

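The answers here are fairly dated. As of pandas 0.25.2 (perhaps earlier), they can trigger the warning

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

Following that suggestion, the assignment would be

df.iloc[:,0] = df.iloc[:,0].astype(int)

for one particular column.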
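
Note that a plain .astype(int) still cannot represent NaN, so the cast above fails on a column with missing values. One possible workaround, sketched here on the assumption that your pandas version provides the nullable 'Int64' extension dtype (capital I), is to floor first and then cast. The small frame below is illustrative, not the data from the question.

```python
import numpy as np
import pandas as pd

# Illustrative frame with one missing measurement
df = pd.DataFrame({'id': [12, 45, 51],
                   'measure': [3.2, np.nan, 6.8]})

# Floor first so every non-missing value is a whole number,
# then cast to the nullable integer dtype; NaN becomes <NA>
df['int_measure'] = np.floor(df['measure']).astype('Int64')
print(df)
```

Flooring before the cast matters: casting non-integer floats such as 3.2 straight to 'Int64' is rejected as an unsafe conversion.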
