Find pandas DataFrame columns which are considered floats but can actually be written as integers

Question

I have a DataFrame with several columns.

import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1.0, 3.0, 4.0, 5.0, 2.0],
                   'B': [2, 4, 5, 8, 9],
                   'C': [1.8, 4.1, 4.0, 5.6, 2.0],
                   'D': [99.0, 1000.0, np.nan, 101.0, 99.0]})
df

     A  B    C       D
0  1.0  2  1.8    99.0
1  3.0  4  4.1  1000.0
2  4.0  5  4.0     NaN
3  5.0  8  5.6   101.0
4  2.0  9  2.0    99.0

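Checking df.dtypes shows that columns A and D are stored as float64, even though every non-missing value in them is a whole number: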

df.dtypes

A    float64
B      int64
C    float64
D    float64
dtype: object

I want to find all columns in my df that could be represented as integers but are stored as floats.

Expected result:

['A', 'D']

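That is, the list should contain every column that is stored as float but whose values could actually be represented as integers (ignoring missing values).

How can I find these columns?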

Answer 1

Use Index.intersection of the float columns and the Int64 columns produced by DataFrame.convert_dtypes:

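# float columns (A, C, D); np.float no longer exists in recent NumPy, so use the 'float' string
c1 = df.select_dtypes('float').columns
# columns that convert_dtypes turns into nullable Int64 (A, B, D)
c2 = df.convert_dtypes().select_dtypes('Int64').columns

out = c1.intersection(c2, sort=False).tolist()
print(out)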
['A', 'D']
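convert_dtypes decides internally whether a float column is integral. If you prefer to make that test explicit, a minimal sketch is to check that every non-missing value has no fractional part. The helper name integral_float_columns is hypothetical, and this check may disagree with convert_dtypes on edge cases.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1.0, 3.0, 4.0, 5.0, 2.0],
                   'B': [2, 4, 5, 8, 9],
                   'C': [1.8, 4.1, 4.0, 5.6, 2.0],
                   'D': [99.0, 1000.0, np.nan, 101.0, 99.0]})


def integral_float_columns(frame):
    """Hypothetical helper: float columns whose non-missing values are all whole numbers."""
    result = []
    for name, col in frame.select_dtypes('float').items():
        values = col.dropna()  # ignore NaN, as convert_dtypes does
        # x % 1 == 0 means x has no fractional part.
        # Note: an all-NaN column is reported too, because .all() on an empty Series is True.
        if (values % 1 == 0).all():
            result.append(name)
    return result


print(integral_float_columns(df))  # ['A', 'D']
```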

Answer 2

One way is to compare the dtypes before and after df.convert_dtypes():

pd.concat([df.dtypes,df.convert_dtypes().dtypes]).astype(str).str.title().reset_index().drop_duplicates(keep=False)['index'].unique()

Output:

array(['A', 'D'], dtype=object)
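The one-liner relies on title-casing: str.title() maps float64 to Float64 and int64 to Int64, so a column whose dtype family does not change appears twice in the concatenated result and is removed by drop_duplicates(keep=False). A sketch of the same comparison without concat and drop_duplicates, comparing the two dtype Series column by column:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1.0, 3.0, 4.0, 5.0, 2.0],
                   'B': [2, 4, 5, 8, 9],
                   'C': [1.8, 4.1, 4.0, 5.6, 2.0],
                   'D': [99.0, 1000.0, np.nan, 101.0, 99.0]})

# Original dtypes, title-cased so they match the extension dtype names.
old = df.dtypes.astype(str).str.title()   # Float64, Int64, Float64, Float64
# Dtypes after conversion to pandas extension types.
new = df.convert_dtypes().dtypes.astype(str)  # Int64, Int64, Float64, Int64

# Columns whose dtype family changed (float -> Int here).
changed = old.index[old.ne(new)].tolist()
print(changed)  # ['A', 'D']
```

Like the one-liner, this reports any dtype change, not only float to Int64, so on frames with other column types (for example object columns that become string) you may want to restrict it to float columns first.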

Answer 3

We can use DataFrame.convert_dtypes and check which columns were converted to Int64 but were float64 in the original DataFrame:

dtypes_old = df.dtypes
dtypes_new = df.convert_dtypes().dtypes

s = dtypes_old.eq("float64") & dtypes_new.eq("Int64")
s[s].index

Index(['A', 'D'], dtype='object')
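Comparing against the exact string "float64" skips other float widths such as float32. A sketch of the same idea using pandas' dtype predicates, which match any float or integer dtype (the function name float_cols_convertible_to_int is hypothetical):

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_float_dtype, is_integer_dtype

df = pd.DataFrame({'A': [1.0, 3.0, 4.0, 5.0, 2.0],
                   'B': [2, 4, 5, 8, 9],
                   'C': [1.8, 4.1, 4.0, 5.6, 2.0],
                   'D': [99.0, 1000.0, np.nan, 101.0, 99.0]})


def float_cols_convertible_to_int(frame):
    """Hypothetical helper: columns that are float now but become integer after convert_dtypes."""
    converted = frame.convert_dtypes()
    return [c for c in frame.columns
            # is_float_dtype matches float32 as well as float64;
            # is_integer_dtype matches nullable Int64 as well as NumPy ints
            if is_float_dtype(frame[c].dtype) and is_integer_dtype(converted[c].dtype)]


print(float_cols_convertible_to_int(df))  # ['A', 'D']
```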

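Note that Int64 (capital I) is pandas' nullable integer dtype, an extension dtype that can hold missing values as <NA>; this is why column D, which contains NaN, can still be converted to it.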
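To see the difference concretely, here is a small sketch using the values of column D: convert_dtypes turns them into Int64 with the NaN kept as a missing value, while casting to plain NumPy int64 fails because that dtype has no way to represent a missing value.

```python
import numpy as np
import pandas as pd

s = pd.Series([99.0, 1000.0, np.nan, 101.0, 99.0])
print(s.dtype)  # float64

converted = s.convert_dtypes()
print(converted.dtype)            # Int64
print(converted.isna().tolist())  # [False, False, True, False, False]

# NumPy int64 cannot store a missing value, so this cast raises a ValueError.
raised = False
try:
    s.astype('int64')
except ValueError:
    raised = True
print(raised)  # True
```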
