How to fill dataframe NaN values with empty list [] in pandas?

Question

This is my dataframe:

          date                          ids
0     2011-04-23  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,...
1     2011-04-24  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,...
2     2011-04-25  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,...
3     2011-04-26  Nan
4     2011-04-27  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,...
5     2011-04-28  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,...

I want to replace NaN with []. How do I do that? fillna([]) did not work. I even tried replace(np.nan, []), but it gives the error:

 TypeError('Invalid "to_replace" type: \'float\'',)
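A minimal sketch reproducing the fillna failure, assuming a recent pandas version and a small made-up stand-in for the ids column: fillna only accepts a scalar, dict or Series as its value, so an empty list is rejected with a TypeError instead of being used as the fill value.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's ids column
ids = pd.Series([[0, 1, 2], np.nan, [3, 4]])

try:
    ids.fillna([])
    fillna_error = None
except TypeError as exc:
    # pandas refuses list-like fill values for a Series
    fillna_error = exc

print(repr(fillna_error))
```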

Answer 1

You can first use loc to find all rows that have NaN in the ids column, and then loop through those rows with at to set their values to an empty list:

for row in df.loc[df.ids.isnull(), 'ids'].index:
    df.at[row, 'ids'] = []

>>> df
        date                                             ids
0 2011-04-23  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
1 2011-04-24  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
2 2011-04-25  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
3 2011-04-26                                              []
4 2011-04-27  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
5 2011-04-28  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]

Answer 2

Without assignment:

1) Assuming the dataframe contains only floats and ints

import math
df.apply(lambda x:x.apply(lambda x:[] if math.isnan(x) else x))

2) For any dataframe

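import math

def isnan(x):
    # math.isnan only accepts real numbers, so lists, strings etc. are skipped;
    # Python 3 has no `long`, and math.isnan raises TypeError on complex
    return isinstance(x, (int, float)) and math.isnan(x)

df.apply(lambda x:x.apply(lambda x:[] if isnan(x) else x))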

Answer 3

After some fiddling, I found that this should be the most efficient method (no loops, no apply): simply assign to a slice:

isnull = df.ids.isnull()

df.loc[isnull, 'ids'] = [ [[]] * isnull.sum() ]

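The trick is to build a list of [] of the right size (isnull.sum()) and then wrap it in a list: the value you are assigning is a 2D array (1 column, isnull.sum() rows) whose elements are empty lists.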

Answer 4

My approach is similar to @hellpanderrr's, but it tests for list-ness instead of using isnan:

df['ids'] = df['ids'].apply(lambda d: d if isinstance(d, list) else [])

I initially tried pd.isnull (or pd.notnull), but when given a list it returns the null-ness of each element.
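The difference between pd.isnull and an isinstance test can be seen on a small example. This is a sketch with made-up data: on a scalar pd.isnull returns one bool, but on a list it returns one result per element, which is why the answer checks for list instead.

```python
import numpy as np
import pandas as pd

# On a scalar, pd.isnull returns a single boolean ...
scalar_check = pd.isnull(np.nan)

# ... but on a list it tests each element and returns an array
list_check = pd.isnull([0, np.nan, 2])

# An isinstance test yields exactly one answer per cell (hypothetical data)
ids = pd.Series([[0, 1], np.nan, []])
filled = ids.apply(lambda d: d if isinstance(d, list) else [])

print(scalar_check)          # True
print(list_check.tolist())   # [False, True, False]
print(filled.tolist())       # [[0, 1], [], []]
```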

Answer 5

Create a function that checks your condition and, if the condition is not met, returns an empty list, empty set, etc.

Then apply that function to the variable (column), assigning the newly computed values back to the old column or to a new one as needed.

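import numpy as np
import pandas as pd

aa=pd.DataFrame({'d':[1,1,2,3,3,np.nan],'r':[3,5,5,5,5,'e']})


def check_condition(x):
    # NaN > 0 is False, so NaN (and any value <= 0) becomes an empty list
    if x>0:
        return x
    else:
        return list()

aa['d'] = aa.d.apply(check_condition)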
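The check_condition rule also maps 0 and negative numbers to an empty list, because x > 0 is False for them as well as for NaN. A sketch comparing it with a hypothetical helper, empty_list_if_nan (not part of the original answer), that replaces only NaN, assuming the column holds floats:

```python
import numpy as np
import pandas as pd


def check_condition(x):
    # Rule from the answer: keep positive values, otherwise return []
    if x > 0:
        return x
    else:
        return list()


def empty_list_if_nan(x):
    # Hypothetical helper: only a float NaN is replaced by []
    if isinstance(x, float) and np.isnan(x):
        return []
    return x


values = pd.Series([0.0, -2.0, 4.0, np.nan])

by_condition = values.apply(check_condition).tolist()
by_nan_check = values.apply(empty_list_if_nan).tolist()

print(by_condition)   # [[], [], 4.0, []]
print(by_nan_check)   # [0.0, -2.0, 4.0, []]
```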

Answer 6

Possibly more compact:

df['ids'] = [[] if type(x) != list else x for x in df['ids']]

Answer 7

Another solution using numpy:

df.ids = np.where(df.ids.isnull(), pd.Series([[]]*len(df)), df.ids)

Or using combine_first:

df.ids = df.ids.combine_first(pd.Series([[]]*len(df), index=df.index))
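A sketch of why the filler Series used with combine_first should share df's index, using a hypothetical frame whose index is not 0..n-1: combine_first aligns on labels, so a filler built on the default RangeIndex yields the union of both indexes instead of filling the NaN in place. The filler here is built with a comprehension so each filled cell gets its own list object; [[]] * n repeats one list object n times.

```python
import numpy as np
import pandas as pd

# Hypothetical frame whose index is not 0..n-1
df = pd.DataFrame({'ids': [[0, 1], np.nan, [2]]}, index=[10, 20, 30])

# Filler on the default index 0..2: labels do not match, so the result
# covers the union of both indexes (6 rows) instead of 3
misaligned = df['ids'].combine_first(pd.Series([[]] * len(df)))

# Filler on df.index: the NaN at label 20 is filled, other rows are kept
filler = pd.Series([[] for _ in range(len(df))], index=df.index)
df['ids'] = df['ids'].combine_first(filler)

print(len(misaligned))      # 6
print(df['ids'].tolist())   # [[0, 1], [], [2]]
```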

Answer 8

This might be faster, a one-liner solution:

df['ids'].fillna('DELETE').apply(lambda x : [] if x=='DELETE' else x)
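One caveat of a string sentinel is that any genuine 'DELETE' value in the column is turned into [] as well. A sketch with a hypothetical column, comparing it with a variant that uses a private object() sentinel (the name _MISSING is an illustration, not from the answer) checked by identity:

```python
import numpy as np
import pandas as pd

# Hypothetical column where one cell legitimately holds the string 'DELETE'
ids = pd.Series([[0, 1], np.nan, 'DELETE'])

# String sentinel: the real 'DELETE' value is also replaced by []
with_string = ids.fillna('DELETE').apply(lambda x: [] if x == 'DELETE' else x)

# Hypothetical variant: a private object() sentinel compared with `is`,
# so only the cells that were NaN become []
_MISSING = object()
with_object = ids.fillna(_MISSING).apply(lambda x: [] if x is _MISSING else x)

print(with_string.tolist())  # [[0, 1], [], []]
print(with_object.tolist())  # [[0, 1], [], 'DELETE']
```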

Answer 9

Maybe not the shortest or most optimized solution, but I think it is quite readable:

# Packages
import ast

# Masking-in nans
mask = df['ids'].isna()

# Filling nans with a list-like string and literally-evaluating such string
df.loc[mask, 'ids'] = df.loc[mask, 'ids'].fillna('[]').apply(ast.literal_eval)

The downside is that it requires importing the ast module.

Edit

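I recently found out about the built-in eval(), which avoids importing any extra module. Unlike ast.literal_eval, eval() executes arbitrary expressions; here it only ever sees the string '[]', so that is harmless.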

# Masking-in nans
mask = df['ids'].isna()

# Filling nans with a list-like string and literally-evaluating such string
df.loc[mask, 'ids'] = df.loc[mask, 'ids'].fillna('[]').apply(eval)
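A sketch contrasting the two parsers in plain Python (no pandas needed): both turn '[]' into an empty list, but ast.literal_eval accepts only literals and raises ValueError on an expression such as a function call, which eval would run.

```python
import ast

text = '[]'
from_literal = ast.literal_eval(text)   # new empty list
from_eval = eval(text)                  # also a new empty list

# literal_eval rejects anything that is not a literal, e.g. a function call
try:
    ast.literal_eval("__import__('os').getcwd()")
    rejected = False
except ValueError:
    rejected = True

print(from_literal, from_eval, rejected)  # [] [] True
```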

Answer 10

Surprisingly, passing a dict with empty lists as values seems to work for Series.fillna but not for DataFrame.fillna, so if you want to work on a single column you can use this:

>>> df
     A    B    C
0  0.0  2.0  NaN
1  NaN  NaN  5.0
2  NaN  7.0  NaN
>>> df['C'].fillna({i: [] for i in df.index})
0    []
1     5
2    []
Name: C, dtype: object

The solution can be extended to a DataFrame by applying it to every column.

>>> df.apply(lambda s: s.fillna({i: [] for i in df.index}))
    A   B   C
0   0   2  []
1  []  []   5
2  []   7  []

Note: for large Series/DataFrames with few missing values, this may create an unreasonable number of throwaway empty lists.

Tested with pandas 1.0.5.

Answer 11

A simple solution is:

df['ids'].fillna("").apply(list)

As @timgeb noted, this requires df['ids'] to contain only lists or NaN.
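The lists-or-NaN requirement follows from every cell going through list(): NaN becomes '' and list('') == [], but a tuple is converted to a list and a string is split into characters. A sketch with a hypothetical mixed column:

```python
import numpy as np
import pandas as pd

# Hypothetical column mixing lists with other iterables
ids = pd.Series([[0, 1], np.nan, (2, 3), 'ab'])

# NaN -> '' -> list('') == [], but every other cell is passed to list() too
converted = ids.fillna('').apply(list)

print(converted.tolist())  # [[0, 1], [], [2, 3], ['a', 'b']]
```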

Answer 12

Another explicit solution:

# select the nulls
sel = df.ids.isnull()

# use apply to only replace the nulls with the list  
df.loc[sel, 'ids'] = df.loc[sel, 'ids'].apply(lambda x: [])

Since Python 3.8 and assignment expressions (PEP 572), this can be written as a one-liner without computing the selection twice:

df.loc[sel, 'ids'] = df.loc[(sel:=df.ids.isnull()), 'ids'].apply(lambda x: [])
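The one-liner works because Python evaluates the right-hand side of an assignment before the subscription target on the left, so sel is already bound when df.loc[sel, 'ids'] is evaluated. A minimal sketch of that evaluation order, using a plain dict as a hypothetical stand-in for df.loc:

```python
# The right-hand side runs first and binds `key` via :=,
# then the target target[key] is evaluated with that value.
target = {}
target[key] = (key := 'ids')

print(target)  # {'ids': 'ids'}
```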

Answer 13

You can try this:

df.fillna(df.notna().applymap(lambda x: x or []))
