Replacing NaN with pandas series.map(dict)

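I'm working through a pandas tutorial that shows how to replace the values in a column by passing a dictionary to the series.map method. Here is a snippet from the tutorial:

But when I try this: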

cols = star_wars.columns[3:9]

# Booleans for column values
answers = {
        "Star Wars: Episode I  The Phantom Menace":True, 
        "Star Wars: Episode II  Attack of the Clones":True, 
        "Star Wars: Episode III  Revenge of the Sith":True,
        "Star Wars: Episode IV  A New Hope":True,
        "Star Wars: Episode V  The Empire Strikes Back":True,
        "Star Wars: Episode VI  Return of the Jedi":True,
        NaN:False
        }

for c in cols:
    star_wars[c] = star_wars[c].map(answers) 

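I get NameError: name 'NaN' is not defined

So what am I doing wrong?

Edit: To explain my goal better, my columns look like this:

I'm trying to replace NaN with False and every non-NaN value with True.

Edit 2: Here is a picture of the problem I still face after changing NaN to np.NaN:

Then, if I re-run the mapping cell and display the output again, all the False and NaN values flip.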

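Quite simply, Python has no built-in name NaN. NumPy does, however, so you can stop your mapping from throwing an error by using np.nan. As Jon pointed out, there is also math.nan, which is equivalent to float('nan').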

import numpy as np

answers = {
        "Star Wars: Episode I  The Phantom Menace":True, 
        "Star Wars: Episode II  Attack of the Clones":True, 
        "Star Wars: Episode III  Revenge of the Sith":True,
        "Star Wars: Episode IV  A New Hope":True,
        "Star Wars: Episode V  The Empire Strikes Back":True,
        "Star Wars: Episode VI  Return of the Jedi":True,
        np.nan:False
        }

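Don't stop there, though, because that won't work. The other tricky thing is that nan technically does not compare equal to anything, itself included, so using it as a key in a mapping like this won't be effective.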

>>> np.nan == np.nan 
False

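So the NaN values in your DataFrame won't be picked up by the np.nan key anyway, and will stay NaN. For a further explanation, see "NaNs as key in dictionaries". Also, I'd bet your nan values are actually the string "nan".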
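To make the point about NaN and dictionary keys concrete, here is a minimal sketch that uses only the standard library. It covers plain Python dict lookup only and does not show what pandas does internally inside Series.map; the variable names are made up for illustration.

```python
import math

# NaN never compares equal, not even to itself.
nan_equals_itself = math.nan == math.nan

lookup = {math.nan: False}

# Looking up the very same object succeeds, because dict lookup
# checks identity before it falls back to ==.
same_object_found = math.nan in lookup

# A separately created NaN is a different object, so the identity check
# fails, == fails too, and the key is not found.
other_nan = float("nan")
other_object_found = other_nan in lookup

# The string "nan" is an ordinary string and never matches a float NaN key.
string_found = "nan" in lookup

print(nan_equals_itself, same_object_found, other_object_found, string_found)
```

In other words, whether a NaN key is "found" in a plain dict depends on object identity rather than on value, which is why relying on it is fragile.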

Minimal demo

>>> df
                                          0                                  1
0  Star Wars: Episode I  The Phantom Menace                                nan
1         Star Wars: Episode IV  A New Hope                                nan
2         Star Wars: Episode IV  A New Hope  Star Wars: Episode IV  A New Hope

>>> for c in df.columns:
        df[c] = df[c].map(answers)


>>> df
      0     1
0  True   NaN
1  True   NaN
2  True  True

# notice we're still stuck with NaN, as our nan strings weren't picked up

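A better solution

That said, this doesn't seem like a great fit for a dict or map. You can simply define the Star Wars strings in a set and then use isin on the whole slice of columns you're interested in.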

answers = {
        "Star Wars: Episode I  The Phantom Menace",
        "Star Wars: Episode II  Attack of the Clones",
        "Star Wars: Episode III  Revenge of the Sith",
        "Star Wars: Episode IV  A New Hope",
        "Star Wars: Episode V  The Empire Strikes Back",
        "Star Wars: Episode VI  Return of the Jedi",
        }

star_wars.iloc[:, 3:9].isin(answers)

Minimal demo

>>> answers = {
            "Star Wars: Episode I  The Phantom Menace",
            "Star Wars: Episode II  Attack of the Clones",
            "Star Wars: Episode III  Revenge of the Sith",
            "Star Wars: Episode IV  A New Hope",
            "Star Wars: Episode V  The Empire Strikes Back",
            "Star Wars: Episode VI  Return of the Jedi",
            }

>>> df
                                          0                                  1
0  Star Wars: Episode I  The Phantom Menace                                nan
1         Star Wars: Episode IV  A New Hope                                nan
2         Star Wars: Episode IV  A New Hope  Star Wars: Episode IV  A New Hope

>>> df.isin(answers)

      0      1
0  True  False
1  True  False
2  True   True
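For anyone who wants to reproduce the demo, here is a small self-contained sketch. The two-column frame is a hypothetical stand-in for the survey data, with one real NaN and one "nan" string in column 1; it also shows notna() for comparison, which is not part of the answer above but matches the stated goal of "NaN becomes False, everything else becomes True" when the missing values are real NaN.

```python
import numpy as np
import pandas as pd

episodes = {
    "Star Wars: Episode I  The Phantom Menace",
    "Star Wars: Episode IV  A New Hope",
}

# Hypothetical sample: column 1 holds a real NaN and the string "nan".
df = pd.DataFrame({
    0: ["Star Wars: Episode I  The Phantom Menace",
        "Star Wars: Episode IV  A New Hope",
        "Star Wars: Episode IV  A New Hope"],
    1: [np.nan, "nan", "Star Wars: Episode IV  A New Hope"],
})

# Membership test: only the listed titles become True,
# so both the real NaN and the "nan" string become False.
seen = df.isin(episodes)

# Missing-value test: only real NaN counts as missing,
# so the "nan" string is treated as an answer.
answered = df.notna()

print(seen)
print(answered)
```

The difference between the two results in column 1 is a quick way to check whether a column holds real NaN values or "nan" strings.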

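My problem with the other solution is that, because of how it works, the code doesn't behave the same way after the first run. I'm working in a Jupyter notebook, so I want something I can run multiple times. I'm only a Python beginner, but the code below seems to be safe to run multiple times and only changes the values the first time it runs: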

import numpy as np

cols = star_wars.columns[3:9]

# Booleans for column values
answers = {
        "Star Wars: Episode I  The Phantom Menace":True,
        "Star Wars: Episode II  Attack of the Clones":True, 
        "Star Wars: Episode III  Revenge of the Sith":True,
        "Star Wars: Episode IV  A New Hope":True,
        "Star Wars: Episode V The Empire Strikes Back":True,
        "Star Wars: Episode VI Return of the Jedi":True,
        True:True,
        False:False,
        np.nan:False
        }

for c in cols:
    star_wars[c] = star_wars[c].map(answers)
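A different way to address the same re-run concern, offered only as a sketch and not as the approach above, is to leave the loaded data untouched and derive the boolean columns from it each time. The function name to_seen_flags, the column names, and the sample frame are all hypothetical, and the sketch assumes the missing answers are real NaN values.

```python
import numpy as np
import pandas as pd


def to_seen_flags(raw, columns):
    """Return a copy of raw where each listed column is True if answered, False if NaN."""
    result = raw.copy()
    for column in columns:
        result[column] = raw[column].notna()
    return result


# Hypothetical stand-in for the survey data.
raw_star_wars = pd.DataFrame({
    "seen_1": ["Star Wars: Episode I  The Phantom Menace", np.nan],
    "seen_4": [np.nan, "Star Wars: Episode IV  A New Hope"],
})

# Re-running the cell always starts from the untouched raw frame,
# so the result is the same no matter how many times it runs.
first_run = to_seen_flags(raw_star_wars, ["seen_1", "seen_4"])
second_run = to_seen_flags(raw_star_wars, ["seen_1", "seen_4"])

print(first_run)
```

Because the raw frame is never overwritten, there is no need to add True and False keys to a mapping to survive a second run.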