Replacing NaN with pandas series.map(dict)

Question

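I'm working through a pandas tutorial that shows how to replace the values in a column by passing a dictionary to the series.map method. Here is a snippet from the tutorial:

But when I try this: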

cols = star_wars.columns[3:9]

# Booleans for column values
answers = {
        "Star Wars: Episode I  The Phantom Menace":True, 
        "Star Wars: Episode II  Attack of the Clones":True, 
        "Star Wars: Episode III  Revenge of the Sith":True,
        "Star Wars: Episode IV  A New Hope":True,
        "Star Wars: Episode V  The Empire Strikes Back":True,
        "Star Wars: Episode VI  Return of the Jedi":True,
        NaN:False
        }

for c in cols:
    star_wars[c] = star_wars[c].map(answers)

I get NameError: name 'NaN' is not defined

So what am I doing wrong?

Edit: To better explain my goal, my columns look like this:

I am trying to replace NaN with False and every non-NaN value with True.

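Edit 2: Here is a picture of the problem I still face after changing NaN to np.NaN:

Then, if I re-run the mapping cell and display the output again, all the False and NaN values flip.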

Answer 1

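Simply put, Python has no built-in name NaN. NumPy does, however, so you can stop your mapping from raising an error by using np.nan. As Jon pointed out, there is also math.nan, which is equivalent to float('nan').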

answers = {
        "Star Wars: Episode I  The Phantom Menace":True, 
        "Star Wars: Episode II  Attack of the Clones":True, 
        "Star Wars: Episode III  Revenge of the Sith":True,
        "Star Wars: Episode IV  A New Hope":True,
        "Star Wars: Episode V  The Empire Strikes Back":True,
        "Star Wars: Episode VI  Return of the Jedi":True,
        np.nan:False
        }

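Don't stop there, though, because that won't work. The other tricky thing is that nan is technically not equal to anything, so using it as a key in a mapping like this won't be effective.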

>>> np.nan == np.nan 
False

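So the NaN values in your DataFrame won't be picked up by the np.nan key anyway, and they will stay NaN. For a further explanation of this, see NaNs as key in dictionaries. On top of that, I'd bet your nan values are actually the string nan.

Minimal demo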
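A quick way to check which case applies is to test for real missing values and for the literal text separately. The snippet below is only a sketch on a made-up three-row column (the name `col` and its contents are assumptions, not the asker's data):

```python
import math

import numpy as np
import pandas as pd

# NaN never compares equal to anything, including itself,
# so it has to be detected with a dedicated check instead of ==.
nan_equals_itself = np.nan == np.nan       # False
nan_detected = math.isnan(float("nan"))    # True

# Hypothetical column: a real title, a real missing value, and the text "nan".
col = pd.Series(["Star Wars: Episode IV  A New Hope", np.nan, "nan"])

is_missing = col.isna()       # True only for the real missing value
is_nan_text = col.eq("nan")   # True only for the literal string "nan"

print(is_missing.tolist())    # [False, True, False]
print(is_nan_text.tolist())   # [False, False, True]
```

If `is_nan_text` contains any True values, the column holds strings that merely look like NaN, and a np.nan key cannot match them.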

>>> df
                                          0                                  1
0  Star Wars: Episode I  The Phantom Menace                                nan
1         Star Wars: Episode IV  A New Hope                                nan
2         Star Wars: Episode IV  A New Hope  Star Wars: Episode IV  A New Hope

>>> for c in df.columns:
...     df[c] = df[c].map(answers)


>>> df
      0     1
0  True   NaN
1  True   NaN
2  True  True

# notice we're still stuck with NaN, as our nan strings weren't picked up

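A better solution

That said, this doesn't seem like a good fit for a dict or map. You can simply define the Star Wars strings in a set and then use isin on the whole slice of columns you're interested in.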

answers = {
        "Star Wars: Episode I  The Phantom Menace",
        "Star Wars: Episode II  Attack of the Clones",
        "Star Wars: Episode III  Revenge of the Sith",
        "Star Wars: Episode IV  A New Hope",
        "Star Wars: Episode V  The Empire Strikes Back",
        "Star Wars: Episode VI  Return of the Jedi",
        }

# True where a cell holds one of the titles, False everywhere else
star_wars.iloc[:, 3:9].isin(answers)

Minimal demo

>>> answers = {
            "Star Wars: Episode I  The Phantom Menace",
            "Star Wars: Episode II  Attack of the Clones",
            "Star Wars: Episode III  Revenge of the Sith",
            "Star Wars: Episode IV  A New Hope",
            "Star Wars: Episode V  The Empire Strikes Back",
            "Star Wars: Episode VI  Return of the Jedi",
            }

>>> df
                                          0                                  1
0  Star Wars: Episode I  The Phantom Menace                                nan
1         Star Wars: Episode IV  A New Hope                                nan
2         Star Wars: Episode IV  A New Hope  Star Wars: Episode IV  A New Hope

>>> df.isin(answers)

      0      1
0  True  False
1  True  False
2  True   True
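Note that isin returns a new boolean DataFrame and leaves the original untouched, so the result has to be written back if the columns themselves should change. A minimal sketch, assuming a small made-up frame (the column names `seen_1` and `seen_4` are hypothetical) that mixes a real NaN with the text "nan":

```python
import numpy as np
import pandas as pd

titles = {
    "Star Wars: Episode I  The Phantom Menace",
    "Star Wars: Episode IV  A New Hope",
}

# Hypothetical stand-in for the survey columns.
df = pd.DataFrame({
    "seen_1": ["Star Wars: Episode I  The Phantom Menace", np.nan, "nan"],
    "seen_4": ["Star Wars: Episode IV  A New Hope"] * 3,
})

# isin only asks "is this cell one of the titles?", so a real NaN
# and the string "nan" both come out as False.
flags = df.isin(titles)

# Write the booleans back column by column.
for name in df.columns:
    df[name] = flags[name]

print(df["seen_1"].tolist())   # [True, False, False]
print(df["seen_4"].tolist())   # [True, True, True]
```

Because the check is a membership test, it does not matter how the missing values are encoded in the data.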

Answer 2

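My problem with the other solution is that, because of the way it works, the code does not behave the same way after the first run. I work in Jupyter notebooks, so I want something that can be run multiple times. I'm only a Python beginner, but the code below seems to be safe to run multiple times, and it only changes the values the first time it runs: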

cols = star_wars.columns[3:9]

# Booleans for column values
answers = {
        "Star Wars: Episode I  The Phantom Menace":True,
        "Star Wars: Episode II  Attack of the Clones":True,
        "Star Wars: Episode III  Revenge of the Sith":True,
        "Star Wars: Episode IV  A New Hope":True,
        "Star Wars: Episode V  The Empire Strikes Back":True,
        "Star Wars: Episode VI  Return of the Jedi":True,
        # Map already-converted values to themselves so a re-run changes nothing
        True:True,
        False:False,
        np.nan:False
        }

for c in cols:
    star_wars[c] = star_wars[c].map(answers)
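Another way to keep a notebook cell safe to re-run, which is not taken from the answers above, is to leave the source columns untouched and build the converted frame from them each time. This is only a sketch: the helper `add_seen_flags`, the frame `raw`, and the column names are hypothetical, and it assumes the column labels are strings.

```python
import numpy as np
import pandas as pd

TITLES = {
    "Star Wars: Episode I  The Phantom Menace",
    "Star Wars: Episode IV  A New Hope",
}


def add_seen_flags(frame, cols, titles):
    """Return a copy of frame in which cols hold True where a cell is a known title."""
    # assign() builds a new DataFrame, so frame itself is never modified.
    return frame.assign(**{c: frame[c].isin(titles) for c in cols})


# Hypothetical stand-in for the raw survey data.
raw = pd.DataFrame({
    "seen_1": ["Star Wars: Episode I  The Phantom Menace", np.nan],
    "seen_4": ["Star Wars: Episode IV  A New Hope", "Star Wars: Episode IV  A New Hope"],
})
cols = raw.columns[0:2]

first = add_seen_flags(raw, cols, TITLES)
second = add_seen_flags(raw, cols, TITLES)   # simulates re-running the cell

print(first.equals(second))        # True
print(first["seen_1"].tolist())    # [True, False]
```

Since the raw strings are still available in `raw`, every run starts from the same input and produces the same booleans.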
