Replacing NaN with pandas series.map(dict)
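I'm following a pandas tutorial that shows how to replace values in a column by passing a dictionary to the series.map method. Here is a snippet from the tutorial:
But when I try this: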
cols = star_wars.columns[3:9]
# Booleans for column values
answers = {
    "Star Wars: Episode I The Phantom Menace": True,
    "Star Wars: Episode II Attack of the Clones": True,
    "Star Wars: Episode III Revenge of the Sith": True,
    "Star Wars: Episode IV A New Hope": True,
    "Star Wars: Episode V The Empire Strikes Back": True,
    "Star Wars: Episode VI Return of the Jedi": True,
    NaN: False
}
for c in cols:
    star_wars[c] = star_wars[c].map(answers)
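I get NameError: name 'NaN' is not defined
So what am I doing wrong?
Edit: To better explain my goal, my columns look like this:
I'm trying to replace NaN with False and non-NaN with True.
Edit 2: Here is a picture of the problem I still face after changing NaN to np.NaN:
Then, if I re-run the mapping cell and display the output again, all the False and NaN values flip.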
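Quite simply, Python has no built-in name NaN. NumPy does provide one, though, so you can stop your mapping from raising the error by using np.nan. As Jon pointed out, there is also math.nan, which is equivalent to float('nan').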
import numpy as np

answers = {
    "Star Wars: Episode I The Phantom Menace": True,
    "Star Wars: Episode II Attack of the Clones": True,
    "Star Wars: Episode III Revenge of the Sith": True,
    "Star Wars: Episode IV A New Hope": True,
    "Star Wars: Episode V The Empire Strikes Back": True,
    "Star Wars: Episode VI Return of the Jedi": True,
    np.nan: False
}
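Don't stop there, though, because that alone won't work.
Another tricky thing is that nan is technically not equal to anything, not even itself, so using it as a key in a mapping like this is unreliable.
>>> np.nan == np.nan
False
So the NaN values in your DataFrame are not guaranteed to be matched by the np.nan key, and may stay NaN. For a further explanation, see "NaNs as key in dictionaries". Also, I'd bet your nan values are actually the string 'nan'.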
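As a rough illustration of the point about NaN keys, here is a minimal sketch using only the standard library. The names nan_a, nan_b and lookup are made up for this example. It shows plain dict behavior only; pandas may treat NaN keys differently inside series.map.

```python
import math

nan_a = float("nan")
nan_b = float("nan")

# NaN never compares equal, not even to itself.
print(nan_a == nan_a)  # False

# A dict checks identity before equality, so the very same object is found...
lookup = {nan_a: False}
print(lookup.get(nan_a))  # False

# ...but a different NaN object is not.
print(lookup.get(nan_b))  # None

# An explicit test is the dependable way to detect NaN.
print(math.isnan(nan_b))  # True
```

This is why a lookup keyed on NaN can appear to work in one place and silently miss in another: it depends on which NaN object ends up being compared.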
Minimal demo
>>> df
                                         0                                 1
0  Star Wars: Episode I The Phantom Menace                               nan
1         Star Wars: Episode IV A New Hope                               nan
2         Star Wars: Episode IV A New Hope  Star Wars: Episode IV A New Hope
>>> for c in df.columns:
...     df[c] = df[c].map(answers)
...
>>> df
      0     1
0  True   NaN
1  True   NaN
2  True  True
# notice we're still stuck with NaN, as our nan strings weren't picked up
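A better solution
That said, this doesn't seem like a great fit for a dict or map. You can simply define the Star Wars strings in a set and then use isin on the whole block of columns you're interested in.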
answers = {
    "Star Wars: Episode I The Phantom Menace",
    "Star Wars: Episode II Attack of the Clones",
    "Star Wars: Episode III Revenge of the Sith",
    "Star Wars: Episode IV A New Hope",
    "Star Wars: Episode V The Empire Strikes Back",
    "Star Wars: Episode VI Return of the Jedi",
}
star_wars.iloc[:, 3:9].isin(answers)
Minimal demo
>>> answers = {
...     "Star Wars: Episode I The Phantom Menace",
...     "Star Wars: Episode II Attack of the Clones",
...     "Star Wars: Episode III Revenge of the Sith",
...     "Star Wars: Episode IV A New Hope",
...     "Star Wars: Episode V The Empire Strikes Back",
...     "Star Wars: Episode VI Return of the Jedi",
... }
>>> df
                                         0                                 1
0  Star Wars: Episode I The Phantom Menace                               nan
1         Star Wars: Episode IV A New Hope                               nan
2         Star Wars: Episode IV A New Hope  Star Wars: Episode IV A New Hope
>>> df.isin(answers)
      0      1
0  True  False
1  True  False
2  True   True
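If the goal is simply "missing becomes False, anything else becomes True", a sketch that avoids listing the titles might look like the following. It assumes missing answers can show up either as a real NaN or as the literal string "nan". The frame demo, its column names and the helper answered are hypothetical and not part of the original data.

```python
import numpy as np
import pandas as pd

# Hypothetical sample mixing a real NaN and the string "nan".
demo = pd.DataFrame({
    "seen_1": ["Star Wars: Episode I The Phantom Menace", np.nan, "nan"],
    "seen_4": ["Star Wars: Episode IV A New Hope",
               "Star Wars: Episode IV A New Hope",
               np.nan],
})

def answered(frame):
    """Return True where a cell holds a real answer, False where it is missing."""
    # notna() catches real NaN; isin(["nan"]) catches the string placeholder.
    return frame.notna() & ~frame.isin(["nan"])

result = answered(demo)
print(result)
```

Unlike the set-based isin, this treats any non-missing value as True, so it only fits if the columns hold nothing but titles and missing markers.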
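So my problem with the other solution is that, because of how it works, the code doesn't behave the same way after the first run. I'm working in a Jupyter notebook, so I want something I can run multiple times. I'm only a Python beginner, but the code below seems to be safe to run multiple times, and it only changes the values the first time it is run: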
import numpy as np

cols = star_wars.columns[3:9]
# Booleans for column values; True/False map to themselves so re-runs are harmless
answers = {
    "Star Wars: Episode I The Phantom Menace": True,
    "Star Wars: Episode II Attack of the Clones": True,
    "Star Wars: Episode III Revenge of the Sith": True,
    "Star Wars: Episode IV A New Hope": True,
    "Star Wars: Episode V The Empire Strikes Back": True,
    "Star Wars: Episode VI Return of the Jedi": True,
    True: True,
    False: False,
    np.nan: False
}
for c in cols:
    star_wars[c] = star_wars[c].map(answers)
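Another way to make a notebook cell safe to re-run is to skip columns that have already been converted. The following is only a sketch under that assumption, using a small stand-in frame rather than the real star_wars data. The column names and the helper to_seen_flags are hypothetical.

```python
import numpy as np
import pandas as pd

# Stand-in for the real survey data (hypothetical values).
star_wars = pd.DataFrame({
    "seen_1": ["Star Wars: Episode I The Phantom Menace", np.nan],
    "seen_2": [np.nan, "Star Wars: Episode II Attack of the Clones"],
})

def to_seen_flags(frame, cols):
    """Convert answer columns to booleans in place; skip columns already converted."""
    for c in cols:
        if pd.api.types.is_bool_dtype(frame[c]):
            continue  # already boolean from an earlier run
        frame[c] = frame[c].notna()
    return frame

cols = star_wars.columns[0:2]
to_seen_flags(star_wars, cols)
first = star_wars.copy()
to_seen_flags(star_wars, cols)  # second run leaves the result unchanged
print(star_wars.equals(first))  # True
```

The dtype check is what makes the cell idempotent: once a column is boolean, the conversion is not applied again.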