Replacing values in a DataFrame with None
When a pandas DataFrame is created with None values, they are converted to NaN:
> df = pd.DataFrame({'a': [0, None, 2]})
> df
a
0 0.0
1 NaN
2 2.0
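The conversion comes from dtype inference rather than from None itself. Here is a minimal sketch, assuming only that pandas is importable as pd (the name kept is illustrative and not from the original post). A numeric column becomes float64 and stores the gap as NaN, while a Series explicitly created with object dtype keeps the Python None.

```python
import pandas as pd

# Numeric data plus None: pandas infers float64 and stores the gap as NaN.
df = pd.DataFrame({'a': [0, None, 2]})
print(df['a'].dtype)             # float64
print(df['a'].isna().tolist())   # [False, True, False]

# Forcing object dtype keeps the original Python None (illustrative name).
kept = pd.Series([0, None, 2], dtype=object)
print(kept.iloc[1] is None)      # True
```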
If I set a value to None by index, the result is the same:
> df = pd.DataFrame({'a': [0, 1, 2]})
> df["a"].iloc[1] = None
> df
a
0 0.0
1 NaN
2 2.0
However, if I do a replace, strange things start to happen:
> df = pd.DataFrame({'a': [0, 1, 2, 3]})
> df["a"].replace(1, "foo")
a
0 0
1 'foo'
2 2
3 3
> df["a"].replace(2, None)
a
0 0
1 1
2 1
3 3
What is going on?
From the docstring:
When ``value=None`` and `to_replace` is a scalar, list or
tuple, `replace` uses the method parameter (default 'pad') to do the
replacement. So this is why the 'a' values are being replaced by 10
in rows 1 and 2 and 'b' in row 4 in this case.
The command ``s.replace('a', None)`` is actually equivalent to
``s.replace(to_replace='a', value=None, method='pad')``
If you want to actually replace with None, pass a dictionary:
>>> s = pd.Series([10, 'a', 'a', 'b', 'a'])
When one uses a dict as the `to_replace` value, it is like the
value(s) in the dict are equal to the `value` parameter.
``s.replace({'a': None})`` is equivalent to
``s.replace(to_replace={'a': None}, value=None, method=None)``:
>>> s.replace({'a': None})
0 10
1 None
2 None
3 b
4 None
dtype: object
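Applied to the column from the question, the dictionary form might look like the sketch below. The variable name result is illustrative. Whether the replaced entry displays as None or NaN, and which dtype the column ends up with, may differ between pandas versions, so the example only checks that the entry is missing.

```python
import pandas as pd

df = pd.DataFrame({'a': [0, 1, 2, 3]})

# The dict maps each value to its replacement, so None is taken literally
# instead of triggering the padding behaviour described in the docstring.
result = df['a'].replace({2: None})

# Only the row that held 2 is now missing; the other rows are untouched.
print(result.isna().tolist())   # [False, False, True, False]
```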
For comparison, on the same Series s, the scalar form s.replace('a', None) pads from the previous row instead, because it is equivalent to s.replace(to_replace='a', value=None, method='pad'):
s.replace('a', None)
0 10
1 10
2 10
3 b
4 b
dtype: object
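To make the 'pad' behaviour concrete, the same result can be reproduced by hand: blank out the matching entries, then forward-fill. This is an illustrative equivalent under that reading of the docstring, not a description of how replace is implemented internally. The name padded is an assumption for the example.

```python
import pandas as pd

s = pd.Series([10, 'a', 'a', 'b', 'a'])

# Blank out the matches, then carry the last non-missing value forward.
padded = s.mask(s == 'a').ffill()

print(padded.tolist())   # [10, 10, 10, 'b', 'b']
```

Rows 1 and 2 inherit 10 from row 0, and row 4 inherits 'b' from row 3, which matches the output shown above.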