Create an indicator variable in Python using a threshold value, leaving NaNs as NaN
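I have some float data from a conductivity probe, and it contains some NaNs. I would like to convert the probe data to an indicator variable based on an empirical threshold, but I want NaN values to stay NaN. Converting to an indicator is straightforward; the trouble is handling the NaNs. Here is an example with a threshold of 50: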
import numpy as np
import pandas as pd
x = [0, np.nan, 2, 3, 4, 51, 61, 71, 81, 91]
df = pd.DataFrame({"x":x})
df['indicator'] = (df.x <=50)*1
This yields:
x indicator
0 0.0 1
1 NaN 0
2 2.0 1
3 3.0 1
4 4.0 1
5 51.0 0
6 61.0 0
7 71.0 0
8 81.0 0
9 91.0 0
But I would like the indicator to be NaN wherever x is NaN, like this:
x indicator
0 0.0 1
1 NaN NaN
2 2.0 1
3 3.0 1
4 4.0 1
5 51.0 0
6 61.0 0
7 71.0 0
8 81.0 0
9 91.0 0
Any help is appreciated. Thanks.
In [1829]: df['indicator'] = df[df.x <=50]*1
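The indicator column will be set only for rows where x <= 50 (note that it holds the x values themselves rather than 0/1):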
In [1830]: df
Out[1830]:
x indicator
0 0.0 0.0
1 NaN NaN
2 2.0 2.0
3 3.0 3.0
4 4.0 4.0
5 51.0 NaN
6 61.0 NaN
7 71.0 NaN
8 81.0 NaN
9 91.0 NaN
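The output above carries the x values through instead of a 0/1 flag. A minimal sketch of how the same row-selection idea might be adapted to produce 0/1, assuming the column names from the question (the `valid` mask name is my own, not from the original answer):

```python
import numpy as np
import pandas as pd

x = [0, np.nan, 2, 3, 4, 51, 61, 71, 81, 91]
df = pd.DataFrame({"x": x})

# Start with NaN everywhere, then fill only the rows where x is present.
df["indicator"] = np.nan
valid = df["x"].notna()  # hypothetical helper name
df.loc[valid, "indicator"] = (df.loc[valid, "x"] <= 50).astype(float)
print(df)
```

Rows where x is missing are never written to, so they keep the initial NaN.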
You can try this:
import numpy as np
import pandas as pd
x = [0, np.nan, 2, 3, 4, 51, 61, 71, 81, 91]
df = pd.DataFrame({"x":x})
df['indicator'] = df.x*(df.x <=50)
Output:
x indicator
0 0.0 0.0
1 NaN NaN
2 2.0 2.0
3 3.0 3.0
4 4.0 4.0
5 51.0 0.0
6 61.0 0.0
7 71.0 0.0
8 81.0 0.0
9 91.0 0.0
For the exact output:
import numpy as np
import pandas as pd
x = [0, np.nan, 2, 3, 4, 51, 61, 71, 81, 91]
df = pd.DataFrame({"x":x})
# Keep NaN where x is missing; otherwise 1.0 if x <= 50 (the question's threshold), else 0.0
df['indicator'] = np.where(df.x.isnull(), np.nan, df.x <= 50)
Output:
x indicator
0 0.0 1.0
1 NaN NaN
2 2.0 1.0
3 3.0 1.0
4 4.0 1.0
5 51.0 0.0
6 61.0 0.0
7 71.0 0.0
8 81.0 0.0
9 91.0 0.0
Thought I'd try applying a lambda to the column :)
import numpy as np
import pandas as pd
x = [0, np.nan, 2, 3, 4, 51, 61, 71, 81, 91]
df = pd.DataFrame({"x":x})
indicator = lambda x: np.nan if (np.isnan(x)) else (x<=50)*1
df['indicator'] = df['x'].apply(indicator)
print(df)
Prints:
x indicator
0 0.0 1.0
1 NaN NaN
2 2.0 1.0
3 3.0 1.0
4 4.0 1.0
5 51.0 0.0
6 61.0 0.0
7 71.0 0.0
8 81.0 0.0
9 91.0 0.0
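For comparison, here is a vectorized sketch that avoids both `np.where` and `apply`, assuming the same data as the question. It is one possible approach rather than a definitive one; the `indicator_int` column name is made up for illustration, and the second variant assumes a pandas version with the nullable `Int64` dtype:

```python
import numpy as np
import pandas as pd

x = [0, np.nan, 2, 3, 4, 51, 61, 71, 81, 91]
df = pd.DataFrame({"x": x})

# Float indicator: compare, cast to float, then blank out rows where x is missing.
df["indicator"] = (df["x"] <= 50).astype(float).where(df["x"].notna())

# Integer indicator that can still hold missing values (nullable Int64 dtype).
# "indicator_int" is a hypothetical column name.
df["indicator_int"] = (df["x"] <= 50).astype("Int64").mask(df["x"].isna())
print(df)
```

The float column shows missing entries as NaN, while the nullable integer column keeps 0/1 as integers and shows missing entries as `<NA>`.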