Create an indicator variable in Python using a threshold value, leaving NaNs as NaN

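I have some float data from a conductivity probe that contains some NaNs. I want to convert the probe data to an indicator variable based on an empirical threshold, but I want NaN values to remain NaN. Converting to an indicator seems straightforward; the problem is handling the NaNs. Here is an example with a threshold of 50: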

import numpy as np
import pandas as pd

x = [0, np.nan, 2, 3, 4, 51, 61, 71, 81, 91]
df = pd.DataFrame({"x":x})
df['indicator'] = (df.x <=50)*1

This yields:

      x  indicator
0   0.0          1
1   NaN          0
2   2.0          1
3   3.0          1
4   4.0          1
5  51.0          0
6  61.0          0
7  71.0          0
8  81.0          0
9  91.0          0

But I want the indicator to be NaN wherever x is NaN, like this:

      x  indicator
0   0.0          1
1   NaN        NaN  
2   2.0          1
3   3.0          1
4   4.0          1
5  51.0          0
6  61.0          0
7  71.0          0
8  81.0          0
9  91.0          0

Any help is appreciated. Thanks.

In [1829]: df['indicator'] = df[df.x <=50]*1

The indicator will be set only for rows where x <= 50 (note that it holds the x values themselves, not 0/1):

In [1830]: df
Out[1830]: 
      x  indicator
0   0.0        0.0
1   NaN        NaN
2   2.0        2.0
3   3.0        3.0
4   4.0        4.0
5  51.0        NaN
6  61.0        NaN
7  71.0        NaN
8  81.0        NaN
9  91.0        NaN
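The output above leaves rows above the threshold as NaN and carries the x values rather than 0/1. If a true 0/1 indicator is wanted, one possible variation is to do the comparison first and then blank out only the rows where x itself is missing. This is a minimal sketch using `Series.where`, not the answerer's code:

```python
import numpy as np
import pandas as pd

x = [0, np.nan, 2, 3, 4, 51, 61, 71, 81, 91]
df = pd.DataFrame({"x": x})

# Compare first (NaN <= 50 evaluates to False), convert the booleans to
# floats, then keep the result only where x is present; Series.where
# fills every other row with NaN.
df["indicator"] = (df["x"] <= 50).astype(float).where(df["x"].notna())
print(df)
```

The column ends up as float because NaN cannot be stored in a plain integer column.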

You can try this:

import numpy as np
import pandas as pd

x = [0, np.nan, 2, 3, 4, 51, 61, 71, 81, 91]
df = pd.DataFrame({"x":x})
df['indicator'] = df.x*(df.x <=50)

Output:

      x  indicator
0   0.0        0.0
1   NaN        NaN
2   2.0        2.0
3   3.0        3.0
4   4.0        4.0
5  51.0        0.0
6  61.0        0.0
7  71.0        0.0
8  81.0        0.0
9  91.0        0.0
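The multiplication keeps the NaN because arithmetic with NaN yields NaN, whereas a comparison with NaN evaluates to False. That second behaviour is why the attempt in the question produced 0 for the missing row. A small sketch of both behaviours, using a made-up three-value Series `s` (not from the original data):

```python
import numpy as np
import pandas as pd

s = pd.Series([2.0, np.nan, 61.0])  # hypothetical sample values

below = s <= 50      # comparison: the NaN row becomes False
product = s * below  # arithmetic: the NaN row stays NaN

print(below.tolist())
print(product.tolist())
```

Note that `product` holds the x value (or 0) rather than a 0/1 flag, which matches the output shown above.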

For the exact output:

import numpy as np
import pandas as pd

x = [0, np.nan, 2, 3, 4, 51, 61, 71, 81, 91]
df = pd.DataFrame({"x":x})
df['indicator'] = np.where(df.x.isnull(), np.nan, df.x <= 50)

Output:

      x  indicator
0   0.0        1.0
1   NaN        NaN
2   2.0        1.0
3   3.0        1.0
4   4.0        1.0
5  51.0        0.0
6  61.0        0.0
7  71.0        0.0
8  81.0        0.0
9  91.0        0.0
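The indicator above is stored as float (1.0/0.0) because NaN forces a float column. If integer 0/1 values with a missing marker are preferred, one option may be pandas' nullable `Int64` dtype. The following is a sketch under that assumption, not part of the answer above; it relies on nullable integer support, which newer pandas versions provide:

```python
import numpy as np
import pandas as pd

x = [0, np.nan, 2, 3, 4, 51, 61, 71, 81, 91]
df = pd.DataFrame({"x": x})

# Cast the boolean comparison to the nullable integer dtype, then mask
# the rows where x is missing so they become <NA> instead of 0.
df["indicator"] = (df["x"] <= 50).astype("Int64").mask(df["x"].isna())
print(df)
```

Here the missing row is shown as `<NA>` rather than `NaN`, and the other rows stay as integers.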

Thought I'd try applying a lambda to the column :)

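import numpy as np
import pandas as pd

x = [0, np.nan, 2, 3, 4, 51, 61, 71, 81, 91]
df = pd.DataFrame({"x":x})
# Return NaN for missing readings, otherwise 1 if x <= 50 else 0
indicator = lambda x: np.nan if np.isnan(x) else (x <= 50) * 1
df['indicator'] = df['x'].apply(indicator)
print(df)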

Prints:

      x  indicator
0   0.0        1.0
1   NaN        NaN
2   2.0        1.0
3   3.0        1.0
4   4.0        1.0
5  51.0        0.0
6  61.0        0.0
7  71.0        0.0
8  81.0        0.0
9  91.0        0.0