Is there a way to use (if number in range) whilst using .apply in python?

Question

I have a DataFrame that looks like this

Difference	Min	Max
2.5	-5	5
7.3	-3	3
0.1	-0.1	0.1

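The numbers in the 'Min' and 'Max' columns vary from row to row, but Min always equals -Max. I want to create a final column that tells me whether the value in the 'Difference' column lies between the values in the 'Min' and 'Max' columns. Like this:

Difference	Min	Max	Signal
2.5	-5	5	Signal
7.3	-3	3	No Signal
0.1	-0.1	0.1	Signal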

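'Signal' and 'No Signal' could also be replaced with True or False, if that makes it possible to use boolean operators.

The code I am currently using is as follows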

df['Signal'] = df['Difference'].apply(lambda x: 'Signal' if x in range((df['Min']), (df['Max'])) else 'No Signal')

This gives me the error

  File "<ipython-input-52-13b6ff6e946a>", line 5
df['Signal'] = df['Difference'].apply(lambda x: 'Signal' if x in range((df['Min']), (df['Max'])) else 'No Signal')
 ^
SyntaxError: invalid syntax

I also tried a different approach with the code below

df['Signal'] = df['Difference'].apply(lambda x: 'Signal' if df['Min'] <= x <= df['Max'] else 'No Signal')

which then gives me the error message

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

The problem is that I don't fully understand the error messages, so I don't know how to fix them.

Any help is greatly appreciated.

Answer 1

Here is a simple solution that takes advantage of the fact that the Min value is always -Max:

df['Signal'] = df.Difference.abs() <= df.Min.abs()

This creates a boolean column that is True for the rows that satisfy the condition.
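For completeness, here is a minimal runnable sketch of the same idea, using the sample data from the question. It is only valid under the question's assumption that Min == -Max; with asymmetric bounds the absolute-value trick would give wrong results. The `in_range` variable name and the label mapping are additions for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Difference': [2.5, 7.3, 0.1],
    'Min': [-5, -3, -0.1],
    'Max': [5, 3, 0.1],
})

# Assumes Min == -Max on every row, so
# "Min <= Difference <= Max" is equivalent to |Difference| <= |Min|.
in_range = df['Difference'].abs() <= df['Min'].abs()

# Optional: turn the booleans into the labels from the question.
df['Signal'] = in_range.map({True: 'Signal', False: 'No Signal'})
print(df)
```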

Answer 2

You can use the pd.Series.between method for this:

import pandas as pd

df = pd.DataFrame({
    'Difference': [2.5, 7.3, 0.1],
    'Min': [-5, -3, -0.1],
    'Max': [5, 3, 0.1],
    })

print(df['Difference'].between(df['Min'], df['Max']))

The output is a boolean Series:

0     True
1    False
2     True
dtype: bool

Then we just need to map these values to get the 'Signal' column you want:

df['Signal'] = df['Difference'].between(df['Min'], df['Max']).map({True: 'Signal', False: 'No Signal'})

print(df)

Output:

   Difference  Min  Max     Signal
0         2.5 -5.0  5.0     Signal
1         7.3 -3.0  3.0  No Signal
2         0.1 -0.1  0.1     Signal
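One detail worth knowing: between includes both endpoints by default, which is why the third row (Difference equal to Max) counts as a signal. The sketch below contrasts that with exclusive bounds. It assumes a pandas version (1.3 or later, as far as I know) in which the inclusive parameter accepts the strings 'both', 'neither', 'left' and 'right'; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Difference': [2.5, 7.3, 0.1],
    'Min': [-5, -3, -0.1],
    'Max': [5, 3, 0.1],
})

# Default behaviour: Min <= Difference <= Max (both endpoints included).
closed = df['Difference'].between(df['Min'], df['Max'])

# Exclusive bounds: Min < Difference < Max.
# Assumes pandas accepts the string form of `inclusive`.
strict = df['Difference'].between(df['Min'], df['Max'], inclusive='neither')

print(pd.DataFrame({'closed': closed, 'strict': strict}))
```

The last row is the only one that differs between the two, because its Difference sits exactly on the upper bound.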

Answer 3

Here is a solution you can try, using between + np.where

import numpy as np

df['Signal'] = (
    np.where(df['Difference'].between(df['Min'], df['Max']), 'Signal', 'No Signal')
)

   Difference  Min  Max     Signal
0         2.5 -5.0  5.0     Signal
1         7.3 -3.0  3.0  No Signal
2         0.1 -0.1  0.1     Signal

Answer 4

An easy fix for the solution you posted, still using apply:

df['Signal'] = df.apply(lambda row: 'Signal' if row['Min'] <= row['Difference'] <= row['Max'] else 'No Signal', axis = 1)

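Above, row is an entire row of the DataFrame, which lets row['Min'] etc. access the column entries that are needed.
axis = 1 means the function is applied to each row.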
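As a side note on the first attempt in the question, range is not a good fit for this check even when it is given scalars: it only produces integers and excludes its stop value. The plain-Python sketch below illustrates this; the `label` helper is a hypothetical name that does the same chained comparison as the lambda above.

```python
# range() only yields integers, so a float is never "in" it.
float_in_range = 2.5 in range(-5, 5)   # False, although -5 <= 2.5 <= 5
int_in_range = 2 in range(-5, 5)       # True
stop_in_range = 5 in range(-5, 5)      # False: the stop value is excluded

# Non-integer bounds are rejected outright.
try:
    range(-0.1, 0.1)
    range_error = None
except TypeError as exc:
    range_error = exc

# A chained comparison works for ints and floats alike.
def label(diff, low, high):
    """Hypothetical helper: label a value by whether it lies in [low, high]."""
    return 'Signal' if low <= diff <= high else 'No Signal'

print(float_in_range, int_in_range, stop_in_range)
print(type(range_error).__name__)
print(label(2.5, -5, 5), label(7.3, -3, 3), label(0.1, -0.1, 0.1))
```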

Answer 5

If you want to access multiple column/row values, you need to apply the function to the DataFrame (making sure to specify the correct axis):

df['Signal'] = df.apply(lambda x: 'Signal' if x['Min'] <= x['Difference'] <= x['Max'] else 'No Signal', axis=1)

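The error means that you are comparing a single value with a Series (multiple values). Pandas doesn't know which value in the Series you want to compare x with.

Alternatively, you can use numpy's where and pandas' between: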

df['Signal'] = np.where(df['Difference'].between(df['Min'], df['Max']), 'Signal', 'No Signal')
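To make the ValueError from the question more concrete, here is a small sketch that reproduces it in isolation. In the chained comparison df['Min'] <= x <= df['Max'], Python evaluates (df['Min'] <= x) and (x <= df['Max']), and the `and` asks the first boolean Series for a single truth value, which pandas refuses to guess. The variable names here are illustrative.

```python
import pandas as pd

maxs = pd.Series([5, 3, 0.1])
x = 2.5

# Comparing a scalar with a Series is fine: it yields one boolean per row.
elementwise = x <= maxs

# Using that Series where a single True/False is expected is what fails.
try:
    if elementwise:
        pass
    message = None
except ValueError as exc:
    message = str(exc)

print(elementwise.tolist())
print(message)

# The reductions named in the error message give an unambiguous answer.
all_true = elementwise.all()   # is the condition true on every row?
any_true = elementwise.any()   # is it true on at least one row?
```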

Answer 6

There is no need to use apply row-wise, which can be very slow and is not the preferred way; use a vectorized approach instead.

df['Signal'] = (df['Difference'].le(df['Max']) & df['Difference'].ge(df['Min'])).map({
    True: 'Signal',
    False: 'No Signal'
})

print(df)

#    Difference  Min  Max     Signal
# 0         2.5 -5.0  5.0     Signal
# 1         7.3 -3.0  3.0  No Signal
# 2         0.1 -0.1  0.1     Signal
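As a quick sanity check, the sketch below computes the condition both row-wise and vectorized on the sample data and confirms that they agree. It makes no claim about timings; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Difference': [2.5, 7.3, 0.1],
    'Min': [-5, -3, -0.1],
    'Max': [5, 3, 0.1],
})

# Row-wise: the lambda runs once per row in Python.
row_wise = df.apply(
    lambda row: row['Min'] <= row['Difference'] <= row['Max'], axis=1
)

# Vectorized: two column-wide comparisons combined with element-wise AND.
vectorized = df['Difference'].ge(df['Min']) & df['Difference'].le(df['Max'])

same = bool((row_wise == vectorized).all())
print(same)
```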

python

range

dataframe