How to set one pandas dataframe column value based on a 2nd column's value

Question

Situation

I have a pandas DataFrame df with a column sentiment_rating.

index	sentiment_rating
2022-03-20	.3
2022-03-21	-.4
2022-03-24	-.7
2022-03-28	.6
2022-03-31	.2

Goal

I am trying to create a new column status whose value is positive if the sentiment score is .5 or greater, negative if it is -.5 or less, and neutral if it is between -.5 and .5.

What I tried

I have installed pandas and am using the apply method like this:

df['status'] = df['sentiment_rating'].apply(lambda x: 'Positive' if x <= .8 else 'Neutral' if x > -.5 or < .5 else 'Negative' if x < -.5)

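Result

I get an invalid syntax error message, which doesn't tell me much.

I don't have a clear understanding of lambda functions, and I'm not even sure apply is the right way to achieve my goal.

I also tried a simpler two-way test with this: df['status'] = ['Positive' if x > '.5' else 'other' for x in df['sentiment_rating']], which results in the error message TypeError: '>' not supported between instances of 'float' and 'str'

Any input on my approach and on what I am doing wrong would be greatly appreciated. Thanks.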

Answer 1

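You can use numpy.select:

>>> import numpy as np
>>> df['status'] = np.select(
...     # conditions are checked in order; the first match wins
...     # >= and <= follow the question's "or greater" / "or less" wording
...     condlist=[df.sentiment_rating >= 0.5, df.sentiment_rating <= -0.5],
...     choicelist=['positive', 'negative'],
...     default='neutral',  # used when no condition matches
... )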
>>> df
        index  sentiment_rating    status
0  2022-03-20               0.3   neutral
1  2022-03-21              -0.4   neutral
2  2022-03-24              -0.7  negative
3  2022-03-28               0.6  positive
4  2022-03-31               0.2   neutral
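
To run the np.select approach end to end, df has to exist first. The following is a minimal sketch, assuming the dates are kept as an ordinary column named index (as in the printed output) and that the thresholds are inclusive, as the question's goal describes. The DataFrame construction is an assumption, since the question does not show how df was built.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question; how df was originally built is assumed.
df = pd.DataFrame(
    {
        "index": ["2022-03-20", "2022-03-21", "2022-03-24", "2022-03-28", "2022-03-31"],
        "sentiment_rating": [0.3, -0.4, -0.7, 0.6, 0.2],
    }
)

# Each condition is a boolean Series; np.select takes the choice for the
# first condition that is True in a row and falls back to `default`.
conditions = [df["sentiment_rating"] >= 0.5, df["sentiment_rating"] <= -0.5]
choices = ["positive", "negative"]
df["status"] = np.select(conditions, choices, default="neutral")

print(df)
```

Because the comparisons run on the whole column at once, no Python-level function is called per row.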

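With a lambda:

>>> df['status'] = df.sentiment_rating.apply(
...     lambda x: 'positive' if x >= 0.5
...     else 'negative' if x <= -0.5
...     else 'neutral'
... )

But this is unnecessary, and it is slow because apply calls the Python function once per row instead of operating on the whole column at once.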
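
As for the two errors in the question, both have simple causes. The sketch below walks through them, assuming the sample data from the question; the names classify and is_positive are introduced here purely for illustration.

```python
import pandas as pd

df = pd.DataFrame({"sentiment_rating": [0.3, -0.4, -0.7, 0.6, 0.2]})

# Cause 1 of the SyntaxError: `x > -.5 or < .5` is not a valid expression,
# because `< .5` has no left operand. Write `x > -.5 and x < .5`
# or the chained form `-.5 < x < .5`.
# Cause 2: a conditional expression must end with an `else` branch, so a
# trailing `'Negative' if x < -.5` with no `else` cannot be parsed.
classify = lambda x: "positive" if x >= 0.5 else "negative" if x <= -0.5 else "neutral"
df["status"] = df["sentiment_rating"].apply(classify)

# Cause of the TypeError: '.5' in quotes is a string, and a float cannot be
# ordered against a string. Compare against the number 0.5 instead.
df["is_positive"] = ["Positive" if x > 0.5 else "other" for x in df["sentiment_rating"]]

print(df)
```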

Answer 2

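You can extract the lambda into a separate, named function to make it more readable. Note that the DataFrame in this answer names the column rating rather than sentiment_rating. You can use the following: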

def return_status(x):
    if x >= .5:
        return 'Positive'
    elif x <= -.5:
        return 'Negative'
    return 'Neutral'


df['status'] = df['rating'].apply(return_status)
print(df.head())

You will get the following output:

        index  rating    status
0  2022-03-20     0.3   Neutral
1  2022-03-21    -0.4   Neutral
2  2022-03-24    -0.7  Negative
3  2022-03-28     0.6  Positive
4  2022-03-31     0.2   Neutral
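
The sample data never lands exactly on a threshold, so it does not show how the boundaries are handled. Here is a small sketch that checks them, using hypothetical ratings chosen to sit on and around -.5 and .5 (these values are not from the question):

```python
import pandas as pd


def return_status(x):
    """Label one rating; both thresholds are inclusive."""
    if x >= 0.5:
        return "Positive"
    elif x <= -0.5:
        return "Negative"
    return "Neutral"


# Hypothetical ratings chosen to sit on and around the thresholds.
ratings = pd.Series([0.5, 0.49, -0.5, -0.49, 0.0], name="sentiment_rating")
labels = ratings.apply(return_status)

print(labels.tolist())
# ['Positive', 'Neutral', 'Negative', 'Neutral', 'Neutral']
```

Exactly .5 comes out as Positive and exactly -.5 as Negative, which matches the "or greater" / "or less" wording of the goal.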

Answer 3

This may be a bit longer than what you tried, but it gets the job done:

def status(value):
    # apply passes one scalar at a time, not a whole Series
    if value >= 0.5:
        return "Positive"
    elif -0.5 < value < 0.5:
        return "Neutral"
    return "Negative"

# the question's column is named sentiment_rating
df["Status"] = df["sentiment_rating"].apply(status)
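
The same three-way split can also be written without apply, using boolean masks with .loc. This is a minimal sketch, assuming the question's column name sentiment_rating and the capitalized labels used in this answer:

```python
import pandas as pd

df = pd.DataFrame({"sentiment_rating": [0.3, -0.4, -0.7, 0.6, 0.2]})

# Start with the fallback label for every row...
df["Status"] = "Neutral"
# ...then overwrite only the rows that satisfy each mask.
df.loc[df["sentiment_rating"] >= 0.5, "Status"] = "Positive"
df.loc[df["sentiment_rating"] <= -0.5, "Status"] = "Negative"

print(df)
```

The two masks cannot both be true for the same row, so the order of the two assignments does not matter here.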

Tags: python, lambda, dataframe, pandas