DataFrame conditional replacement with integers

I have a DataFrame column like this:

df['col_name'].unique()
>>>array([-1, 'Not Passed, On the boundary', 1, 'Passed, On the boundary',
       'Passed, Unclear result', 'Passes, Unclear result, On the boudnary',
       'Rejected, Unclear result'], dtype=object)

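In this column, if an element contains the word 'Passed', either as the whole field or as a substring, I want to replace the entire field with the integer 1; otherwise I want to replace it with the integer -1.

Please help me solve this.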

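df['col_name'] = df['col_name'].apply(lambda x: 1 if 'Passed' in x else -1)

This checks every entry in df['col_name'] for the substring 'Passed' and replaces it with 1 or -1 accordingly. It assumes that every entry in the column is a str; on the integer entries 1 and -1 shown in the question, 'Passed' in x raises a TypeError, so non-string values have to be handled separately.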
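Because the sample column mixes integers and strings, a guarded version of the same apply call may be useful. The following is only a sketch: df is rebuilt here from the unique() output in the question, to_flag is a hypothetical helper name, and the choice to map every non-string entry to -1 is an assumption based on the literal "otherwise -1" rule.

```python
import pandas as pd

# Sample column rebuilt from the unique() output in the question (assumption)
df = pd.DataFrame({'col_name': [
    -1, 'Not Passed, On the boundary', 1, 'Passed, On the boundary',
    'Passed, Unclear result', 'Passes, Unclear result, On the boudnary',
    'Rejected, Unclear result']})

def to_flag(x):
    # Hypothetical helper: search only strings; any other type maps to -1
    return 1 if isinstance(x, str) and 'Passed' in x else -1

flags = df['col_name'].apply(to_flag)
print(flags.tolist())
```

Under this rule the existing integer 1 also becomes -1, since it is not a string containing 'Passed'. If the existing 1 and -1 should survive, the helper would need an extra branch that returns x unchanged for those values.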

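You can use .str.contains to test whether each value contains the string; the integer values produce NaN, which is then filled with False. After that, np.where puts 1 where the mask is True and -1 where it is False. If you want to keep the original 1 and -1 values, you can try np.select.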

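import numpy as np

# True where the string contains 'Passed'; integers yield NaN, filled with False
m1 = df['col_name'].str.contains('Passed').fillna(False)
# True where the value is already the integer 1 or -1
m2 = df['col_name'].isin([1, -1])

# Replace everything: 1 where 'Passed' is found, -1 otherwise
df['col_name_replace_1_-1'] = np.where(m1, 1, -1)
# Keep existing 1 / -1 values and convert only the strings
df['col_name_keep_1_-1'] = np.select([m2, m1, ~m1], [df['col_name'], 1, -1], default=df['col_name'])
print(df)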

                                  col_name  col_name_replace_1_-1 col_name_keep_1_-1
0                                       -1                     -1                 -1
1              Not Passed, On the boundary                      1                  1
2                                        1                     -1                  1
3                  Passed, On the boundary                      1                  1
4                   Passed, Unclear result                      1                  1
5  Passes, Unclear result, On the boudnary                     -1                 -1
6                 Rejected, Unclear result                     -1                 -1
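
As a possible variant of the mask above, str.contains also accepts an na argument, which can stand in for the separate fillna step. This is a minimal sketch rather than part of the original answer; df is again rebuilt from the unique() output in the question, and replaced is a name introduced only for illustration.

```python
import numpy as np
import pandas as pd

# Sample column rebuilt from the unique() output in the question (assumption)
df = pd.DataFrame({'col_name': [
    -1, 'Not Passed, On the boundary', 1, 'Passed, On the boundary',
    'Passed, Unclear result', 'Passes, Unclear result, On the boudnary',
    'Rejected, Unclear result']})

# na=False makes entries that cannot be searched (here the integers)
# count as "no match" instead of NaN
m1 = df['col_name'].str.contains('Passed', na=False)

# 1 where 'Passed' was found, -1 everywhere else
replaced = np.where(m1, 1, -1)
print(replaced.tolist())
```

The result should agree with the col_name_replace_1_-1 column in the output above. Note that 'Not Passed, On the boundary' still maps to 1, because the rule is a plain substring match.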