DataFrame conditional replacement with integers
I have a DataFrame column like this:
df['col_name'].unique()
>>>array([-1, 'Not Passed, On the boundary', 1, 'Passed, On the boundary',
'Passed, Unclear result', 'Passes, Unclear result, On the boudnary',
'Rejected, Unclear result'], dtype=object)
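In this column, if an element contains the word 'Passed' as a field or substring, I want to replace the whole field with the integer 1; otherwise I want to replace it with the integer -1.
Please help me solve this.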
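df['col_name'] = df['col_name'].apply(lambda x: 1 if isinstance(x, str) and 'Passed' in x else -1)
This checks each entry in df['col_name'] and replaces it with 1 if it is a string containing 'Passed', and with -1 otherwise. The isinstance check matters because the column also holds the integers 1 and -1; without it, the expression assumes every entry is a str and raises a TypeError on the integer entries.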
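To try this check on the mixed-type column from the question, here is a minimal self-contained sketch. The sample frame is rebuilt from the unique values shown in the question, and the helper name passed_flag is hypothetical, introduced only for illustration. It guards against the integer entries, for which a plain `in` test would raise a TypeError.

```python
import pandas as pd

# Sample column rebuilt from the unique values in the question (an assumption).
df = pd.DataFrame({'col_name': [
    -1, 'Not Passed, On the boundary', 1, 'Passed, On the boundary',
    'Passed, Unclear result', 'Passes, Unclear result, On the boudnary',
    'Rejected, Unclear result',
]})

def passed_flag(value):
    """Hypothetical helper: 1 if value is a string containing 'Passed', else -1."""
    # Integers such as 1 and -1 do not support `in`, so guard with isinstance.
    return 1 if isinstance(value, str) and 'Passed' in value else -1

flags = df['col_name'].apply(passed_flag)
print(flags.tolist())  # [-1, 1, -1, 1, 1, -1, -1]
```

Note that 'Not Passed, On the boundary' maps to 1 because the test is a plain substring match, which is what the question asks for.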
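You can use .str.contains to test whether each value contains the string, passing na=False so that the NaN produced for the integer entries becomes False. Then use np.where to map True to 1 and False to -1. If you want to keep the original 1 and -1 values, try np.select.
import numpy as np
# True where the entry is a string containing 'Passed'; non-strings become False
m1 = df['col_name'].str.contains('Passed', na=False)
# True where the entry is already the integer 1 or -1
m2 = df['col_name'].isin([1, -1])
df['col_name_replace_1_-1'] = np.where(m1, 1, -1)
# Conditions are checked in order, so existing 1/-1 values are kept as they are
df['col_name_keep_1_-1'] = np.select([m2, m1, ~m1], [df['col_name'], 1, -1], default=df['col_name'])
print(df)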
                                  col_name  col_name_replace_1_-1  col_name_keep_1_-1
0                                       -1                     -1                  -1
1              Not Passed, On the boundary                      1                   1
2                                        1                     -1                   1
3                  Passed, On the boundary                      1                   1
4                   Passed, Unclear result                      1                   1
5  Passes, Unclear result, On the boudnary                     -1                  -1
6                 Rejected, Unclear result                     -1                  -1
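To reproduce the two result columns end to end, here is a minimal self-contained sketch. The sample frame is an assumption rebuilt from the question's unique values, and the variable names replaced and kept are illustrative. It passes na=False to str.contains so the mask is boolean from the start, and it shortens the np.select call to two conditions with default=-1, which should give the same values as the three-condition form.

```python
import numpy as np
import pandas as pd

# Sample column rebuilt from the unique values in the question (an assumption).
df = pd.DataFrame({'col_name': [
    -1, 'Not Passed, On the boundary', 1, 'Passed, On the boundary',
    'Passed, Unclear result', 'Passes, Unclear result, On the boudnary',
    'Rejected, Unclear result',
]})

# na=False turns the missing result for non-string entries into False.
m1 = df['col_name'].str.contains('Passed', na=False)
# Entries that are already the integers 1 or -1.
m2 = df['col_name'].isin([1, -1])

# Variant 1: every entry becomes 1 or -1 based only on the substring test.
replaced = np.where(m1, 1, -1)

# Variant 2: existing 1/-1 entries are kept; the first matching condition wins.
kept = np.select([m2, m1], [df['col_name'], 1], default=-1)

print(replaced.tolist())      # [-1, 1, -1, 1, 1, -1, -1]
print([int(v) for v in kept])  # [-1, 1, 1, 1, 1, -1, -1]
```

The only row where the two variants differ is the original integer 1, which the first variant turns into -1 because it is not a string containing 'Passed'.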