Comparing two pandas DataFrame columns and creating a new column based on whether the values are the same
I have two DataFrame columns, addr_num1 and addr_num2, as shown below:
addr_num1 addr_num2
10 10
20 20
33 35
40 40
50 53
I want to create a new column: if the two values are the same, I use one of them. If not, I combine them as shown below:
addr_num3
10
20
33-35
40
50-53
How can I do this? Please advise.
A simple approach using a conditional:
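import numpy as np
# Compare as strings so both branches of np.where yield the same type
s1 = df['addr_num1'].astype(str)
s2 = df['addr_num2'].astype(str)
# Keep the single value where the columns match, otherwise join with '-'
df['addr_num3'] = np.where(s1 == s2, s1, s1 + '-' + s2)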
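For anyone who wants to try this end to end, here is a minimal self-contained sketch. The sample frame is rebuilt from the values in the question; how `df` is actually created is an assumption on my part.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question (construction of df is assumed)
df = pd.DataFrame({
    "addr_num1": [10, 20, 33, 40, 50],
    "addr_num2": [10, 20, 35, 40, 53],
})

# Work on string versions so the result column has a single type
s1 = df["addr_num1"].astype(str)
s2 = df["addr_num2"].astype(str)

# Element-wise choice: s1 where equal, "s1-s2" otherwise
df["addr_num3"] = np.where(s1 == s2, s1, s1 + "-" + s2)

print(df)
```

Note that addr_num3 holds strings in every row, including the rows where the two numbers match.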
An alternative approach using reshaping:
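df['addr_num3'] = (df[['addr_num1', 'addr_num2']]
                   .astype(str)
                   .reset_index()                         # keep the row label as a column
                   .melt(id_vars='index')                 # one row per (index, value) pair
                   .drop_duplicates(['index', 'value'])   # drop the repeat when both columns match
                   .groupby('index')['value'].agg('-'.join)
                   )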
Output:
addr_num1 addr_num2 addr_num3
0 10 10 10
1 20 20 20
2 33 35 33-35
3 40 40 40
4 50 53 50-53
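If the reshaping chain is hard to follow, the same idea (drop repeated values within a row, then join what is left) can be sketched row by row. This is not the method above, just an illustration of what it computes; `join_unique` is a hypothetical helper name, and the sample frame is again rebuilt from the question. Expect it to be slower than the vectorized versions on large frames.

```python
import pandas as pd

# Sample data copied from the question (construction of df is assumed)
df = pd.DataFrame({
    "addr_num1": [10, 20, 33, 40, 50],
    "addr_num2": [10, 20, 35, 40, 53],
})

def join_unique(row):
    """Join the distinct values of one row with '-', keeping column order."""
    # dict.fromkeys drops repeats while preserving first-seen order
    return "-".join(dict.fromkeys(row.astype(str)))

# axis=1 passes each row (as a Series) to the helper
df["addr_num3"] = df[["addr_num1", "addr_num2"]].apply(join_unique, axis=1)

print(df)
```

A side benefit of this formulation is that it extends to more than two columns without changes, since it only relies on the order of the selected columns.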
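You can do this in two steps:
1. Set the new column to the values of the first and second columns converted to str and separated by -; this covers the non-matching rows.
2. Use .loc to select the matching rows and set their value to the first column (as a string, for consistency).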
df['addr_num3'] = df['addr_num1'].apply(str)+'-'+df['addr_num2'].apply(str)
df.loc[df['addr_num1']==df['addr_num2'],'addr_num3']=df['addr_num1'].apply(str)
loc allows setting column values based on a condition.
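Here is a minimal runnable sketch of the two steps, assuming the sample data from the question. The variable name `same` is my own, and I use `astype(str)` rather than `apply(str)`; for these integer columns both give the same strings.

```python
import pandas as pd

# Sample data copied from the question (construction of df is assumed)
df = pd.DataFrame({
    "addr_num1": [10, 20, 33, 40, 50],
    "addr_num2": [10, 20, 35, 40, 53],
})

# Boolean mask: True where both columns hold the same value
same = df["addr_num1"] == df["addr_num2"]

# Step 1: start with the combined "a-b" form for every row
df["addr_num3"] = df["addr_num1"].astype(str) + "-" + df["addr_num2"].astype(str)

# Step 2: overwrite only the matching rows with the single value
df.loc[same, "addr_num3"] = df.loc[same, "addr_num1"].astype(str)

print(df)
```

Naming the mask keeps the condition in one place, which helps if the comparison later needs to change.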
Pandas docs on loc
Pandas docs on apply