Comparing two pandas DataFrame columns and creating a new column based on whether the values are the same

I have two DataFrame columns, addr_num1 and addr_num2, as shown below:

addr_num1  addr_num2
   10          10
   20          20
   33          35
   40          40
   50          53

I want to create a new column: if the two values are the same, I'll use one of them. If not, I'll combine them like this:

addr_num3
   10
   20
  33-35
   40
  50-53

How can I do this? Please advise.

A simple approach using a conditional:

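import numpy as np

# Compare as strings so matched and combined results share one type
s1 = df['addr_num1'].astype(str)
s2 = df['addr_num2'].astype(str)

# Keep the single value where both match; otherwise join them with '-'
df['addr_num3'] = np.where(s1==s2, s1, s1+'-'+s2)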

An alternative approach using reshaping:

df['addr_num3'] = (df[['addr_num1', 'addr_num2']]
 .astype(str)
 .reset_index()
 .melt(id_vars='index')
 .drop_duplicates(['index', 'value'])
 .groupby('index')['value'].agg('-'.join)
)

Output:

   addr_num1  addr_num2 addr_num3
0         10         10        10
1         20         20        20
2         33         35     33-35
3         40         40        40
4         50         53     50-53
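For anyone who wants to run both approaches end to end, here is a minimal self-contained sketch. It assumes the sample data from the question; the names `via_where`, `via_melt`, and `long_form` are only illustrative and do not appear in the original answer. Splitting the reshape chain in two makes the intermediate long-form table visible.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "addr_num1": [10, 20, 33, 40, 50],
    "addr_num2": [10, 20, 35, 40, 53],
})

s1 = df["addr_num1"].astype(str)
s2 = df["addr_num2"].astype(str)

# Approach 1: element-wise conditional on the string columns
via_where = pd.Series(np.where(s1 == s2, s1, s1 + "-" + s2), index=df.index)

# Approach 2, step by step: one row per (original row, column) pair
long_form = (
    df[["addr_num1", "addr_num2"]]
    .astype(str)
    .reset_index()
    .melt(id_vars="index")
)

# Drop values repeated within the same original row, then join what is left
via_melt = (
    long_form
    .drop_duplicates(["index", "value"])
    .groupby("index")["value"]
    .agg("-".join)
)

df["addr_num3"] = via_where
print(long_form)
print(df)
```

On this data both approaches should produce the same strings. The `np.where` version is probably easier to read for two columns, while the reshape version extends more naturally if more columns need to be combined.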

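You can do this in two steps:

Set every row to the first and second column values as strings, separated by -. This is the value that will be kept for the non-matching rows.

Use .loc to filter the matching rows and set their value to the first column (as a string, for consistency).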

df['addr_num3'] = df['addr_num1'].apply(str)+'-'+df['addr_num2'].apply(str)
df.loc[df['addr_num1']==df['addr_num2'],'addr_num3']=df['addr_num1'].apply(str)

loc lets you set column values based on a condition.
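To make the role of the condition more explicit, here is a small sketch of the same two steps with the boolean mask stored in a variable. The name `same` is an assumption for illustration, and `astype(str)` is used in place of `apply(str)`; for this integer data the two should give the same strings.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "addr_num1": [10, 20, 33, 40, 50],
    "addr_num2": [10, 20, 35, 40, 53],
})

# Boolean mask: True where both columns hold the same value
same = df["addr_num1"] == df["addr_num2"]

# Step 1: default every row to the combined "a-b" form
df["addr_num3"] = df["addr_num1"].astype(str) + "-" + df["addr_num2"].astype(str)

# Step 2: overwrite only the matching rows with the single value
df.loc[same, "addr_num3"] = df.loc[same, "addr_num1"].astype(str)

print(df)
```

Only the rows where the mask is True are touched in step 2, so the combined values from step 1 survive for the non-matching rows.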

Pandas docs on loc

Pandas docs on apply