Python: concat multiple columns if the next column is not in the previous column
I have sample data like this:
col1               col2      col3
PYTHON RD          APT 3     NaN
STACK AVE APT 2-3  APT 2-3   NaN
OVER ST 1/2        UNIT 1/2  UNIT 1/2
FLOW RD            NaN       NaN
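For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the sample frame. It assumes the missing cells are real NaN values (not the string "NaN") and that the frame is called `df`, as in the snippets in this thread.

```python
import numpy as np
import pandas as pd

# Sample data from the question; missing cells are assumed to be real NaN values.
df = pd.DataFrame(
    {
        "col1": ["PYTHON RD", "STACK AVE APT 2-3", "OVER ST 1/2", "FLOW RD"],
        "col2": ["APT 3", "APT 2-3", "UNIT 1/2", np.nan],
        "col3": [np.nan, np.nan, "UNIT 1/2", np.nan],
    }
)
print(df)
```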
I want to create a new field:
col1               col2      col3      COMBINED
PYTHON RD          APT 3     NaN       PYTHON RD APT 3
STACK AVE APT 2-3  APT 2-3   NaN       STACK AVE APT 2-3
OVER ST 1/2        UNIT 1/2  UNIT 1/2  OVER ST 1/2 UNIT 1/2
FLOW RD            NaN       NaN       FLOW RD
I tried:
columns = ["col1", "col2", "col3"]
COMBINED = ''
for col in columns:
    df[col] = df[col].fillna("")
    COMBINED = COMBINED + df[col].str.strip() + ' '
df['COMBINED'] = COMBINED.str.strip()
The code above does concatenate the columns, but the second row ends up with a repeated value: STACK AVE APT 2-3 APT 2-3.
Any suggestions?
print(
    df[["col1", "col2"]]
    .fillna("")
    .apply(
        lambda x: x.loc["col1"]
        if x.loc["col2"] in x.loc["col1"]
        else x.loc["col1"] + " " + x.loc["col2"],
        axis=1,
    )
)
Prints:
                col1      col2              COMBINED
0          PYTHON RD     APT 3       PYTHON RD APT 3
1  STACK AVE APT 2-3   APT 2-3     STACK AVE APT 2-3
2        OVER ST 1/2  UNIT 1/2  OVER ST 1/2 UNIT 1/2
3            FLOW RD       NaN               FLOW RD
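The same two-column rule can also be written without `apply`. The sketch below is an alternative, not part of the answer above: it walks the two columns with `zip` and a list comprehension, which is often quicker than a row-wise `apply` on larger frames. The sample frame is rebuilt here on the assumption that the missing cells are NaN.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "col1": ["PYTHON RD", "STACK AVE APT 2-3", "OVER ST 1/2", "FLOW RD"],
        "col2": ["APT 3", "APT 2-3", "UNIT 1/2", np.nan],
    }
)

# Replace NaN with "" so the substring test and concatenation work on plain strings.
first = df["col1"].fillna("")
second = df["col2"].fillna("")

# Keep col1 alone when col2 is already contained in it (an empty col2 always is);
# otherwise append col2 after a space.
df["COMBINED"] = [a if b in a else f"{a} {b}" for a, b in zip(first, second)]
print(df)
```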
EDIT: For many columns:
def combine(x):
    out = []
    for word in x:
        if word and not any(word in w for w in out):
            out.append(word)
    return " ".join(out)

columns = ["col1", "col2", "col3"]
df["COMBINED"] = df[columns].fillna("").apply(combine, axis=1)
print(df)
Prints:
                col1      col2      col3              COMBINED
0          PYTHON RD     APT 3       NaN       PYTHON RD APT 3
1  STACK AVE APT 2-3   APT 2-3       NaN     STACK AVE APT 2-3
2        OVER ST 1/2  UNIT 1/2  UNIT 1/2  OVER ST 1/2 UNIT 1/2
3            FLOW RD       NaN       NaN               FLOW RD
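One thing to be aware of: `word in w` is a character-level substring test, so a value such as "APT 2" would count as already present in "STACK AVE APT 2-3". If that matters for your data, a possible variant is to compare whole whitespace-separated tokens instead. The sketch below is only an illustration; the helper names `contains_tokens` and `combine_tokens` are made up for it, and the sample frame is rebuilt assuming NaN for the missing cells.

```python
import numpy as np
import pandas as pd

def contains_tokens(haystack: str, needle: str) -> bool:
    """True if needle's whitespace-separated tokens occur contiguously in haystack."""
    h, n = haystack.split(), needle.split()
    return any(h[i:i + len(n)] == n for i in range(len(h) - len(n) + 1))

def combine_tokens(row) -> str:
    """Join the row's values, skipping empties and values already covered by a kept one."""
    out = []
    for value in row:
        value = value.strip()
        if value and not any(contains_tokens(kept, value) for kept in out):
            out.append(value)
    return " ".join(out)

df = pd.DataFrame(
    {
        "col1": ["PYTHON RD", "STACK AVE APT 2-3", "OVER ST 1/2", "FLOW RD"],
        "col2": ["APT 3", "APT 2-3", "UNIT 1/2", np.nan],
        "col3": [np.nan, np.nan, "UNIT 1/2", np.nan],
    }
)
columns = ["col1", "col2", "col3"]
df["COMBINED"] = df[columns].fillna("").apply(combine_tokens, axis=1)
print(df["COMBINED"].tolist())
```

On the sample data this gives the same COMBINED values as the substring version; the two only differ when one value is a partial token of another.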
Not sure whether this covers all your cases:
def combine(row):
    row = row.fillna("")
    result = row["col1"]
    for col in ["col2", "col3"]:
        # An empty string is always "in" result, so NaN cells are skipped here.
        if row[col] not in result:
            result += " " + row[col]
    return result

df["COMBINED"] = df.apply(combine, axis=1)
Let's try using unique and join:
df['col4'] = df.fillna('').apply(lambda x: ",".join(x.unique()).strip(','), axis=1)
                col1      col2      col3                       col4
0          PYTHON RD     APT 3       NaN            PYTHON RD,APT 3
1  STACK AVE APT 2-3   APT 2-3       NaN  STACK AVE APT 2-3,APT 2-3
2        OVER ST 1/2  UNIT 1/2  UNIT 1/2       OVER ST 1/2,UNIT 1/2
3            FLOW RD       NaN       NaN                    FLOW RD
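Note that `unique` only drops exact duplicates, so "APT 2-3" is still kept next to "STACK AVE APT 2-3" in row 1. Also, stripping the separator after joining only cleans up the ends of the string, not an empty value in the middle. A small sketch of a variant that filters out empty strings before joining follows; `join_unique` is a hypothetical helper name, and the second row is made-up data used only to show the gap case.

```python
import pandas as pd

def join_unique(row: pd.Series, sep: str = ",") -> str:
    """Join the distinct non-empty values of a row, keeping first-seen order."""
    return sep.join(v for v in row.unique() if v)

full = pd.Series(["STACK AVE APT 2-3", "APT 2-3", ""])
gappy = pd.Series(["FLOW RD", "", "UNIT 4"])  # made-up row with an empty middle column

print(join_unique(full))                     # exact-duplicate removal only
print(join_unique(gappy))                    # empty value filtered before joining
print(",".join(gappy.unique()).strip(","))   # join-then-strip leaves a double comma
```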