How to combine different columns in a dataframe using a list comprehension in Python
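Suppose a dataframe contains:
attacker_1  attacker_2  attacker_3  attacker_4
Lannister   nan         nan         nan
nan         Stark       greyjoy     nan
I want to create another column called AttackerCombo that aggregates the 4 columns into 1 column.
How would I write this in Python?
I have been practicing Python and thought a list comprehension such as [list(x) for x in attackers] would make sense, where attackers is a numpy array of the 4 columns. It does aggregate all 4 columns into 1, but I also want to remove all the nans.
So the result for each row, instead of looking like
starknannanlannister
would look like stark/lannister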
With a lambda function you can populate a new column in the dataframe:
df['attackers'] = df[['attacker_1','attacker_2','attacker_3','attacker_4']].apply(lambda x: '{}{}{}{}'.format(*x), axis=1)
You did not specify how to aggregate them; for example, if you want the values separated by dashes:
df['attackers'] = df[['attacker_1','attacker_2','attacker_3','attacker_4']].apply(lambda x: '{}-{}-{}-{}'.format(*x), axis=1)
Note that this formats missing values as the text nan rather than removing them.
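To see why plain string formatting is not enough for this question, here is a minimal sketch that rebuilds the two sample rows and applies the dash-separated format. The way the frame is constructed and the names cols and formatted are assumptions made for illustration; only the column names come from the question.

```python
import numpy as np
import pandas as pd

# Rebuild the two sample rows from the question (construction is assumed).
df = pd.DataFrame({
    "attacker_1": ["Lannister", np.nan],
    "attacker_2": [np.nan, "Stark"],
    "attacker_3": [np.nan, "greyjoy"],
    "attacker_4": [np.nan, np.nan],
})
cols = ["attacker_1", "attacker_2", "attacker_3", "attacker_4"]

# Format every value in the row, missing or not, into one string.
formatted = df[cols].apply(lambda row: "{}-{}-{}-{}".format(*row), axis=1)

# Missing values are not skipped; they show up as the text "nan".
print(formatted.tolist())
```

Every NaN is converted to a string along with the real names, so the missing values still have to be filtered out before joining.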
I think you need apply with join, removing the NaN values with dropna:
df['attackers'] = df[['attacker_1','attacker_2','attacker_3','attacker_4']] \
.apply(lambda x: '/'.join(x.dropna()), axis=1)
print(df)
  attacker_1 attacker_2 attacker_3 attacker_4      attackers
0  Lannister        NaN        NaN        NaN      Lannister
1        NaN      Stark    greyjoy        NaN  Stark/greyjoy
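The dropna approach can be wrapped into a small reusable function. The sketch below is self-contained; the helper name combine_columns and its sep parameter are hypothetical, and the astype(str) call is an added assumption so that non-string values (numbers, for instance) would not make join fail.

```python
import numpy as np
import pandas as pd


def combine_columns(frame, columns, sep="/"):
    """Join the non-missing values of `columns` row by row (hypothetical helper)."""
    return frame[columns].apply(
        # dropna removes the NaN entries of the row; astype(str) guards
        # against non-string values before joining.
        lambda row: sep.join(row.dropna().astype(str)),
        axis=1,
    )


# Sample data rebuilt from the question (construction is assumed).
df = pd.DataFrame({
    "attacker_1": ["Lannister", np.nan],
    "attacker_2": [np.nan, "Stark"],
    "attacker_3": [np.nan, "greyjoy"],
    "attacker_4": [np.nan, np.nan],
})
cols = ["attacker_1", "attacker_2", "attacker_3", "attacker_4"]

df["attackers"] = combine_columns(df, cols)
print(df["attackers"].tolist())  # ['Lannister', 'Stark/greyjoy']
```

Passing sep="" gives the concatenated form without a separator, and a row with no values at all would come out as an empty string.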
If you need an empty string as the separator, use DataFrame.fillna:
df['attackers'] = df[['attacker_1','attacker_2','attacker_3','attacker_4']].fillna('') \
.apply(''.join, axis=1)
print(df)
  attacker_1 attacker_2 attacker_3 attacker_4     attackers
0  Lannister        NaN        NaN        NaN     Lannister
1        NaN      Stark    greyjoy        NaN  Starkgreyjoy
Two more solutions use a list comprehension: the first filters with notnull, the second checks whether each value is a string:
df['attackers'] = df[['attacker_1','attacker_2','attacker_3','attacker_4']] \
.apply(lambda x: '/'.join([e for e in x if pd.notnull(e)]), axis=1)
print(df)
  attacker_1 attacker_2 attacker_3 attacker_4      attackers
0  Lannister        NaN        NaN        NaN      Lannister
1        NaN      Stark    greyjoy        NaN  Stark/greyjoy
# Python 3: isinstance(e, str); Python 2: isinstance(e, basestring)
df['attackers'] = df[['attacker_1','attacker_2','attacker_3','attacker_4']] \
.apply(lambda x: '/'.join([e for e in x if isinstance(e, str)]), axis=1)
print(df)
  attacker_1 attacker_2 attacker_3 attacker_4      attackers
0  Lannister        NaN        NaN        NaN      Lannister
1        NaN      Stark    greyjoy        NaN  Stark/greyjoy
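Since the question asked for a list comprehension specifically, the same filtering can also be written without apply, by looping over the rows of the underlying array. This is only a sketch of one possible variant, not something proposed in the answers above; the frame construction and the names cols and rows are assumptions.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question (construction is assumed).
df = pd.DataFrame({
    "attacker_1": ["Lannister", np.nan],
    "attacker_2": [np.nan, "Stark"],
    "attacker_3": [np.nan, "greyjoy"],
    "attacker_4": [np.nan, np.nan],
})
cols = ["attacker_1", "attacker_2", "attacker_3", "attacker_4"]

# One array row per dataframe row; mixed columns yield an object array.
rows = df[cols].to_numpy()

# Outer comprehension: one joined string per row.
# Inner generator: keep only the non-missing values of that row.
df["attackers"] = [
    "/".join(str(value) for value in row if pd.notna(value))
    for row in rows
]
print(df["attackers"].tolist())  # ['Lannister', 'Stark/greyjoy']
```

This stays close to the [list(x) for x in attackers] attempt in the question; the only addition is the pd.notna filter inside the inner loop.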