Join two pandas series of text with NA

Question

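I have two pandas Series containing text, and I want to join them to get a single Series with the joined text.

Both Series share the same index, but one of them has fewer values, which produces NA values when joining.

Here is a toy example: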

import pandas as pd

s1 = pd.Series(['red', 'blue', 'green', 'black'], index=[1,2,3,4])
s2 = pd.Series(['large', 'small'], index=[1,3])

s1

    1      red
    2     blue
    3    green
    4    black
    dtype: object

s2

    1    large
    3    small
    dtype: object

Now I want to join the text of the two series with a separator to get the following series:

1      red,large
2           blue
3    green,small
4          black

Here is what I have tried so far:

1.

s1.str.cat(s2, sep=',')

1      red,large
2            NaN
3    green,small
4            NaN
dtype: object
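For context on why attempt 1 returns NaN: Series.str.cat first aligns the other series to the caller's index (the default is join='left'), and with the default na_rep=None any row that has a missing value on either side comes out missing. A quick check on the same toy data:

```python
import pandas as pd

s1 = pd.Series(['red', 'blue', 'green', 'black'], index=[1, 2, 3, 4])
s2 = pd.Series(['large', 'small'], index=[1, 3])

# str.cat lines s2 up with s1's index first, so labels 2 and 4,
# which s2 does not have, become NaN on the s2 side.
aligned = s2.reindex(s1.index)
print(aligned)

# With the default na_rep=None, a missing value in any input row
# makes that row of the result missing too.
result = s1.str.cat(s2, sep=',')
print(result.isna().tolist())  # [False, True, False, True]
```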

NaN values instead of the values from the first series.

2.

s1.str.cat(s2, sep=',', na_rep='')

1      red,large
2          blue,
3    green,small
4         black,
dtype: object

Trailing commas.
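The trailing commas appear because na_rep only replaces the missing value with ''; the separator is still placed between the two parts, so 'blue' + ',' + '' gives 'blue,'. The sketch below spells out the same result by hand (align, fill, concatenate) to show that this is what str.cat is doing:

```python
import pandas as pd

s1 = pd.Series(['red', 'blue', 'green', 'black'], index=[1, 2, 3, 4])
s2 = pd.Series(['large', 'small'], index=[1, 3])

# Attempt 2: missing values become '', but sep is still inserted.
with_na_rep = s1.str.cat(s2, sep=',', na_rep='')

# Equivalent by hand: align s2 to s1, fill the gaps with '', then
# always put the separator between the two parts.
by_hand = s1 + ',' + s2.reindex(s1.index).fillna('')

print(with_na_rep.equals(by_hand))  # True
```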

3.

s1.str.cat(s2, sep=',', na_rep='').str.strip(',')

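This does work, but it makes the code harder to follow, and I'd rather not add extra code just to fix something that should have been done correctly in the first place!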
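One caveat with attempt 3, beyond readability: str.strip(',') removes every leading and trailing comma, not only the separator that was added, so text that itself starts or ends with a comma gets altered. A small illustration with hypothetical data (this only matters if your real text can contain such commas):

```python
import pandas as pd

# Hypothetical data: the second value of s1 legitimately ends with a comma.
s1 = pd.Series(['red', 'blue,'], index=[1, 2])
s2 = pd.Series(['large'], index=[1])

# 'blue,' + ',' + '' -> 'blue,,' -> strip(',') -> 'blue' (original comma lost)
stripped = s1.str.cat(s2, sep=',', na_rep='').str.strip(',')
print(stripped.tolist())  # ['red,large', 'blue']
```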

4.

pd.concat([s1,s2], axis=1).apply(','.join)

TypeError: sequence item 1: expected str instance, float found

5.

pd.concat([s1,s2], axis=1).agg('|'.join, axis=1)

TypeError: sequence item 1: expected str instance, float found

Neither works because of the NA values.

So how can I achieve this?
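Attempts 4 and 5 fail because str.join receives the float NaN that pd.concat puts in place of labels missing from s2. One possible fix (a sketch, not taken from the answers below) is to drop the missing entries in each row before joining:

```python
import pandas as pd

s1 = pd.Series(['red', 'blue', 'green', 'black'], index=[1, 2, 3, 4])
s2 = pd.Series(['large', 'small'], index=[1, 3])

# Put the series side by side (missing labels become NaN), then join
# each row after dropping its NaN entries, so only strings reach str.join.
combined = pd.concat([s1, s2], axis=1)
joined = combined.apply(lambda row: ','.join(row.dropna()), axis=1)
print(joined)
```

Unlike attempt 3, this never strips characters that belong to the original text, and it works unchanged with more than two columns.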

Answer 1

One workaround is to put the comma in front of the s2 values instead, then cat the result onto s1 with na_rep='', for example:

print(s1.str.cat(',' + s2, na_rep=''))
1      red,large
2           blue
3    green,small
4          black
dtype: object
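To see why this works, here are its two steps separately on the same toy data. The comma travels with the s2 values, so a row with no s2 value contributes neither text nor separator:

```python
import pandas as pd

s1 = pd.Series(['red', 'blue', 'green', 'black'], index=[1, 2, 3, 4])
s2 = pd.Series(['large', 'small'], index=[1, 3])

# Step 1: attach the separator to the optional part only.
prefixed = ',' + s2
print(prefixed.tolist())  # [',large', ',small']

# Step 2: concatenate with no sep; labels missing from s2 become ''
# through na_rep, so they add nothing at all.
result = s1.str.cat(prefixed, na_rep='')
print(result.tolist())  # ['red,large', 'blue', 'green,small', 'black']
```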

Answer 2

Another option:

# Series.append was removed in pandas 2.0; pd.concat does the same job here
pd.concat([s1, s2]).groupby(level=0).agg(','.join)
1      red,large
2           blue
3    green,small
4          black
dtype: object
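The same groupby idea extends to more than two series. The helper name join_text_series and the third series s3 below are hypothetical, added only for illustration; this is a sketch that assumes every input holds strings and has no NaN values of its own:

```python
import pandas as pd


def join_text_series(*series, sep=','):
    """Hypothetical helper: join several text Series label by label,
    skipping labels that a given Series does not have."""
    # Stack all values into one long Series (labels may repeat), then
    # join the values that share a label.
    return pd.concat(series).groupby(level=0).agg(sep.join)


s1 = pd.Series(['red', 'blue', 'green', 'black'], index=[1, 2, 3, 4])
s2 = pd.Series(['large', 'small'], index=[1, 3])
s3 = pd.Series(['round'], index=[4])  # hypothetical third series

print(join_text_series(s1, s2, s3).tolist())
# ['red,large', 'blue', 'green,small', 'black,round']
```

Because pd.concat keeps the inputs in the order given and groupby preserves row order within each group, the pieces are joined in argument order; the groups themselves come out sorted by index label.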
