How to vectorize a row-wise join

Question

I have a DataFrame df and a function foo(int x) that returns a Series. I want to join each row of df with the result of foo() using a vectorized operation.

For example, given the following DataFrame, I call foo() on the values of column col_1:

col_1	col_2	col_3
1	1	'a'
12	2	'b'
13	3	'd'
4	4	'c'

If we assume that

foo(1) = Series({'col4': 0, 'col5': 2})
foo(12) = Series({'col4': 1, 'col5': 3})
foo(13) = Series({'col4': 1, 'col5': 4})
foo(4) = Series({'col4': 0, 'col5': 5})

then the output should be:

col_1	col_2	col_3	col4	col5
1	1	'a'	0	2
12	2	'b'	1	3
13	3	'd'	1	4
4	4	'c'	0	5

Answer 1

Edit: it looks like .from_records handles the output of .map() cleanly, so you could try it instead of pd.concat:

In [118]: pd.DataFrame.from_records(df['col_1'].map(foo))
Out[118]:
   col4  col5
0     0     2
1     1     3
2     1     4
3     0     5
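A minimal sketch of carrying the .map() route through to the final joined frame, under a few assumptions: foo is a plain lookup reproducing the question's example values, and df is rebuilt from the question's table (both are hypothetical stand-ins). Because .map() leaves a Series whose elements are themselves Series, the sketch passes those elements to the DataFrame constructor as a list, which turns each one into a row, and then uses join to attach the new columns to df by index. It uses the DataFrame constructor rather than .from_records, so treat it as a swapped-in variant of the idea above, not the answer's exact code.

```python
import pandas as pd

# Hypothetical stand-in for foo: returns a precomputed Series per key.
_FOO_TABLE = {
    1: pd.Series({'col4': 0, 'col5': 2}),
    12: pd.Series({'col4': 1, 'col5': 3}),
    13: pd.Series({'col4': 1, 'col5': 4}),
    4: pd.Series({'col4': 0, 'col5': 5}),
}


def foo(x):
    return _FOO_TABLE[x]


df = pd.DataFrame({'col_1': [1, 12, 13, 4],
                   'col_2': [1, 2, 3, 4],
                   'col_3': ['a', 'b', 'd', 'c']})

# .map() yields an object-dtype Series whose elements are Series objects.
mapped = df['col_1'].map(foo)

# Each inner Series becomes one row; reuse df's index so join aligns rows.
expanded = pd.DataFrame(mapped.tolist(), index=df.index)

# Left join on the index adds col4 and col5 next to the original columns.
result = df.join(expanded)
print(result)
```

Passing index=df.index matters here: without it the expanded frame gets a fresh 0..n-1 index, and join would misalign rows whenever df has any other index.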

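Normally I would use .map() for something like this, since it is usually faster than .apply(), but its output here is a bit odd (a Series whose elements are Series), so unless you have a huge DataFrame I would just use the straightforward .apply() option together with pd.concat: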

In [18]: def foo(n):
    ...:     lookup = {1: pd.Series({'col4': 0, 'col5': 2}),
    ...:               12: pd.Series({'col4': 1, 'col5': 3}),
    ...:               13: pd.Series({'col4': 1, 'col5': 4}),
    ...:               4: pd.Series({'col4': 0, 'col5': 5})}
    ...:     return lookup[n]
    ...:

In [19]: df
Out[19]:
   col_1  col_2 col_3
0      1      1   'a'
1     12      2   'b'
2     13      3   'd'
3      4      4   'c'

In [20]: pd.concat([df, df['col_1'].apply(foo)], axis=1)
Out[20]:
   col_1  col_2 col_3  col4  col5
0      1      1   'a'     0     2
1     12      2   'b'     1     3
2     13      3   'd'     1     4
3      4      4   'c'     0     5

Another option you could try is to have the function return a dict instead of a Series.
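A sketch of the dict-returning variant, assuming a hypothetical foo_dict that returns a plain dict holding the question's example values. A list of dicts is a standard input to the DataFrame constructor (the keys become columns), so no per-row Series objects are needed before concatenating with df.

```python
import pandas as pd

# Hypothetical dict-returning variant of foo (values from the question).
_FOO_TABLE = {
    1: {'col4': 0, 'col5': 2},
    12: {'col4': 1, 'col5': 3},
    13: {'col4': 1, 'col5': 4},
    4: {'col4': 0, 'col5': 5},
}


def foo_dict(x):
    # Return a copy so callers cannot mutate the shared table.
    return dict(_FOO_TABLE[x])


df = pd.DataFrame({'col_1': [1, 12, 13, 4],
                   'col_2': [1, 2, 3, 4],
                   'col_3': ['a', 'b', 'd', 'c']})

# One dict per row; the constructor turns the dict keys into columns.
records = [foo_dict(x) for x in df['col_1']]
new_cols = pd.DataFrame(records, index=df.index)

# Column-wise concatenation, aligned on the shared index.
result = pd.concat([df, new_cols], axis=1)
print(result)
```

Skipping the per-row Series construction is likely somewhat cheaper, but it is still a Python-level loop over the rows, so it is not vectorized in the NumPy sense.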

Answer 2

Since your function accepts only scalars, Series.apply is your only option here. In fact, it's almost as though your function was made exactly for this use case...

because if func returns a Series object, the final output is a DataFrame, which is easy to join to the original DataFrame. From there, use pd.concat along the appropriate axis:

pd.concat([df, df.iloc[:,0].apply(foo)], axis=1)
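If col_1 contains many repeated values, one possible refinement (an assumption about your data, not part of the answer above) is to call foo only once per distinct value and merge the results back. foo still receives scalars, so the constraint above still holds; only the number of Python-level calls shrinks. The foo below is a hypothetical stand-in built from the question's example values, and the extra repeated rows in df are made up to show the effect.

```python
import pandas as pd

# Hypothetical stand-in for foo (values from the question's example).
_FOO_TABLE = {1: (0, 2), 12: (1, 3), 13: (1, 4), 4: (0, 5)}


def foo(x):
    col4, col5 = _FOO_TABLE[x]
    return pd.Series({'col4': col4, 'col5': col5})


# Repeated keys in col_1 show why deduplicating first can help.
df = pd.DataFrame({'col_1': [1, 12, 13, 4, 12, 1],
                   'col_2': [1, 2, 3, 4, 5, 6],
                   'col_3': ['a', 'b', 'd', 'c', 'e', 'f']})

# Call foo once per distinct value instead of once per row.
keys = df['col_1'].drop_duplicates()
lookup = pd.DataFrame([foo(k) for k in keys])
lookup.insert(0, 'col_1', keys.to_numpy())

# A left merge keeps every row of df, in df's original order.
result = df.merge(lookup, on='col_1', how='left')
print(result)
```

Note that merge returns a fresh 0..n-1 index; if df's index carries meaning, you could restore it afterwards with result.set_axis(df.index), since a left merge against unique keys keeps the row count unchanged.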


Tags: python, vectorization, dataframe, pandas