Merging with empty DataFrame
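I am trying to merge one DataFrame (df1) with another DataFrame (df2), where df2 may be empty. The merge condition is df1.index=df2.z (df1 is never empty), but I get the following error.
Is there any way to make this work?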
In [31]: import pandas as pd

In [32]: df1 = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6], 'c': [1, 2, 3]})
    ...: df2 = pd.DataFrame({'x':[], 'y':[], 'z':[]})
    ...: dfm = pd.merge(df1, df2, how='outer', left_index=True, right_on='z')
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-34-4e9943198dae> in <module>()
----> 1 dfmb = pd.merge(df1, df2, how='outer', left_index=True, right_on='z')
/usr/local/lib/python2.7/dist-packages/pandas/tools/merge.pyc in merge(left, right, how, on, left_on, right_on, left_index, right_index, sort, suffixes, copy)
37 right_index=right_index, sort=sort, suffixes=suffixes,
38 copy=copy)
---> 39 return op.get_result()
40 if __debug__:
41 merge.__doc__ = _merge_doc % '\nleft : DataFrame'
/usr/local/lib/python2.7/dist-packages/pandas/tools/merge.pyc in get_result(self)
185
186 def get_result(self):
--> 187 join_index, left_indexer, right_indexer = self._get_join_info()
188
189 ldata, rdata = self.left._data, self.right._data
/usr/local/lib/python2.7/dist-packages/pandas/tools/merge.pyc in _get_join_info(self)
277 join_index = self.left.index.take(left_indexer)
278 elif self.left_index:
--> 279 join_index = self.right.index.take(right_indexer)
280 else:
281 join_index = Index(np.arange(len(left_indexer)))
/usr/local/lib/python2.7/dist-packages/pandas/core/index.pyc in take(self, indexer, axis)
981
982 indexer = com._ensure_platform_int(indexer)
--> 983 taken = np.array(self).take(indexer)
984
985 # by definition cannot propogate freq
IndexError: cannot do a non-empty take from an empty axes.
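try:
    dfm = pd.merge(df1, df2, how='outer', left_index=True, right_on='z')
except IndexError:
    # The merge failed because one side is empty; fall back to the non-empty frame.
    dfm = df1 if not df1.empty else df2
This may be enough for your needs.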
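If you would rather not depend on catching the exception, the emptiness check can be made explicit before merging. The following is only a minimal sketch: `merge_or_pad` is a hypothetical helper name, not part of pandas, and it assumes that when df2 is empty you still want its columns to appear in the result, filled with NaN.

```python
import pandas as pd


def merge_or_pad(left, right, key):
    """Hypothetical helper: outer-merge left.index with right[key],
    or just pad `left` with the columns of `right` when `right` is empty."""
    if right.empty:
        # Keep left's column order and append right's columns that left lacks.
        extra = [col for col in right.columns if col not in left.columns]
        return left.reindex(columns=list(left.columns) + extra)
    return pd.merge(left, right, how='outer', left_index=True, right_on=key)


df1 = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6], 'c': [1, 2, 3]})
df2 = pd.DataFrame({'x': [], 'y': [], 'z': []})

dfm = merge_or_pad(df1, df2, 'z')
```

Unlike the try/except version, the result here always carries the x, y and z columns, so downstream code can rely on them being present whether or not df2 had rows.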
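Another option, similar to Joran's:
try:
    dfm = pd.merge(df1, df2, how='outer', left_index=True, right_on='z')
except IndexError:
    # Fall back to df1 with df2's columns added as NaN.
    dfm = df1.reindex_axis(df1.columns.union(df2.columns), axis=1)
I'm not sure which is clearer, but both of the following work: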
In [11]: df1.reindex_axis(df1.columns.union(df2.columns), axis=1)
Out[11]:
a b c x y z
0 1 4 1 NaN NaN NaN
1 2 5 2 NaN NaN NaN
2 3 6 3 NaN NaN NaN
In [12]: df1.loc[:, df1.columns.union(df2.columns)]
Out[12]:
a b c x y z
0 1 4 1 NaN NaN NaN
1 2 5 2 NaN NaN NaN
2 3 6 3 NaN NaN NaN
(I prefer the former.)
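Note for readers on current pandas: `reindex_axis` was deprecated and later removed, and `.loc` with column labels that do not exist now raises a `KeyError` instead of filling with NaN. A rough equivalent of both calls above, assuming a pandas version where `DataFrame.reindex` accepts the `columns` keyword, could look like this:

```python
import pandas as pd

df1 = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6], 'c': [1, 2, 3]})
df2 = pd.DataFrame({'x': [], 'y': [], 'z': []})

# Union of both column sets; columns missing from df1 are created as NaN.
all_columns = df1.columns.union(df2.columns)
padded = df1.reindex(columns=all_columns)
```

Here `padded` has the same rows as df1 plus the x, y and z columns filled with NaN, matching the outputs shown above.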