Concatenate two dataframes with an overlapping date index, with the resulting dataframe taking "left" by default unless "left" is NaN
Given DF1 and DF2 below, how do I get DFresult?
DF1:                  DF2:                  DFresult:
Date      | Value     Date      | Value     Date      | Value
------------------    ------------------    ------------------
1-01-2019 | 1                               1-01-2019 | 1       (no overlap, take the one that exists)
1-02-2019 | 1                               1-02-2019 | 1       (no overlap, take the one that exists)
1-03-2019 | np.NaN    1-03-2019 | 2         1-03-2019 | 2       (left is NaN, take right)
1-04-2019 | 1         1-04-2019 | np.NaN    1-04-2019 | 1       (left is not NaN, take left)
1-05-2019 | np.NaN    1-05-2019 | np.NaN    1-05-2019 | np.NaN  (both NaN, keep it)
1-06-2019 | 1         1-06-2019 | 2         1-06-2019 | 1       (left is not NaN, take left)
                      1-07-2019 | 2         1-07-2019 | 2       (no overlap, take the one that exists)
                      1-08-2019 | 2         1-08-2019 | 2       (no overlap, take the one that exists)
                      1-09-2019 | 2         1-09-2019 | 2       (no overlap, take the one that exists)
                      1-10-2019 | 2         1-10-2019 | 2       (no overlap, take the one that exists)
                      1-11-2019 | 2         1-11-2019 | 2       (no overlap, take the one that exists)
What if I want to use a function to decide how overlaps are resolved? For example, take left if left is higher than right, or if right is NaN:
DF1:                  DF2:                  DFresult:
Date      | Value     Date      | Value     Date      | Value
------------------    ------------------    ------------------
1-01-2019 | 1                               1-01-2019 | 1       (no overlap, take the one that exists)
1-02-2019 | 1                               1-02-2019 | 1       (no overlap, take the one that exists)
1-03-2019 | np.NaN    1-03-2019 | 2         1-03-2019 | 2       (left is NaN, take right)
1-04-2019 | 1         1-04-2019 | np.NaN    1-04-2019 | 1       (right is NaN, take left)
1-05-2019 | np.NaN    1-05-2019 | np.NaN    1-05-2019 | np.NaN  (both NaN, keep it)
1-06-2019 | 1         1-06-2019 | 2         1-06-2019 | 2       (left is not higher, take right)
1-07-2019 | 3         1-07-2019 | 2         1-07-2019 | 3       (left is higher, take left)
1-08-2019 | 1         1-08-2019 | 2         1-08-2019 | 2       (left is not higher, take right)
                      1-09-2019 | 2         1-09-2019 | 2       (no overlap, take the one that exists)
                      1-10-2019 | 2         1-10-2019 | 2       (no overlap, take the one that exists)
                      1-11-2019 | 2         1-11-2019 | 2       (no overlap, take the one that exists)
Try:
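import pandas as pd

# "Take the higher value" rule: max() skips NaN within each Date group
out = pd.concat([DF1, DF2]).groupby('Date', as_index=False).max()
# For your original rule (left wins unless it is NaN): first() returns the first non-NaN value per group
# out = pd.concat([DF1, DF2]).groupby('Date', as_index=False).first()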
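To make this reproducible, here is a minimal sketch that rebuilds DF1 and DF2 from the first table in the question and applies both rules. It assumes `Date` is a plain string column and `Value` is numeric. The `pick` function is a hypothetical name, not part of pandas. It only illustrates how an arbitrary overlap rule could be plugged in through `Series.combine`.

```python
import numpy as np
import pandas as pd

# Sample frames rebuilt from the first table in the question (Date kept as strings).
DF1 = pd.DataFrame({
    "Date": ["1-01-2019", "1-02-2019", "1-03-2019",
             "1-04-2019", "1-05-2019", "1-06-2019"],
    "Value": [1, 1, np.nan, 1, np.nan, 1],
})
DF2 = pd.DataFrame({
    "Date": ["1-03-2019", "1-04-2019", "1-05-2019", "1-06-2019", "1-07-2019",
             "1-08-2019", "1-09-2019", "1-10-2019", "1-11-2019"],
    "Value": [2, np.nan, np.nan, 2, 2, 2, 2, 2, 2],
})

# Rule 1: left wins unless it is NaN. DF1 rows come first in the concat,
# and first() returns the first non-NaN value in each Date group.
left_first = pd.concat([DF1, DF2]).groupby("Date", as_index=False).first()

# Rule 2: the higher value wins. max() ignores NaN within each group.
highest = pd.concat([DF1, DF2]).groupby("Date", as_index=False).max()

# Same result as rule 1 using combine_first on Date-indexed Series.
s1 = DF1.set_index("Date")["Value"]
s2 = DF2.set_index("Date")["Value"]
left_first_alt = s1.combine_first(s2)

# General case: a hypothetical rule applied pairwise by Series.combine.
# Dates missing on one side arrive here as NaN.
def pick(left, right):
    if pd.isna(left):
        return right
    if pd.isna(right):
        return left
    return max(left, right)  # swap in any other decision here

custom = s1.combine(s2, pick).rename_axis("Date").reset_index(name="Value")

print(left_first)
print(highest)
print(custom)
```

With this particular `pick`, `custom` matches the `max()` result. The point of the `combine` version is only that the body of `pick` can be replaced by a rule that `first()` or `max()` cannot express. Note that it calls the function once per date in Python, so the `groupby` one-liners are likely to be faster on large frames.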