Join two Pandas Series with different DateTimeIndex

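I have two pandas Series, each with a DatetimeIndex. I want to join them so that the resulting DataFrame uses the index of the first Series and takes the matching values from the second Series, using linear interpolation on the second Series for dates it does not contain.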

First series:

2020-03-01    1
2020-03-03    2
2020-03-05    3
2020-03-07    4

Second series:

2020-03-01    20
2020-03-02    22
2020-03-05    25
2020-03-06    35
2020-03-07    36
2020-03-08    45

Desired output:

2020-03-01    1    20
2020-03-03    2    23
2020-03-05    3    25
2020-03-07    4    36

Code to generate the input data:

import pandas as pd
import datetime as dt

s1 = pd.Series([1, 2, 3, 4])
s1.index = pd.to_datetime([dt.date(2020, 3, 1), dt.date(2020, 3, 3), dt.date(2020, 3, 5), dt.date(2020, 3, 7)])

s2 = pd.Series([20, 22, 25, 35, 36, 45])
s2.index = pd.to_datetime([dt.date(2020, 3, 1), dt.date(2020, 3, 2), dt.date(2020, 3, 5), dt.date(2020, 3, 6), dt.date(2020, 3, 7), dt.date(2020, 3, 8)])

Use concat with an inner join (this keeps only the dates present in both Series, so 2020-03-03 is dropped):

df = pd.concat([s1, s2], axis=1, keys=('s1','s2'), join='inner')
print(df)
            s1  s2
2020-03-01   1  20
2020-03-05   3  25
2020-03-07   4  36

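A solution that interpolates s2 and then drops the rows with missing values (interpolate('index') uses the actual index values as the spacing between points):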

df = (pd.concat([s1, s2], axis=1, keys=('s1','s2'))
        .assign(s2 = lambda x: x.s2.interpolate('index'))
        .dropna())
print(df)
             s1    s2
2020-03-01  1.0  20.0
2020-03-03  2.0  23.0
2020-03-05  3.0  25.0
2020-03-07  4.0  36.0
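
For comparison, here is a minimal sketch of the same idea applied to s2 alone: reindex it onto the union of both indexes, interpolate, then keep only s1's dates. It assumes both indexes are sorted and unique, and the name s2_on_s1 is only illustrative. Because s1 is never reindexed here, it should keep its integer dtype.

```python
import pandas as pd

s1 = pd.Series(
    [1, 2, 3, 4],
    index=pd.to_datetime(["2020-03-01", "2020-03-03", "2020-03-05", "2020-03-07"]),
)
s2 = pd.Series(
    [20, 22, 25, 35, 36, 45],
    index=pd.to_datetime(["2020-03-01", "2020-03-02", "2020-03-05",
                          "2020-03-06", "2020-03-07", "2020-03-08"]),
)

# Put s2 on every date that occurs in either Series (new dates become NaN),
# fill the gaps by interpolating over elapsed time, then keep s1's dates only.
s2_on_s1 = (
    s2.reindex(s2.index.union(s1.index))
      .interpolate(method="time")
      .reindex(s1.index)
)

df = pd.DataFrame({"s1": s1, "s2": s2_on_s1})
print(df)
```

The s2 column comes back as float, since interpolation can produce non-integer values.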

Build the combined DataFrame

# There are many ways to construct a DataFrame from Series; this one uses the constructor:
df = pd.DataFrame({'s1': s1, 's2': s2})
             s1    s2
2020-03-01  1.0  20.0
2020-03-02  NaN  22.0
2020-03-03  2.0   NaN
2020-03-05  3.0  25.0
2020-03-06  NaN  35.0
2020-03-07  4.0  36.0
2020-03-08  NaN  45.0

Interpolate (the default method, 'linear', ignores the index and treats the rows as equally spaced)

df = df.interpolate()
             s1    s2
2020-03-01  1.0  20.0
2020-03-02  1.5  22.0
2020-03-03  2.0  23.5
2020-03-05  3.0  25.0
2020-03-06  3.5  35.0
2020-03-07  4.0  36.0
2020-03-08  4.0  45.0
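
The 23.5 on 2020-03-03 comes from the default method treating the rows as evenly spaced. A small sketch of the difference, assuming the same input data as in the question (the variable names linear, by_time, and day are only illustrative):

```python
import pandas as pd

s1 = pd.Series(
    [1, 2, 3, 4],
    index=pd.to_datetime(["2020-03-01", "2020-03-03", "2020-03-05", "2020-03-07"]),
)
s2 = pd.Series(
    [20, 22, 25, 35, 36, 45],
    index=pd.to_datetime(["2020-03-01", "2020-03-02", "2020-03-05",
                          "2020-03-06", "2020-03-07", "2020-03-08"]),
)
df = pd.DataFrame({"s1": s1, "s2": s2})

# Default method="linear": halfway between the neighbouring rows (22 and 25).
linear = df["s2"].interpolate()

# method="time": weights by elapsed time, so one day into a three-day gap.
by_time = df["s2"].interpolate(method="time")

day = pd.Timestamp("2020-03-03")
print(linear[day], by_time[day])
```

If the spacing between dates matters, method='time' (or 'index') is probably the safer choice.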

Restrict the rows

# Only keep the rows that were in s1's index.
# There are several ways to do this; this example uses .filter
df = df.filter(s1.index, axis=0)
             s1    s2
2020-03-01  1.0  20.0
2020-03-03  2.0  23.5
2020-03-05  3.0  25.0
2020-03-07  4.0  36.0

Convert the numbers back to int64 (astype drops the fractional part, so 23.5 becomes 23)

df = df.astype('int64')
            s1  s2
2020-03-01   1  20
2020-03-03   2  23
2020-03-05   3  25
2020-03-07   4  36
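
Note that astype('int64') truncates rather than rounds. A minimal sketch with made-up values (not taken from the question) showing the difference; whether truncation or rounding is wanted depends on the data:

```python
import pandas as pd

values = pd.Series([23.5, 23.9, 24.2])  # illustrative values only

# astype drops the fractional part.
truncated = values.astype("int64")

# Rounding first gives the nearest integer instead.
rounded = values.round().astype("int64")

print(truncated.tolist())
print(rounded.tolist())
```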

One-liner:

df = pd.DataFrame({'s1': s1, 's2': s2}).interpolate().filter(s1.index, axis=0).astype('int64')
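
As an alternative sketch that skips the intermediate NaN rows, numpy's np.interp can evaluate s2 directly at s1's dates. This is not part of the answer above; it assumes s2's index is sorted in increasing order, and days_since_epoch is a hypothetical helper introduced only for this example.

```python
import numpy as np
import pandas as pd

s1 = pd.Series(
    [1, 2, 3, 4],
    index=pd.to_datetime(["2020-03-01", "2020-03-03", "2020-03-05", "2020-03-07"]),
)
s2 = pd.Series(
    [20, 22, 25, 35, 36, 45],
    index=pd.to_datetime(["2020-03-01", "2020-03-02", "2020-03-05",
                          "2020-03-06", "2020-03-07", "2020-03-08"]),
)


def days_since_epoch(index):
    """Hypothetical helper: convert a DatetimeIndex to float days since 1970-01-01."""
    return np.asarray((index - pd.Timestamp("1970-01-01")) / pd.Timedelta(days=1), dtype=float)


# np.interp(x, xp, fp): evaluate the piecewise-linear curve (xp, fp) at x.
s2_values = np.interp(days_since_epoch(s1.index),
                      days_since_epoch(s2.index),
                      s2.to_numpy())

df = pd.DataFrame({"s1": s1, "s2": s2_values})
print(df)
```

Unlike interpolate, np.interp clamps to the first or last value for dates outside s2's range, which may or may not be what you want.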

Documentation links: