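Join two Pandas Series with different DateTimeIndex
I have two pandas Series, each with a DatetimeIndex. I want to join them so that the resulting DataFrame uses the index of the first series and "matches" the values of the second series accordingly (using linear interpolation in the second series).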
First series:
2020-03-01 1
2020-03-03 2
2020-03-05 3
2020-03-07 4
Second series:
2020-03-01 20
2020-03-02 22
2020-03-05 25
2020-03-06 35
2020-03-07 36
2020-03-08 45
Desired output:
2020-03-01 1 20
2020-03-03 2 23
2020-03-05 3 25
2020-03-07 4 36
Code to generate the input data:
import pandas as pd
import datetime as dt
s1 = pd.Series([1, 2, 3, 4])
s1.index = pd.to_datetime([dt.date(2020, 3, 1), dt.date(2020, 3, 3), dt.date(2020, 3, 5), dt.date(2020, 3, 7)])
s2 = pd.Series([20, 22, 25, 35, 36, 45])
s2.index = pd.to_datetime([dt.date(2020, 3, 1), dt.date(2020, 3, 2), dt.date(2020, 3, 5), dt.date(2020, 3, 6), dt.date(2020, 3, 7), dt.date(2020, 3, 8)])
Use concat with an inner join (this keeps only the dates present in both series):
df = pd.concat([s1, s2], axis=1, keys=('s1','s2'), join='inner')
print (df)
s1 s2
2020-03-01 1 20
2020-03-05 3 25
2020-03-07 4 36
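To see why the inner join falls short of the desired output, it can help to compare the two indexes directly. The sketch below rebuilds the sample data from the question (using date strings instead of dt.date, purely for brevity) and lists the dates that s1 has but s2 lacks. The names common and missing are illustrative, not part of the original answer.

```python
import pandas as pd

# Sample data from the question, rebuilt with date strings for brevity.
s1 = pd.Series(
    [1, 2, 3, 4],
    index=pd.to_datetime(["2020-03-01", "2020-03-03", "2020-03-05", "2020-03-07"]),
)
s2 = pd.Series(
    [20, 22, 25, 35, 36, 45],
    index=pd.to_datetime(
        ["2020-03-01", "2020-03-02", "2020-03-05",
         "2020-03-06", "2020-03-07", "2020-03-08"]
    ),
)

# Dates present in both series: these are the rows an inner join keeps.
common = s1.index.intersection(s2.index)

# Dates that s1 has but s2 lacks: these rows need an interpolated s2 value.
missing = s1.index.difference(s2.index)

print(common)
print(missing)
```

For this data, missing holds only 2020-03-03, which is the row the inner join drops.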
A solution that interpolates the s2 series and then drops the rows with missing values:
df = (pd.concat([s1, s2], axis=1, keys=('s1','s2'))
.assign(s2 = lambda x: x.s2.interpolate('index'))
.dropna())
print (df)
s1 s2
2020-03-01 1.0 20.0
2020-03-03 2.0 23.0
2020-03-05 3.0 25.0
2020-03-07 4.0 36.0
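A variation on the same idea, sketched under the assumption that only s2 should be touched: reindex s2 onto the union of both indexes, interpolate with method="time", then reindex onto s1's index. Because s1 never receives NaN values, it keeps its integer dtype. The helper name align_with_interpolation is hypothetical and not from the original answer.

```python
import pandas as pd

def align_with_interpolation(target: pd.Series, source: pd.Series) -> pd.Series:
    """Return `source` interpolated in time onto the index of `target`.

    Hypothetical helper for illustration; both series need a DatetimeIndex.
    """
    # Add target's timestamps to source's index; the new positions hold NaN.
    widened = source.reindex(source.index.union(target.index))
    # method="time" weights each gap by the actual distance between timestamps.
    filled = widened.interpolate(method="time")
    # Keep only the rows that belong to target's index.
    return filled.reindex(target.index)

s1 = pd.Series(
    [1, 2, 3, 4],
    index=pd.to_datetime(["2020-03-01", "2020-03-03", "2020-03-05", "2020-03-07"]),
)
s2 = pd.Series(
    [20, 22, 25, 35, 36, 45],
    index=pd.to_datetime(
        ["2020-03-01", "2020-03-02", "2020-03-05",
         "2020-03-06", "2020-03-07", "2020-03-08"]
    ),
)

df = pd.DataFrame({"s1": s1, "s2": align_with_interpolation(s1, s2)})
print(df)
print(df.dtypes)
```

With the sample data, the s2 value for 2020-03-03 comes out at about 23.0, one third of the way from 22 (2020-03-02) to 25 (2020-03-05).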
Build the combined DataFrame
# there are many ways to construct a dataframe from series, this uses the constructor:
df = pd.DataFrame({'s1': s1, 's2': s2})
s1 s2
2020-03-01 1.0 20.0
2020-03-02 NaN 22.0
2020-03-03 2.0 NaN
2020-03-05 3.0 25.0
2020-03-06 NaN 35.0
2020-03-07 4.0 36.0
2020-03-08 NaN 45.0
Interpolate
df = df.interpolate()
s1 s2
2020-03-01 1.0 20.0
2020-03-02 1.5 22.0
2020-03-03 2.0 23.5
2020-03-05 3.0 25.0
2020-03-06 3.5 35.0
2020-03-07 4.0 36.0
2020-03-08 4.0 45.0
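Note that the 23.5 for 2020-03-03 differs from the 23 in the question. Called without arguments, interpolate uses method="linear", which treats rows as equally spaced and ignores the dates in the index. The sketch below compares that with method="time" on the same data. The variable names are illustrative.

```python
import pandas as pd

s1 = pd.Series(
    [1, 2, 3, 4],
    index=pd.to_datetime(["2020-03-01", "2020-03-03", "2020-03-05", "2020-03-07"]),
)
s2 = pd.Series(
    [20, 22, 25, 35, 36, 45],
    index=pd.to_datetime(
        ["2020-03-01", "2020-03-02", "2020-03-05",
         "2020-03-06", "2020-03-07", "2020-03-08"]
    ),
)

combined = pd.DataFrame({"s1": s1, "s2": s2})
day = pd.Timestamp("2020-03-03")

# Default method="linear": halfway between the neighbouring rows (22 and 25).
by_position = combined["s2"].interpolate()

# method="time": 2020-03-03 lies one third of the way from 03-02 to 03-05.
by_time = combined["s2"].interpolate(method="time")

print(by_position[day], by_time[day])
```

The two methods agree whenever the index is evenly spaced. With irregular dates like these, method="time" (or method="index") is the one that reflects the gaps between the dates.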
Restrict the rows
# Only keep the rows that were in s1's index.
# Several ways to do this, but this example uses .filter
df = df.filter(s1.index, axis=0)
s1 s2
2020-03-01 1.0 20.0
2020-03-03 2.0 23.5
2020-03-05 3.0 25.0
2020-03-07 4.0 36.0
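As the comment above says, there are several ways to keep only the rows of s1's index. A minimal sketch of two common alternatives, .loc and .reindex, next to .filter:

```python
import pandas as pd

s1 = pd.Series(
    [1, 2, 3, 4],
    index=pd.to_datetime(["2020-03-01", "2020-03-03", "2020-03-05", "2020-03-07"]),
)
s2 = pd.Series(
    [20, 22, 25, 35, 36, 45],
    index=pd.to_datetime(
        ["2020-03-01", "2020-03-02", "2020-03-05",
         "2020-03-06", "2020-03-07", "2020-03-08"]
    ),
)

df = pd.DataFrame({"s1": s1, "s2": s2}).interpolate()

# filter keeps the requested labels that exist and silently skips the rest.
by_filter = df.filter(s1.index, axis=0)

# .loc selects by label and raises KeyError if any label is missing.
by_loc = df.loc[s1.index]

# reindex conforms to the new index and inserts NaN rows for missing labels.
by_reindex = df.reindex(s1.index)

print(by_loc)
```

Every label in s1.index exists in df here, so all three calls return the same rows. They differ only in how they treat labels that are absent.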
Convert the numbers back to int64
df = df.astype('int64')
s1 s2
2020-03-01 1 20
2020-03-03 2 23
2020-03-05 3 25
2020-03-07 4 36
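One thing to keep in mind about this step: casting floats with astype('int64') discards the fractional part instead of rounding, which is why 23.5 ends up as 23 above. The sketch below uses made-up values (not taken from the thread) to show the difference between truncating and rounding before the cast.

```python
import pandas as pd

# Made-up values for illustration: a half, a value just under an integer,
# and an exact integer stored as a float.
values = pd.Series([23.5, 22.999999999, 36.0])

# astype drops the fractional part.
truncated = values.astype("int64")

# round() first, then cast; halves round to the nearest even integer.
rounded = values.round().astype("int64")

print(truncated.tolist())  # [23, 22, 36]
print(rounded.tolist())    # [24, 23, 36]
```

Which behaviour is appropriate depends on the data. If tiny floating-point errors are possible in the interpolated values, rounding before the cast may be the safer choice.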
As a one-liner:
df = pd.DataFrame({'s1': s1, 's2': s2}).interpolate().filter(s1.index, axis=0).astype('int64')