Filling missing data using a custom condition in a Pandas time series dataframe
Below is a portion of my dataframe, which has many missing values.
              A                         B
S             a    b    c    d    e     a     b     c     d     e
date
2020-10-15  1.0  2.0  NaN  NaN  NaN  10.0  11.0   NaN   NaN   NaN
2020-10-16  NaN  NaN  NaN  NaN  NaN   NaN   NaN   NaN   NaN   NaN
2020-10-17  NaN  NaN  NaN  4.0  NaN   NaN   NaN   NaN  13.0   NaN
2020-10-18  NaN  NaN  NaN  NaN  NaN   NaN   NaN   NaN   NaN   NaN
2020-10-19  NaN  NaN  NaN  NaN  NaN   NaN   NaN   NaN   NaN   NaN
2020-10-20  4.0  6.0  4.0  1.0  9.0  10.0   2.0  13.0   4.0  13.0
I want to replace the NaNs in each column using a specific backward fill condition.
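For example, in column (A,a), values are missing on the 16th, 17th, 18th and 19th. The next value is 4, on the 20th. I want this value (the next non-missing value in the column) to be spread over all of these dates, including the 20th, with the values gradually increasing by 10%. That is, column (A,a) should get approximately .655, .720, .793, .872 and .96 on the 16th, 17th, 18th, 19th and 20th. This should be the approach for all missing values in all columns.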
I tried the bfill() function, but I can't figure out how to incorporate the required formula as an option.
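I checked this link and some other links on Stack Overflow. They are somewhat similar, but in my case the number of NaNs in a given column varies and spans multiple rows. Compare column (A,a) with column (A,d) or column (B,d). Given this, I find it hard to adapt those solutions to my problem.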
Any input is appreciated.
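Here is a fully vectorized approach. It is quite efficient: about 130 ms on a 1000 x 1000 matrix (timing below). It is also a good opportunity to showcase some interesting numpy techniques.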
First, let's dig into the requirements, specifically what value each cell needs to get.
The example given is [nan, nan, nan, nan, 4.0] --> [.66, .72, .79, .87, .96], which is described as "gradually increasing the values by 10%" (such that the total equals the value to spread: 4.0).
This is a geometric series with ratio r = 1 + 0.1: [r^1, r^2, r^3, ...], normalized so that it sums to 1. For example:
import numpy as np
r = 1.1
a = 4.0
n = 5
q = np.cumprod(np.repeat(r, n))  # [r^1, r^2, ..., r^n]
a * q / q.sum()
# array([0.65518992, 0.72070892, 0.79277981, 0.87205779, 0.95926357])
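We want to compute this directly (avoiding calls to Python functions and explicit loops, which would be much slower), so we need a closed form for the normalization factor q.sum(). It is a well-known quantity, the sum of a geometric series:
$$\sum_{k=1}^{n} r^k = \frac{r\,(r^n - 1)}{r - 1}$$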
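A quick numerical check that the closed form of the geometric sum, r (r^n - 1) / (r - 1), matches q.sum() for the example's r = 1.1 and n = 5 (a sanity-check sketch, not part of the method itself):

```python
import numpy as np

r, n = 1.1, 5
q = np.cumprod(np.repeat(r, n))  # [r^1, r^2, ..., r^n]

# Closed form of r^1 + r^2 + ... + r^n
closed_form = r * (r**n - 1) / (r - 1)

print(q.sum(), closed_form)  # the two agree up to floating-point rounding
```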
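To generalize, we need 3 quantities to compute the value of each cell:
- a: the value to spread
- i: the index within the run (0 .. n-1)
- n: the run length
Then the value is v = a * r**i * (r - 1) / (r**n - 1), i.e. a * r**(i+1) divided by the closed form of q.sum().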
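To make the formula concrete, here is a small sketch that evaluates v for a single run and checks the two properties the OP asked for: consecutive values grow by a factor r, and the run adds back up to a. The helper name run_values is hypothetical, introduced only for this illustration.

```python
import numpy as np

def run_values(a, n, r=1.1):
    """Hypothetical helper: the n values that spread the amount a over one run."""
    i = np.arange(n)  # position within the run: 0 .. n-1
    return a * r**i * (r - 1) / (r**n - 1)

v = run_values(4.0, 5)
print(v.round(3))      # approx. 0.655, 0.721, 0.793, 0.872, 0.959
print(v.sum())         # 4.0 up to rounding
print(v[1:] / v[:-1])  # every ratio equals r = 1.1
```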
To illustrate with the first column of the OP's example, where the input is [1, nan, nan, nan, nan, 4], we want:
a = [1, 4, 4, 4, 4, 4]
i = [0, 0, 1, 2, 3, 4]
n = [1, 5, 5, 5, 5, 5]
Then the values v will be (rounded to 2 decimals): [1. , 0.66, 0.72, 0.79, 0.87, 0.96].
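The same formula applied element-wise to the three arrays above, as a plain numpy sketch (the arrays are typed in by hand here; the next step derives them automatically). Note that the leading 1 forms its own run of length 1, which the formula maps back to itself.

```python
import numpy as np

r = 1.1
a = np.array([1, 4, 4, 4, 4, 4], dtype=float)  # value to spread (back-filled)
i = np.array([0, 0, 1, 2, 3, 4])               # index within each run
n = np.array([1, 5, 5, 5, 5, 5])               # length of each run

# One formula for every cell; numpy evaluates it element-wise
v = a * r**i * (r - 1) / (r**n - 1)
print(v.round(2))
```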
Now for the part where we get these three quantities as numpy arrays.
a is the easiest: it is simply df.bfill().values. For i and n, however, we need to do some work, starting by pulling the values into a numpy array:
z = df.values
nrows, ncols = z.shape
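For i, we start with a cumulative count of NaNs that resets whenever a value is not NaN. This is strongly inspired by this answer to "Cumulative counts in NumPy without iteration". Here, however, we do it on a 2D array, and we also want to prepend a row of 0s and drop the last row (shifting the count down by one, so that the non-NaN value closing a run gets index n-1) to suit our needs: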
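def rcount(z):
    # Cumulative count of NaNs down each column, reset at every non-NaN value
    na = np.isnan(z)
    without_reset = na.cumsum(axis=0)
    reset_at = ~na
    # NaN count reached at the most recent non-NaN cell, carried forward down the column
    overcount = np.maximum.accumulate(without_reset * reset_at)
    result = without_reset - overcount
    return result
# Shift down one row: prepend a row of zeros and drop the last row
i = np.vstack((np.zeros(ncols, dtype=bool), rcount(z)))[:-1]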
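A small check of rcount and of the row shift on the first column of the example, as a sketch; rcount is repeated verbatim so the block runs on its own. The trick is that maximum.accumulate carries forward the NaN count reached at the most recent non-NaN cell, and subtracting it restarts the count after every value.

```python
import numpy as np

def rcount(z):
    # Same function as above: NaN count down each column, reset at non-NaN values
    na = np.isnan(z)
    without_reset = na.cumsum(axis=0)
    reset_at = ~na
    overcount = np.maximum.accumulate(without_reset * reset_at)
    return without_reset - overcount

# First column of the example, as a (6, 1) array
z = np.array([[1.0], [np.nan], [np.nan], [np.nan], [np.nan], [4.0]])
ncols = z.shape[1]

print(rcount(z).ravel())  # [0 1 2 3 4 0]
i = np.vstack((np.zeros(ncols, dtype=bool), rcount(z)))[:-1]
print(i.ravel())          # [0 0 1 2 3 4]
```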
For n, we have to do a little dance of our own, using numpy from first principles (I'll break down the steps if I have time):
runlen = np.diff(np.hstack((-1, np.flatnonzero(~np.isnan(np.vstack((z, np.ones(ncols))).T)))))
n = np.reshape(np.repeat(runlen, runlen), (nrows + 1, ncols), order='F')[:-1]
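Since the n computation is dense, here is one way to break it into its steps on a tiny two-column array, using the same operations as the two lines above. The array z is made up for illustration; its second column ends in NaNs to show that the sentinel row closes that trailing run (a is NaN there, so v stays NaN).

```python
import numpy as np

# Two small columns: [1, nan, nan, 4] and [nan, 2, nan, nan]
z = np.array([[1.0, np.nan],
              [np.nan, 2.0],
              [np.nan, np.nan],
              [4.0, np.nan]])
nrows, ncols = z.shape

# 1. Append a row of ones: a non-NaN sentinel that closes the last run of every column
padded = np.vstack((z, np.ones(ncols)))

# 2. Transpose so that flattening walks column by column, then take the
#    flat positions of the non-NaN cells: each one ends a run
ends = np.flatnonzero(~np.isnan(padded.T))

# 3. Distances between consecutive run ends are the run lengths
runlen = np.diff(np.hstack((-1, ends)))
print(runlen)  # [1 3 1 2 3]

# 4. Repeat each length once per cell of its run, refold column-major, drop the sentinel row
n = np.reshape(np.repeat(runlen, runlen), (nrows + 1, ncols), order='F')[:-1]
print(n)
```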
So, putting it all together:
import numpy as np
import pandas as pd

def spread_bfill(df, r=1.1):
    z = df.values
    nrows, ncols = z.shape
    a = df.bfill().values  # value to spread
    i = np.vstack((np.zeros(ncols, dtype=bool), rcount(z)))[:-1]  # index within the run
    runlen = np.diff(np.hstack((-1, np.flatnonzero(~np.isnan(np.vstack((z, np.ones(ncols))).T)))))
    n = np.reshape(np.repeat(runlen, runlen), (nrows + 1, ncols), order='F')[:-1]  # run length
    v = a * r**i * (r - 1) / (r**n - 1)
    return pd.DataFrame(v, columns=df.columns, index=df.index)
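A usage sketch on the question's data. The frame is rebuilt by hand from the printout (the level name S and the index name date are taken from it), and rcount and spread_bfill are repeated so the block is self-contained. The last check relies on the fact that, when no column ends in NaN, each run adds back up to the value it spreads, so column totals are unchanged.

```python
import numpy as np
import pandas as pd

def rcount(z):
    na = np.isnan(z)
    without_reset = na.cumsum(axis=0)
    reset_at = ~na
    overcount = np.maximum.accumulate(without_reset * reset_at)
    return without_reset - overcount

def spread_bfill(df, r=1.1):
    z = df.values
    nrows, ncols = z.shape
    a = df.bfill().values
    i = np.vstack((np.zeros(ncols, dtype=bool), rcount(z)))[:-1]
    runlen = np.diff(np.hstack((-1, np.flatnonzero(~np.isnan(np.vstack((z, np.ones(ncols))).T)))))
    n = np.reshape(np.repeat(runlen, runlen), (nrows + 1, ncols), order='F')[:-1]
    v = a * r**i * (r - 1) / (r**n - 1)
    return pd.DataFrame(v, columns=df.columns, index=df.index)

# Rebuild the question's frame from its printout
nan = np.nan
columns = pd.MultiIndex.from_product([['A', 'B'], list('abcde')], names=[None, 'S'])
index = pd.date_range('2020-10-15', '2020-10-20', name='date')
df = pd.DataFrame([
    [1.0, 2.0, nan, nan, nan, 10.0, 11.0, nan, nan, nan],
    [nan] * 10,
    [nan, nan, nan, 4.0, nan, nan, nan, nan, 13.0, nan],
    [nan] * 10,
    [nan] * 10,
    [4.0, 6.0, 4.0, 1.0, 9.0, 10.0, 2.0, 13.0, 4.0, 13.0],
], index=index, columns=columns)

out = spread_bfill(df)
print(out[('A', 'a')].round(3).tolist())
# Column totals are preserved because every column here ends with a value
print(np.allclose(out.sum(), df.sum()))
```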
With your example data, we get:
>>> spread_bfill(df).round(2) # round(2) for printing purposes
               A                              B
               a     b     c     d     e      a      b     c     d     e
S
2020-10-15  1.00  2.00  0.52  1.21  1.17  10.00  11.00  1.68  3.93  1.68
2020-10-16  0.66  0.98  0.57  1.33  1.28   1.64   0.33  1.85  4.32  1.85
2020-10-17  0.72  1.08  0.63  1.46  1.41   1.80   0.36  2.04  4.75  2.04
2020-10-18  0.79  1.19  0.69  0.30  1.55   1.98   0.40  2.24  1.21  2.24
2020-10-19  0.87  1.31  0.76  0.33  1.71   2.18   0.44  2.47  1.33  2.47
2020-10-20  0.96  1.44  0.83  0.37  1.88   2.40   0.48  2.71  1.46  2.71
To make checking easier, here are the 3 quantities for that example:
>>> a
[[ 1 2 4 4 9 10 11 13 13 13]
[ 4 6 4 4 9 10 2 13 13 13]
[ 4 6 4 4 9 10 2 13 13 13]
[ 4 6 4 1 9 10 2 13 4 13]
[ 4 6 4 1 9 10 2 13 4 13]
[ 4 6 4 1 9 10 2 13 4 13]]
>>> i
[[0 0 0 0 0 0 0 0 0 0]
[0 0 1 1 1 0 0 1 1 1]
[1 1 2 2 2 1 1 2 2 2]
[2 2 3 0 3 2 2 3 0 3]
[3 3 4 1 4 3 3 4 1 4]
[4 4 5 2 5 4 4 5 2 5]]
>>> n
[[1 1 6 3 6 1 1 6 3 6]
[5 5 6 3 6 5 5 6 3 6]
[5 5 6 3 6 5 5 6 3 6]
[5 5 6 3 6 5 5 6 3 6]
[5 5 6 3 6 5 5 6 3 6]
[5 5 6 3 6 5 5 6 3 6]]
Here is a final example illustrating what happens when a column ends with one or more NaNs (they stay NaN):
np.random.seed(10)
a = np.random.randint(0, 10, (6, 6)).astype(float)
a *= np.random.choice([1.0, np.nan], a.shape, p=[.3, .7])
df = pd.DataFrame(a)
>>> df
0 1 2 3 4 5
0 NaN NaN NaN NaN NaN 0.0
1 NaN NaN 9.0 NaN 8.0 NaN
2 NaN NaN NaN NaN NaN NaN
3 NaN 8.0 4.0 NaN NaN NaN
4 NaN NaN NaN 6.0 9.0 NaN
5 NaN NaN 2.0 NaN 7.0 8.0
Then:
>>> spread_bfill(df).round(2) # round(2) for printing
0 1 2 3 4 5
0 NaN 1.72 4.29 0.98 3.81 0.00
1 NaN 1.90 4.71 1.08 4.19 1.31
2 NaN 2.09 1.90 1.19 2.72 1.44
3 NaN 2.29 2.10 1.31 2.99 1.59
4 NaN NaN 0.95 1.44 3.29 1.74
5 NaN NaN 1.05 NaN 7.00 1.92
Speed
a = np.random.randint(0, 10, (1000, 1000)).astype(float)
a *= np.random.choice([1.0, np.nan], a.shape, p=[.3, .7])
df = pd.DataFrame(a)
%timeit spread_bfill(df)
# 130 ms ± 142 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
Initial data:
>>> df
              A                         B
              a    b    c    d    e     a     b     c     d     e
date
2020-10-15  1.0  2.0  NaN  NaN  NaN  10.0  11.0   NaN   NaN   NaN
2020-10-16  NaN  NaN  NaN  NaN  NaN   NaN   NaN   NaN   NaN   NaN
2020-10-17  NaN  NaN  NaN  4.0  NaN   NaN   NaN   NaN  13.0   NaN
2020-10-18  NaN  NaN  NaN  NaN  NaN   NaN   NaN   NaN   NaN   NaN
2020-10-19  NaN  NaN  NaN  NaN  NaN   NaN   NaN   NaN   NaN   NaN
2020-10-20  4.0  6.0  4.0  1.0  9.0  10.0   2.0  13.0   4.0  13.0
Define your geometric sequence:
def geomseq(seq):
    q = 1.1
    n = len(seq)
    S = seq.max()
    Uo = S * (1-q) / (1-q**n)
    Un = [Uo * q**i for i in range(0, n)]
    return Un
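A quick usage sketch of geomseq on a single run, the (A,a) gap from the question. Its first term Uo = S (1 - q) / (1 - q^n) is the usual closed form that makes n terms of ratio q add up to S; because seq.max() skips NaN, S is the run's single non-NaN value.

```python
import numpy as np
import pandas as pd

def geomseq(seq):
    # The answer's function: spread the run's value over the run, growing by 10% per step
    q = 1.1
    n = len(seq)
    S = seq.max()                  # NaN-skipping max: the run's one non-NaN value
    Uo = S * (1 - q) / (1 - q**n)  # first term, chosen so the n terms sum to S
    Un = [Uo * q**i for i in range(0, n)]
    return Un

run = pd.Series([np.nan, np.nan, np.nan, np.nan, 4.0])
vals = geomseq(run)
print(np.round(vals, 3))
print(sum(vals))  # 4.0 up to rounding
```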
TL;DR
>>> df.unstack().groupby(df.unstack().sort_index(ascending=False).notna().cumsum().sort_index()).transform(geomseq).unstack(level=[0, 1])
                   A                                                  B
                   a         b         c         d         e          a          b         c         d         e
date
2020-10-15  1.000000  2.000000  0.518430  1.208459  1.166466  10.000000  11.000000  1.684896  3.927492  1.684896
2020-10-16  0.655190  0.982785  0.570272  1.329305  1.283113   1.637975   0.327595  1.853386  4.320242  1.853386
2020-10-17  0.720709  1.081063  0.627300  1.462236  1.411424   1.801772   0.360354  2.038724  4.752266  2.038724
2020-10-18  0.792780  1.189170  0.690030  0.302115  1.552567   1.981950   0.396390  2.242597  1.208459  2.242597
2020-10-19  0.872058  1.308087  0.759033  0.332326  1.707823   2.180144   0.436029  2.466856  1.329305  2.466856
2020-10-20  0.959264  1.438895  0.834936  0.365559  1.878606   2.398159   0.479632  2.713542  1.462236  2.713542
Details
Convert your dataframe to a series:
>>> sr = df.unstack()
>>> sr.head(10)
date
A a 2020-10-15 1.0
2020-10-16 NaN # <= group X (final value: .655)
2020-10-17 NaN # <= group X (final value: .720)
2020-10-18 NaN # <= group X (final value: .793)
2020-10-19 NaN # <= group X (final value: .872)
2020-10-20 4.0 # <= group X (final value: .960)
b 2020-10-15 2.0
2020-10-16 NaN
2020-10-17 NaN
2020-10-18 NaN
dtype: float64
Now you can build the groups:
>>> groups = sr.sort_index(ascending=False).notna().cumsum().sort_index()
>>> groups.head(10)
date
A a 2020-10-15 16
2020-10-16 15 # <= group X15
2020-10-17 15 # <= group X15
2020-10-18 15 # <= group X15
2020-10-19 15 # <= group X15
2020-10-20 15 # <= group X15
b 2020-10-15 14
2020-10-16 13
2020-10-17 13
2020-10-18 13
dtype: int64
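The grouping trick in isolation, on a made-up Series with a plain integer index (a sketch): counting non-NaN values from the bottom up gives every NaN the same label as the next non-NaN value below it, so each group is exactly one run ending in a value.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, np.nan, 4.0, 2.0, np.nan, 5.0])

# Reverse, count the non-NaN values seen so far, then restore the original order
groups = s.sort_index(ascending=False).notna().cumsum().sort_index()
print(groups.tolist())  # [4, 3, 3, 3, 2, 1, 1]
```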
Apply the geometric series:
>>> sr = sr.groupby(groups).transform(geomseq)
>>> sr.head(10)
date
A a 2020-10-15 1.000000
2020-10-16 0.655190 # <= group X15
2020-10-17 0.720709 # <= group X15
2020-10-18 0.792780 # <= group X15
2020-10-19 0.872058 # <= group X15
2020-10-20 0.959264 # <= group X15
b 2020-10-15 2.000000
2020-10-16 0.982785
2020-10-17 1.081063
2020-10-18 1.189170
dtype: float64
Finally, reshape the series back into the shape of your initial dataframe:
>>> df = sr.unstack(level=[0, 1])
>>> df
                   A                                                  B
                   a         b         c         d         e          a          b         c         d         e
date
2020-10-15  1.000000  2.000000  0.518430  1.208459  1.166466  10.000000  11.000000  1.684896  3.927492  1.684896
2020-10-16  0.655190  0.982785  0.570272  1.329305  1.283113   1.637975   0.327595  1.853386  4.320242  1.853386
2020-10-17  0.720709  1.081063  0.627300  1.462236  1.411424   1.801772   0.360354  2.038724  4.752266  2.038724
2020-10-18  0.792780  1.189170  0.690030  0.302115  1.552567   1.981950   0.396390  2.242597  1.208459  2.242597
2020-10-19  0.872058  1.308087  0.759033  0.332326  1.707823   2.180144   0.436029  2.466856  1.329305  2.466856
2020-10-20  0.959264  1.438895  0.834936  0.365559  1.878606   2.398159   0.479632  2.713542  1.462236  2.713542
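One caveat with the one-liner, as far as I can tell: the groups are counted over the whole unstacked series, so if a column ends in NaNs, those trailing NaNs fall into the same group as the first value of the next column and take a share of it (the vectorized numpy answer leaves them as NaN instead). The question's sample has no such column, so the output above is unaffected. Below is a minimal sketch of a per-column variant, my own adaptation rather than the answer's code; spread_column and spread_by_column are hypothetical names, and the 2-column frame is made up to show the difference.

```python
import numpy as np
import pandas as pd

def geomseq(seq, q=1.1):
    n = len(seq)
    S = seq.max()                  # NaN for a run with no value (trailing NaNs)
    Uo = S * (1 - q) / (1 - q**n)
    return [Uo * q**i for i in range(n)]

def spread_column(col, q=1.1):
    # Label each cell with the next non-NaN value at or below it, within this column only
    runs = col[::-1].notna().cumsum()[::-1]
    return col.groupby(runs).transform(lambda s: geomseq(s, q))

def spread_by_column(df, q=1.1):
    # Hypothetical helper: apply the run grouping one column at a time
    return df.apply(spread_column, q=q)

# Column X ends in NaN; column Y starts with a value
df = pd.DataFrame({'X': [1.0, np.nan], 'Y': [2.0, 3.0]})

per_col = spread_by_column(df)
print(per_col)    # X keeps its trailing NaN, Y is unchanged

# The one-liner from above, for comparison (single-level columns, so unstack(level=0))
one_liner = (df.unstack()
               .groupby(df.unstack().sort_index(ascending=False).notna().cumsum().sort_index())
               .transform(geomseq)
               .unstack(level=0))
print(one_liner)  # X's trailing NaN receives part of Y's first value
```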