Extend/Fill Time Series Data with zeros and constant values in Pandas with Python 3.x

I'm having trouble extending my time series data. I have the following DataFrame:

date_first = df1['date'].min()  # is 2016-08-08
date_last = df1['date'].max()  # is 2016-08-20

>>> df1
         date         customer     qty
149481   2016-08-08   A            400
161933   2016-08-10   A            200
167172   2016-08-13   B            900
170296   2016-08-15   A            300
178221   2016-08-20   B            150
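To make the example reproducible, here is a minimal sketch that rebuilds df1 from the printout above. The row labels are copied from the printout; storing date as real datetimes (rather than strings) is an assumption, but it matters later because pd.date_range produces Timestamps, and string labels would not match them.

```python
import pandas as pd

# Sample data copied from the printout; the original row labels are kept as the index.
# Assumption: 'date' holds datetimes, not strings.
df1 = pd.DataFrame(
    {
        'date': pd.to_datetime(['2016-08-08', '2016-08-10', '2016-08-13',
                                '2016-08-15', '2016-08-20']),
        'customer': ['A', 'A', 'B', 'A', 'B'],
        'qty': [400, 200, 900, 300, 150],
    },
    index=[149481, 161933, 167172, 170296, 178221],
)

date_first = df1['date'].min()  # Timestamp('2016-08-08 00:00:00')
date_last = df1['date'].max()   # Timestamp('2016-08-20 00:00:00')
print(df1)
```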

Then I set date as the index, which gives the following frame:

df1.set_index('date', inplace=True)

>>> df1
             customer     qty
date
2016-08-08   A            400
2016-08-10   A            200
2016-08-13   B            900
2016-08-15   A            300
2016-08-20   B            150

Now I'm trying to extend my time series data for each customer, from the earliest to the latest date, like this:

ix = pd.DataFrame({on_column: pd.Series([date_first, date_last]), 'qty': 0})
result = df1.reindex(ix)

This does not give me the expected result. I want it to look like the following frame:

>>> df1
    date         customer     qty
0   2016-08-08   A            400
1   2016-08-08   B            0
2   2016-08-09   A            0
3   2016-08-09   B            0
4   2016-08-10   A            200
5   2016-08-10   B            0
...
24  2016-08-20   A            0
25  2016-08-20   B            150
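For comparison, here is a sketch (not from the original post) of what reindexing on the date index alone does. It fills in the missing days, but every day still gets exactly one row, so it cannot produce one row per customer, and fill_value=0 is also written into the customer column. That is why the index has to be built from both columns.

```python
import pandas as pd

# The question's frame after set_index('date'), rebuilt from the printout
df1 = pd.DataFrame(
    {'customer': ['A', 'A', 'B', 'A', 'B'], 'qty': [400, 200, 900, 300, 150]},
    index=pd.to_datetime(['2016-08-08', '2016-08-10', '2016-08-13',
                          '2016-08-15', '2016-08-20']).rename('date'),
)

# One label per calendar day between the first and last date
all_days = pd.date_range(df1.index.min(), df1.index.max(), freq='D')
by_day = df1.reindex(all_days, fill_value=0)

# 13 rows: missing days are added, but there is still only one row per day,
# not one row per (day, customer) pair
print(by_day)
```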

Use MultiIndex.from_product with both columns, and reindex the original MultiIndex created by set_index with it:

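import pandas as pd

# 'date' must be a regular column (not the index) with datetime dtype,
# otherwise its labels will not match the Timestamps from pd.date_range
df1['date'] = pd.to_datetime(df1['date'])
date_first = df1['date'].min()
date_last = df1['date'].max()

# Every (date, customer) combination between the first and last date
mux = pd.MultiIndex.from_product([pd.date_range(date_first, date_last, freq='D'),
                                  df1['customer'].unique()], names=['date', 'customer'])
print(mux)
# Combinations missing from df1 get qty = 0
result = df1.set_index(['date', 'customer']).reindex(mux, fill_value=0).reset_index()
print(result)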
         date customer  qty
0  2016-08-08        A  400
1  2016-08-08        B    0
2  2016-08-09        A    0
3  2016-08-09        B    0
4  2016-08-10        A  200
5  2016-08-10        B    0
6  2016-08-11        A    0
7  2016-08-11        B    0
8  2016-08-12        A    0
9  2016-08-12        B    0
10 2016-08-13        A    0
11 2016-08-13        B  900
12 2016-08-14        A    0
13 2016-08-14        B    0
14 2016-08-15        A  300
15 2016-08-15        B    0
16 2016-08-16        A    0
17 2016-08-16        B    0
18 2016-08-17        A    0
19 2016-08-17        B    0
20 2016-08-18        A    0
21 2016-08-18        B    0
22 2016-08-19        A    0
23 2016-08-19        B    0
24 2016-08-20        A    0
25 2016-08-20        B  150
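The 26 rows follow directly from the Cartesian product: pd.date_range yields 13 days from 2016-08-08 to 2016-08-20 inclusive, and there are 2 customers, so from_product produces 13 × 2 = 26 (date, customer) pairs, with the date level varying slowest. A small check of that arithmetic:

```python
import pandas as pd

days = pd.date_range('2016-08-08', '2016-08-20', freq='D')
customers = ['A', 'B']
mux = pd.MultiIndex.from_product([days, customers], names=['date', 'customer'])

print(len(days), len(mux))  # 13 26
# The first level varies slowest, so each date is repeated once per customer
print(mux[:4].tolist())
```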

Here is my solution wrapped in a function:

import pandas as pd


# Originally a @staticmethod inside a class; shown here as a plain function
def extend_time_series_data(data, date_column, customer_column, qty_column):
    """Add a row for every (date, customer) pair in the date range, with qty = 0."""
    # Work on a copy with a clean integer index
    data = data.reset_index(drop=True)
    # Convert types first so min()/max() operate on real dates
    data[date_column] = pd.to_datetime(data[date_column])
    data[qty_column] = pd.to_numeric(data[qty_column])
    date_first = data[date_column].min()
    date_last = data[date_column].max()

    mux = pd.MultiIndex.from_product([pd.date_range(date_first, date_last, freq='D'),
                                      data[customer_column].unique()],
                                     names=[date_column, customer_column])
    result = data.set_index([date_column, customer_column]).reindex(mux, fill_value=0).reset_index()
    print('Extending time series data was successful!')
    return result
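One caveat: reindex raises a ValueError when the (date, customer) index contains duplicate labels, e.g. two orders by the same customer on the same day. The sketch below is a hypothetical variant (extend_with_constant and its fill_value parameter are illustrative names, not part of the original post) that assumes duplicates should be summed first, and lets you fill with any constant instead of only 0. The sample data is the question's, except the last row is split into 100 + 50 to simulate a duplicate.

```python
import pandas as pd


def extend_with_constant(data, date_column, customer_column, qty_column, fill_value=0):
    """Hypothetical variant: one row per (day, customer), missing qty set to fill_value.

    Assumes duplicate (date, customer) rows should be summed, because reindex
    raises a ValueError on an index with duplicate labels.
    """
    data = data.copy()
    data[date_column] = pd.to_datetime(data[date_column])
    # Collapse duplicates so the (date, customer) index is unique
    summed = data.groupby([date_column, customer_column])[qty_column].sum()

    mux = pd.MultiIndex.from_product(
        [pd.date_range(data[date_column].min(), data[date_column].max(), freq='D'),
         data[customer_column].unique()],
        names=[date_column, customer_column],
    )
    # Series.reset_index turns the index levels back into columns
    return summed.reindex(mux, fill_value=fill_value).reset_index()


df = pd.DataFrame({
    'date': ['2016-08-08', '2016-08-10', '2016-08-13', '2016-08-15',
             '2016-08-20', '2016-08-20'],
    'customer': ['A', 'A', 'B', 'A', 'B', 'B'],
    'qty': [400, 200, 900, 300, 100, 50],
})
out = extend_with_constant(df, 'date', 'customer', 'qty')
print(out.tail(2))
```

Passing a different fill_value (for example -1) marks the added rows with that constant instead of 0.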

Maybe it will help others with a similar problem.