Is there any way to fill in the missing rows for sales records in a sequential calendar?

Question

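Hi all,

This is a question about how to use the pandas package to fill in the rows that are missing from a sequential calendar.

Background:

The table is a sample of my dataset of sales records. As you know, some products sell poorly, so some records are missing for "Category-A & product-seed" during 201003-201005. That makes it hard for me to calculate the "sequential growth rate %" for each category-product group.

Initially, I wanted to use "groupby + apply" to find out which periods each group is missing, so that I could restore them and then apply "pct_change". It did not work, though, and I don't know what the root cause is...

If you know how to do this, could you share your thoughts with us? Much appreciated!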

Dataset:

Calendar:

Result:

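Some additional information:

My calendar is a string in a date-like format made up of "month/quarter/semi-annual/annual" periods, for example 2010Q1 or 2019H1. So I am hoping for a way to fill in the missing rows according to my specific calendar.

In other words, the first step is to work out which rows are missing relative to my specific calendar. The second step is to have Python insert the missing rows together with their category and product information. Thanks.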

Answer 1

Depending on your data, this can be done efficiently in several ways. I will point out two.

First, the data:

import pandas as pd

# Sample sales data, indexed by (Month, Category, Product)
df = pd.DataFrame(
    {'Month': [201001, 201002, 201006, 201007, 201008, 201001, 201002, 201007, 201008],
    'Category': ['A'] * 9,
    'Product': ['seed'] * 5 + ['flower'] * 4,
    'Sales': [200, 332, 799, 122, 994, 799, 122, 994, 100]}
    ).set_index(['Month', 'Category', 'Product'])

Reshaping the df

This only works if every possible date appears at least once in df.

df = df.unstack(['Category', 'Product']).fillna(0).stack(['Category', 'Product'])
print(df.reset_index())

Output

    Month Category Product  Sales
0  201001        A  flower  799.0
1  201001        A    seed  200.0
2  201002        A  flower  122.0
3  201002        A    seed  332.0
4  201006        A  flower    0.0
5  201006        A    seed  799.0
6  201007        A  flower  994.0
7  201007        A    seed  122.0
8  201008        A  flower  100.0
9  201008        A    seed  994.0

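As you can see, the result still has no rows for months 3-5 (201003 to 201005), because no product in this sample data has a record in those months.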
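The gap can also be made explicit by comparing the existing index with the full set of combinations that the calendar implies. This is a minimal sketch, assuming a monthly calendar from 201001 to 201008; the names `sales`, `calendar`, `groups`, `expected`, and `missing` are illustrative and not part of the original post.

```python
import pandas as pd

# Same sample data as above, rebuilt so the snippet runs on its own.
sales = pd.DataFrame(
    {'Month': [201001, 201002, 201006, 201007, 201008, 201001, 201002, 201007, 201008],
     'Category': ['A'] * 9,
     'Product': ['seed'] * 5 + ['flower'] * 4,
     'Sales': [200, 332, 799, 122, 994, 799, 122, 994, 100]}
).set_index(['Month', 'Category', 'Product'])

# Assumed calendar (hypothetical): every month from 201001 to 201008.
calendar = list(range(201001, 201009))

# Category/product pairs that actually occur in the data.
groups = sales.index.droplevel('Month').unique()

# Every (month, category, product) combination the calendar implies.
expected = pd.MultiIndex.from_tuples(
    [(month, cat, prod) for month in calendar for cat, prod in groups],
    names=sales.index.names)

# Combinations that have no sales row.
missing = expected.difference(sales.index)
print(missing.to_frame(index=False))
```

Only category/product pairs that already occur in the data are considered, so a pair with no sales at all would not be reported.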

Reindexing

If we construct a new index with all possible combinations of dates and products, pandas will add the missing rows with df.reindex().

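import numpy as np

# np.int was removed from recent NumPy releases; the built-in int works instead.
# np.arange excludes the stop value, so this covers 201001..201007;
# use 201009 as the stop to keep 201008 as well.
months = np.arange(201001, 201008, dtype=int)
cats = ['A']
products = ['seed', 'flower']
# df is the original frame from the data step (re-create it if you ran the reshape above)
df = df.reindex(
    index=pd.MultiIndex.from_product(
        [months, cats, products],
        names=df.index.names),
    fill_value=0)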

print(df.reset_index())

Output

     Month Category Product  Sales
0   201001        A    seed    200
1   201001        A  flower    799
2   201002        A    seed    332
3   201002        A  flower    122
4   201003        A    seed      0
5   201003        A  flower      0
6   201004        A    seed      0
7   201004        A  flower      0
8   201005        A    seed      0
9   201005        A  flower      0
10  201006        A    seed    799
11  201006        A  flower      0
12  201007        A    seed    122
13  201007        A  flower    994
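Once the missing rows are in place, the sequential growth rate from the question can be computed per group. The following is a rough sketch rather than a definitive solution: it assumes the calendar runs from 2010-01 to 2010-08 and builds it with pd.period_range so that it stays valid across year boundaries (a plain integer range would produce values such as 201013). The names `sales`, `filled`, and the `Growth %` column are illustrative.

```python
import numpy as np
import pandas as pd

# Same sample data as above, rebuilt so the snippet runs on its own.
sales = pd.DataFrame(
    {'Month': [201001, 201002, 201006, 201007, 201008, 201001, 201002, 201007, 201008],
     'Category': ['A'] * 9,
     'Product': ['seed'] * 5 + ['flower'] * 4,
     'Sales': [200, 332, 799, 122, 994, 799, 122, 994, 100]}
).set_index(['Month', 'Category', 'Product'])

# Assumed calendar: monthly periods converted to the integer YYYYMM labels used in the data.
periods = pd.period_range('2010-01', '2010-08', freq='M')
months = periods.strftime('%Y%m').astype('int64')

# Full index, ordered group by group so each group's months are consecutive rows.
groups = sales.index.droplevel('Month').unique()
full_index = pd.MultiIndex.from_tuples(
    [(month, cat, prod) for cat, prod in groups for month in months],
    names=sales.index.names)

# Insert the missing rows with zero sales.
filled = sales.reindex(full_index, fill_value=0)

# Period-over-period change within each category/product group, in percent.
growth = filled.groupby(level=['Category', 'Product'])['Sales'].pct_change()
# Growth from a zero base is infinite; treat it as undefined here.
filled['Growth %'] = (growth * 100).replace([np.inf, -np.inf], np.nan)

print(filled.reset_index())
```

Filling with 0 means a period that follows an empty period has no defined growth rate, so those rows come out as NaN. For quarterly labels, pd.period_range('2010Q1', '2010Q4', freq='Q').astype(str) should give strings such as '2010Q1'; half-year labels like 2019H1 would likely need a hand-written list.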
