如何在一段时间内按天对 pandas 数据帧进行重新采样？

Question

我有一个这样的数据框：

df.head()
Out[2]: 
         price   sale_date 
0  477,000,000  1396/10/30 
1  608,700,000  1396/10/30 
2  580,000,000  1396/10/03 
3  350,000,000  1396/10/03 
4  328,000,000  1396/03/18

它有超出范围的日期时间
那么我按照下面的方法将它们设为时间段

df['sale_date']=df['sale_date'].str.replace('/','').astype(int)

def conv(x):
    return pd.Period(year=x // 10000,
                     month=x // 100 % 100,
                     day=x % 100, freq='D')
 
df['sale_date'] = df['sale_date'].str.replace('/','').astype(int).apply(conv)

现在我想像下面这样按天重新取样：

df.resample(freq='d', on='sale_date').sum()

但它给了我这个错误：

resample() got an unexpected keyword argument 'freq'

Answer 1

在 pandas 1.1.3 中，resample 和 Grouper 与 Periods 似乎无法正常工作（我猜是错误）：

df['sale_date']=df['sale_date'].str.replace('/','').astype(int)
df['price'] = df['price'].str.replace(',','').astype(int)

def conv(x):
    return pd.Period(year=x // 10000,
                     month=x // 100 % 100,
                     day=x % 100, freq='D')
 
df['sale_date'] = df['sale_date'].apply(conv)

# df = df.set_index('sale_date').resample('D')['price'].sum()
#OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 1396-03-18 00:00:00

# df = df.set_index('sale_date').groupby(pd.Grouper(freq='D'))['price'].sum()
#OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 1396-03-18 00:00:00

可能的解决方案由 sum 汇总，因此如果重复 sale_date，则 price 值相加：

df = df.groupby('sale_date')['price'].sum().reset_index()
print (df)
    sale_date      price
0  1396-03-18  328000000
1  1396-10-03  580000000
2  1396-10-30  477000000
3  1396-11-25  608700000
4  1396-12-05  350000000

编辑：可以通过 Series.reindex with period_range:

s = df.groupby('sale_date')['price'].sum()
rng = pd.period_range(s.index.min(), s.index.max(), name='sale_date')
df = s.reindex(rng, fill_value=0).reset_index()
print (df)
      sale_date      price
0    1396-03-18  328000000
1    1396-03-19          0
2    1396-03-20          0
3    1396-03-21          0
4    1396-03-22          0
..          ...        ...
258  1396-12-01          0
259  1396-12-02          0
260  1396-12-03          0
261  1396-12-04          0
262  1396-12-05  350000000

[263 rows x 2 columns]

如何在一段时间内按天对 pandas 数据帧进行重新采样？

how can I resample pandas dataframe by day on period time?

python

python-datetime

pandas