Pandas time re-sampling categorical data from a column with calculations from another numerical column

Question

I have a dataframe with a categorical column and a numerical column, with the index set to time data:

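import pandas as pd

df = pd.DataFrame({
        # trailing spaces removed so every string has the same format
        'date': [
            '2013-03-01', '2013-03-02',
            '2013-03-01', '2013-03-02',
            '2013-03-01', '2013-03-02'
        ],
        'Kind': [
            'A', 'B', 'A', 'B', 'B', 'B'
        ],
        'Values': [1, 1.5, 2, 3, 5, 3]
    })

# parse the strings as datetimes and use them as the index
df['date'] = pd.to_datetime(df['date'])
df = df.set_index('date')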

The code above gives:

        Kind    Values
date        
2013-03-01  A   1.0
2013-03-02  B   1.5
2013-03-01  A   2.0
2013-03-02  B   3.0
2013-03-01  B   5.0
2013-03-02  B   3.0

My goal is to obtain the following dataframe:


         A_count   B_count  A_Val max   B_Val max
date                
2013-03-01   2         1        2             5
2013-03-02   0         3        0             3

which also has time as the index. I noticed that if we use

data = pd.DataFrame(df.resample('D')['Kind'].value_counts())

we get only the counts per day and kind:

    Kind
date    Kind    
2013-03-01  A   2
            B   1
2013-03-02  B   3

Answer 1

Use DataFrame.pivot_table, then flatten the MultiIndex in the columns with a list comprehension:

import pandas as pd

df = pd.DataFrame({
        # trailing spaces removed so every string has the same format
        'date': [
            '2013-03-01', '2013-03-02',
            '2013-03-01', '2013-03-02',
            '2013-03-01', '2013-03-02'
        ],
        'Kind': [
            'A', 'B', 'A', 'B', 'B', 'B'
        ],
        'Values': [1, 1.5, 2, 3, 5, 3]
    })

df['date'] = pd.to_datetime(df['date'])

# setting the index can be omitted here, because pivot_table takes index='date'
#df = df.set_index('date')

df = df.pivot_table(index='date', columns='Kind', values='Values', aggfunc=['count','max'])
df.columns = [f'{b}_{a}' for a, b in df.columns]
print(df)
            A_count  B_count  A_max  B_max
date                                      
2013-03-01      2.0      1.0    2.0    5.0
2013-03-02      NaN      3.0    NaN    3.0
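
The desired output in the question shows 0 where the result above has NaN. A minimal sketch of one way to get that, assuming it is acceptable to treat a missing kind as 0 in both the count and the max columns, is to pass `fill_value=0` to `pivot_table`. The variable name `out` is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'date': ['2013-03-01', '2013-03-02', '2013-03-01',
             '2013-03-02', '2013-03-01', '2013-03-02'],
    'Kind': ['A', 'B', 'A', 'B', 'B', 'B'],
    'Values': [1, 1.5, 2, 3, 5, 3],
})
df['date'] = pd.to_datetime(df['date'])

# fill_value=0 replaces the missing (date, Kind) combinations with 0
out = df.pivot_table(index='date', columns='Kind', values='Values',
                     aggfunc=['count', 'max'], fill_value=0)

# columns are (function, kind) pairs; flatten them to 'A_count', 'B_max', ...
out.columns = [f'{kind}_{func}' for func, kind in out.columns]
print(out)
```

Note that a 0 in `A_max` is then a placeholder for "no rows of kind A on that day", not an observed maximum, so keeping NaN may be preferable if 0 is a legitimate value in the data.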

Another solution, resampling by day with Grouper:

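# start from the original df (before the pivot_table step above)
df = df.set_index('date')

# group by calendar day and by Kind, aggregate, then move Kind into the columns
df = df.groupby([pd.Grouper(freq='D'), 'Kind'])['Values'].agg(['count','max']).unstack()
df.columns = [f'{b}_{a}' for a, b in df.columns]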
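
The Grouper version matters mostly when the timestamps carry a time of day, because `pivot_table(index='date')` groups on exact timestamps while `pd.Grouper(freq='D')` bins them into calendar days. The sketch below uses made-up intraday timestamps (they are not from the question) and also passes `fill_value=0` to `unstack`, on the assumption that missing kinds should show as 0.

```python
import pandas as pd

# hypothetical data with intraday timestamps, for illustration only
df = pd.DataFrame({
    'date': pd.to_datetime([
        '2013-03-01 08:00', '2013-03-01 17:30',
        '2013-03-02 09:15', '2013-03-02 12:00',
    ]),
    'Kind': ['A', 'B', 'B', 'B'],
    'Values': [1.0, 5.0, 1.5, 3.0],
}).set_index('date')

out = (
    df.groupby([pd.Grouper(freq='D'), 'Kind'])['Values']
      .agg(['count', 'max'])
      .unstack(fill_value=0)   # Kind moves into the columns, gaps become 0
)

# flatten the (function, kind) column pairs to 'A_count', 'B_max', ...
out.columns = [f'{kind}_{func}' for func, kind in out.columns]
print(out)
```

The four rows collapse into two daily rows. Changing `freq` (for example to `'W'`) would bin by a different period without touching the rest of the chain.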
