Listing group combinations in Pandas that don't appear in data

Question

I have a pandas DataFrame containing monthly product usage by client and country, like this:

import pandas as pd

df = pd.DataFrame(
    [
        ('12345', 'CH', 'A', 'Prod 1'),
        ('12345', 'CH', 'A', 'Prod 2'),
        ('67890', 'DE', 'A', 'Prod 1'),
        ('98765', 'CH', 'B', 'Prod 3'),
        ('nnnnn', 'NL', 'C', 'Prod 1'),
    ],
    columns=['Client_ID', 'Country', 'Customer', 'Product Used'],
)

I want to list total product usage grouped by customer and country. pandas' groupby gets me close to what I need:

df.groupby(['Customer', 'Country','Product Used']).count()

#Reuse Client_ID as Count
Customer    Country Product Used    Client_ID
A           CH      Prod 1          3
                    Prod 2          5
            DE      Prod 1          1
B           CH      Prod 3          2
C           NL      Prod 1          1

Is there a way to include combinations that don't appear in the data as 0, so that my result looks like this?

Customer    Country Prod 1  Prod 2  Prod 3
A           CH      3       5       0
            DE      1       0       0
B           CH      0       0       2
C           NL      1       0       0

Answer 1

Use pd.crosstab:

new_df = pd.crosstab([df['Customer'], df['Country']], df['Product Used'])

new_df:

Product Used      Prod 1  Prod 2  Prod 3
Customer Country                        
A        CH            1       1       0
         DE            1       0       0
B        CH            0       0       1
C        NL            1       0       0
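Since the title asks to *list* the combinations that are missing, the zero cells of the crosstab can be turned back into a list. A minimal sketch, assuming the sample df from the question: `stack()` converts the wide table to one row per (Customer, Country, Product Used), and filtering for a count of 0 keeps only the combinations absent from df. The names `long_counts` and `missing` are just illustrative.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame(
    [
        ('12345', 'CH', 'A', 'Prod 1'),
        ('12345', 'CH', 'A', 'Prod 2'),
        ('67890', 'DE', 'A', 'Prod 1'),
        ('98765', 'CH', 'B', 'Prod 3'),
        ('nnnnn', 'NL', 'C', 'Prod 1'),
    ],
    columns=['Client_ID', 'Country', 'Customer', 'Product Used'],
)

# Wide table of counts; products a (Customer, Country) pair never used are 0
new_df = pd.crosstab([df['Customer'], df['Country']], df['Product Used'])

# Back to long form: one value per (Customer, Country, Product Used)
long_counts = new_df.stack()

# Keep only the zero-count combinations, i.e. those missing from df
missing = long_counts[long_counts == 0].index.tolist()
print(missing)
```

Note that this only covers (Customer, Country) pairs that occur at least once; a pair such as B/DE never shows up in the crosstab at all.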

Or unstack after the groupby count with fill_value=0, then drop level 0 (the leftover Client_ID level) from the columns:

new_df = (
    df.groupby(['Customer', 'Country', 'Product Used']).count()
        .unstack(fill_value=0)
        .droplevel(0, axis=1)
)

new_df:

Product Used      Prod 1  Prod 2  Prod 3
Customer Country                        
A        CH            1       1       0
         DE            1       0       0
B        CH            0       0       1
C        NL            1       0       0
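A variation on the groupby approach, offered as a sketch rather than part of the original answer: `size()` returns a Series of row counts per group, so `unstack()` produces flat product columns directly and no `droplevel` is needed. One difference to keep in mind is that `count()` counts non-null values per column, while `size()` counts all rows, including rows containing NaN.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame(
    [
        ('12345', 'CH', 'A', 'Prod 1'),
        ('12345', 'CH', 'A', 'Prod 2'),
        ('67890', 'DE', 'A', 'Prod 1'),
        ('98765', 'CH', 'B', 'Prod 3'),
        ('nnnnn', 'NL', 'C', 'Prod 1'),
    ],
    columns=['Client_ID', 'Country', 'Customer', 'Product Used'],
)

# size() yields a Series (no 'Client_ID' column level), so unstack()
# moves 'Product Used' into plain columns, filling gaps with 0
new_df = (
    df.groupby(['Customer', 'Country', 'Product Used'])
      .size()
      .unstack(fill_value=0)
)
print(new_df)
```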

Or use pivot_table with aggfunc='count' and fill_value=0:

new_df = (
    df.pivot_table(index=['Customer', 'Country'], columns='Product Used',
                   values='Client_ID', aggfunc='count', fill_value=0)
)

new_df:

Product Used      Prod 1  Prod 2  Prod 3
Customer Country                        
A        CH            1       1       0
         DE            1       0       0
B        CH            0       0       1
C        NL            1       0       0
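All three approaches above only fill in missing *products* for (Customer, Country) pairs that already occur. If every Customer × Country × Product combination is wanted, including pairs such as B/DE that never appear, one option (a different technique from the answer, sketched here under that assumption) is to reindex the group counts against the full Cartesian product built with `pd.MultiIndex.from_product`. The names `counts`, `full_index`, `full_counts` and `wide` are illustrative.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame(
    [
        ('12345', 'CH', 'A', 'Prod 1'),
        ('12345', 'CH', 'A', 'Prod 2'),
        ('67890', 'DE', 'A', 'Prod 1'),
        ('98765', 'CH', 'B', 'Prod 3'),
        ('nnnnn', 'NL', 'C', 'Prod 1'),
    ],
    columns=['Client_ID', 'Country', 'Customer', 'Product Used'],
)

keys = ['Customer', 'Country', 'Product Used']

# Row count per observed (Customer, Country, Product Used) triple
counts = df.groupby(keys).size()

# Every possible triple, built from the distinct values of each column
full_index = pd.MultiIndex.from_product(
    [df[k].unique() for k in keys], names=keys
)

# reindex adds the unobserved triples with a count of 0
full_counts = counts.reindex(full_index, fill_value=0)

# Pivot products into columns, as in the question's desired output
wide = full_counts.unstack()
print(wide)
```

With the sample data this gives 3 customers × 3 countries = 9 rows, most of them all zeros, so it is only worth doing when the unobserved pairs actually matter.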


Tags: python, dataframe, pandas, data-science