How to refer to the index field of a pandas DataFrame?

Question

I have the following DataFrame:

                                                  payment_method_id  payment_plan_days  plan_list_price  actual_amount_paid        date
    msno
    YyO+tlZtAXYXoZhNr3Vg3+dfVQvrBVGO8j1mfqe4ZHc=                 41                 30              129                 129  2015-01-01
    AZtu6Wl0gPojrEQYB8Q3vBSmE2wnZ3hi1FbK1rQQ0A4=                 41                 30              149                 149  2015-01-01
    UkDFI97Qb6+s2LWcijVVv4rMAsORbVDT2wNXF0aVbns=                 41                 30              129                 129  2015-01-02

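The key is "msno", and I need to find out whether most "msno" values use only one payment_method_id across different dates.

So I tried to group by "msno" and "payment_method_id" using

    transactions.groupby(['msno', 'payment_method_id']).count()

but I get an error: KeyError: 'msno'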

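Grouping by other fields works fine, for example:

    transactions.groupby(['payment_plan_days', 'payment_method_id']).count()

For msno alone, I can even group with level=0:

    transactions.groupby(level=0)

But I cannot group on two keys when one of them is that first column (the index).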

This is what transactions.columns looks like:

    Index(['payment_method_id', 'payment_plan_days', 'plan_list_price', 'actual_amount_paid', 'date'], dtype='object')

Any suggestions?

Answer 1

I think you need reset_index to convert the index to a column, because your pandas version is below 0.20.1:

Strings passed to DataFrame.groupby() as the by parameter may now reference either column names or index level names. Previously, only column names could be referenced. This allows to easily group by a column and index level at the same time.

transactions.reset_index().groupby(['msno', 'payment_method_id']).count()
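As a minimal sketch of why this works: reset_index() turns the named index into an ordinary column, so groupby can find 'msno' by name. The sample values below ("user_a", "user_b" and the payment data) are made up for illustration and are not the asker's data.

```python
import pandas as pd

# Hypothetical sample data shaped like the frame in the question
transactions = pd.DataFrame(
    {
        "payment_method_id": [41, 41, 38],
        "payment_plan_days": [30, 30, 30],
        "date": pd.to_datetime(["2015-01-01", "2015-02-01", "2015-01-02"]),
    },
    index=pd.Index(["user_a", "user_a", "user_b"], name="msno"),
)

# 'msno' is the index name, so it is not listed among the columns
print("msno" in transactions.columns)  # False

# reset_index() moves the index into a regular column called 'msno'
flat = transactions.reset_index()
print("msno" in flat.columns)  # True

# Grouping by both names now works on any pandas version
counts = flat.groupby(["msno", "payment_method_id"]).count()
print(counts)
```

The trade-off is that reset_index() returns a new frame, so on a large table it costs a copy.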

So after upgrading, your code should work as written:

transactions.groupby(['msno', 'payment_method_id']).count()
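For illustration, here is a small sketch that assumes pandas 0.20 or later and uses made-up sample values. It groups by the index level name together with a column, then takes one possible route to the original question (how many users stick to a single payment method) via nunique. The variable names are hypothetical.

```python
import pandas as pd

# Hypothetical sample data: user_a always pays with method 41,
# user_b switches between methods 38 and 40
transactions = pd.DataFrame(
    {
        "payment_method_id": [41, 41, 38, 40],
        "date": pd.to_datetime(
            ["2015-01-01", "2015-02-01", "2015-01-02", "2015-02-02"]
        ),
    },
    index=pd.Index(["user_a", "user_a", "user_b", "user_b"], name="msno"),
)

# On pandas >= 0.20 the index level name can be mixed with column names
per_pair = transactions.groupby(["msno", "payment_method_id"]).size()
print(per_pair)

# Number of distinct payment methods used by each msno
methods_per_user = transactions.groupby(level="msno")["payment_method_id"].nunique()

# Share of users that only ever used one payment method
share_single = (methods_per_user == 1).mean()
print(share_single)  # 0.5 for this sample
```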

Note:

The difference between count and size is that count omits NaNs while size does not.
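A quick sketch of that difference, using a small made-up frame with one missing value:

```python
import numpy as np
import pandas as pd

# Hypothetical data: group "a" has one missing value
df = pd.DataFrame({"key": ["a", "a", "b"], "value": [1.0, np.nan, 3.0]})
grouped = df.groupby("key")["value"]

non_null = grouped.count()  # counts only non-NaN entries: a -> 1, b -> 1
rows = grouped.size()       # counts all rows in the group: a -> 2, b -> 1

print(non_null)
print(rows)
```

If the goal is to count rows per group regardless of missing data, size is usually the safer choice.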


python

dataframe

pandas

pandas-groupby