Pandas value_counts with manual labels and sorting

Question

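I have a field containing codes (DMDEDUC2 in this example). I want to compute a frequency table (value_counts) on this field and display it with user-specified labels. The code below does exactly what I want... but I feel I must be missing a more standard way to get the desired result.

Logically, the value_counts and replace lines cannot be simplified. But surely the rest could be more elegant.

Is there a simpler way to get this result? A more pandas-like solution?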

# Tiny dataset for clarity
import pandas as pd
df = pd.DataFrame({ 'DMDEDUC2': [5, 3, 3, 5, 4, 2, 4, 4] })
d = {
      1: "<9"
    , 2: "9-11"
    , 3: "HS/GED"
    , 4: "Some college/AA"
    , 5: "College"
    , 7: "Refused"
    , 9: "Don't know"
}

# First get value counts (vc) for DMDEDUC2
# This line gets all the data I need in the correct order... 
# but without the labels I need.
vc = df.DMDEDUC2.value_counts().sort_index()

# Convert the resulting Series to a DataFrame 
# to allow for clear labels in a logical order
# Name the count column directly; the default column name
# produced by to_frame() differs between pandas versions.
vc = vc.to_frame(name='COUNTS')
vc['DMDEDUC2x'] = vc.index
vc.DMDEDUC2x = vc.DMDEDUC2x.replace(d)
vc = vc.set_index('DMDEDUC2x')
print(vc)

Desired output (sorted by the [not shown] code, not by value or label):

                 COUNTS
DMDEDUC2x              
<9                  655
9-11                643
HS/GED             1186
Some college/AA    1621
College            1366
Don't know            3

Desired output for the small sample dataset:

                 COUNTS
DMDEDUC2x              
9-11                  1
HS/GED                2
Some college/AA       3
College               2

Answer 1

I think it can easily be condensed into two lines:

vc = df.DMDEDUC2.value_counts().sort_index().to_frame(name='COUNTS')
vc.index = vc.index.map(d).rename('DMDEDUC2')
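
For anyone who wants to run this end to end, here is a self-contained sketch of the same idea written as a single method chain, using the tiny dataset from the question. It is only one possible variant, not the only way to do it: it swaps `Index.map` for `Series.rename(index=d)` and uses `rename_axis` to set the index name to DMDEDUC2x, as shown in the question's desired output. It assumes every code that occurs in the data has an entry in `d`.

```python
import pandas as pd

# Tiny dataset and label mapping copied from the question
df = pd.DataFrame({"DMDEDUC2": [5, 3, 3, 5, 4, 2, 4, 4]})
d = {
    1: "<9",
    2: "9-11",
    3: "HS/GED",
    4: "Some college/AA",
    5: "College",
    7: "Refused",
    9: "Don't know",
}

vc = (
    df["DMDEDUC2"]
    .value_counts()            # frequency of each code
    .sort_index()              # order rows by the numeric code
    .rename(index=d)           # replace codes with labels, keeping the row order
    .rename_axis("DMDEDUC2x")  # index name taken from the question's desired output
    .to_frame(name="COUNTS")   # one-column DataFrame with an explicit column name
)
print(vc)
```

Because the sort happens before the labels are applied, the rows stay in code order rather than alphabetical label order.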
