Pandas value_counts with manual labels and sorting
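I have a field containing codes (DMDEDUC2 in this example). I want to compute a frequency table (value_counts) on this field and display it with user-specified labels. The code below does exactly what I want... but I feel I must be missing a more standard way of getting the desired result.
Logically, the value_counts and replace lines cannot be simplified. But surely the rest could be more elegant.
Is there a simpler way to get this result? A more pandas-like solution?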
# Tiny dataset for clarity
import pandas as pd
df = pd.DataFrame({ 'DMDEDUC2': [5, 3, 3, 5, 4, 2, 4, 4] })
d = {
1: "<9"
, 2: "9-11"
, 3: "HS/GED"
, 4: "Some college/AA"
, 5: "College"
, 7: "Refused"
, 9: "Don't know"
}
# First get value counts (vc) for DMDEDUC2
# This line gets all the data I need in the correct order...
# but without the labels I need.
vc = df.DMDEDUC2.value_counts().sort_index()
# Convert the resulting Series to a DataFrame
# to allow for clear labels in a logical order
vc = vc.to_frame()
vc['DMDEDUC2x'] = vc.index
vc.DMDEDUC2x = vc.DMDEDUC2x.replace(d)
vc = vc.set_index('DMDEDUC2x')
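# Name the counts column explicitly: value_counts() calls it 'DMDEDUC2'
# before pandas 2.0 and 'count' from 2.0 on, so a rename by old name is fragile
vc.columns = ['COUNTS']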
print(vc)
Desired output (sorted by the [not shown] codes, not by count or label):
                 COUNTS
DMDEDUC2x
<9                  655
9-11                643
HS/GED             1186
Some college/AA    1621
College            1366
Don't know            3
Desired output for the tiny sample dataset:
                 COUNTS
DMDEDUC2x
9-11                  1
HS/GED                2
Some college/AA       3
College               2
I think it can easily be condensed into two lines:
vc = df.DMDEDUC2.value_counts().sort_index().to_frame(name='COUNTS')
vc.index = vc.index.map(d).rename('DMDEDUC2')
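For what it's worth, the same idea can probably be written as one method chain. This is only a sketch on the tiny dataset from the question: it assumes every code that occurs in the data has an entry in d (Series.rename leaves unmapped codes as they are), and it uses the index name DMDEDUC2x from the desired output.

```python
import pandas as pd

# Tiny dataset and label mapping, copied from the question
df = pd.DataFrame({'DMDEDUC2': [5, 3, 3, 5, 4, 2, 4, 4]})
d = {1: "<9", 2: "9-11", 3: "HS/GED", 4: "Some college/AA",
     5: "College", 7: "Refused", 9: "Don't know"}

vc = (
    df['DMDEDUC2']
    .value_counts()           # count each code
    .sort_index()             # order by the numeric code, not by count
    .rename(index=d)          # swap codes for labels, keeping that order
    .rename_axis('DMDEDUC2x') # index name as in the desired output
    .to_frame('COUNTS')       # Series -> one-column DataFrame
)
print(vc)
```

Sorting happens before the labels are applied, so the rows stay in code order rather than alphabetical label order.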