Layered normalised histogram per category
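Based on an existing example, I was able to build a normalised layered histogram. However, the normalisation appears to be done against the total number of samples rather than the total number of samples in each category. How can I normalise within each category using altair?
Example: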
import pandas as pd
import altair as alt

source = pd.DataFrame({
    'age': ['12', '32', '43', '54', '32', '32', '12', '20', '44', '24'],
    'gender': ['m', 'm', 'f', 'f', 'f', 'm', 'f', 'm', 'f', 'm']
})

alt.Chart(source).transform_joinaggregate(
    total='count(*)'
).transform_calculate(
    pct='1 / datum.total'
).mark_bar().encode(
    alt.X('age:Q', bin=True),
    alt.Y('sum(pct):Q', axis=alt.Axis(format='%')),
    color='gender'
)
If I understand correctly, I think passing stack='normalize' to the y encoding should do it.
import pandas as pd
import altair as alt

source = pd.DataFrame({
    'age': ['12', '32', '43', '54', '32', '32', '12', '20', '44', '24'],
    'gender': ['m', 'm', 'f', 'f', 'f', 'm', 'f', 'm', 'f', 'm']
})

alt.Chart(source).mark_bar().encode(
    alt.X('age:O', bin=True),
    alt.Y('count()',
        stack='normalize',
        axis=alt.Axis(title='Group Percentage', format='%'),
    ),
    color='gender'
)
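But perhaps the question is how to get the percentages of the two groups next to each other? In that case the shares can be computed in pandas first and plotted unstacked:
# Bin edges of width 5, from 10 to 55.
bins = [10 + 5 * i for i in range(10)]
# Ages are stored as strings, so convert them to integers before binning.
df_plot = (
    pd.crosstab(source.gender, pd.cut(source.age.astype(int), bins=bins))
    .apply(lambda r: r / r.sum(), axis=0)  # share of each gender within each age bin
    .stack()
    .reset_index()
    .rename(columns={0: 'perc'})
)
df_plot['age'] = df_plot['age'].astype(str)  # interval labels as plain strings

alt.Chart(df_plot).mark_bar().encode(
    x='age:N',
    y=alt.Y('perc:Q', axis=alt.Axis(format='%'), stack=False),
    color='gender:N',
    opacity=alt.value(0.6)
)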
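If you want to normalise within a particular category, you can compute the total within that category by adding a groupby to the joinaggregate transform: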
import pandas as pd
import altair as alt

source = pd.DataFrame({
    'age': ['12', '32', '43', '54', '32', '32', '12', '20', '44', '24'],
    'gender': ['m', 'm', 'f', 'f', 'f', 'm', 'f', 'm', 'f', 'm']
})

alt.Chart(source).transform_joinaggregate(
    total='count(*)',
    groupby=['gender']
).transform_calculate(
    pct='1 / datum.total'
).mark_bar().encode(
    alt.X('age:Q', bin=True),
    alt.Y('sum(pct):Q', axis=alt.Axis(format='%')),
    color='gender'
)
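As a rough sanity check, the same per-category arithmetic can be reproduced in plain pandas. This is only a sketch of what the grouped joinaggregate, the calculate step and the sum(pct) aggregate compute. The bin edges (width 10) and the names age_bin, per_bin and per_gender are assumptions made for illustration; Altair's bin=True chooses its own edges.

```python
import pandas as pd

source = pd.DataFrame({
    'age': ['12', '32', '43', '54', '32', '32', '12', '20', '44', '24'],
    'gender': ['m', 'm', 'f', 'f', 'f', 'm', 'f', 'm', 'f', 'm'],
})

# Mirror transform_joinaggregate(total='count(*)', groupby=['gender']):
# attach the per-gender row count to every row.
source['total'] = source.groupby('gender')['age'].transform('count')

# Mirror transform_calculate(pct='1 / datum.total').
source['pct'] = 1 / source['total']

# Assumed bin edges of width 10; not necessarily what Altair would pick.
age_bin = pd.cut(source['age'].astype(int),
                 bins=list(range(10, 70, 10)), right=False)

# Mirror the sum(pct) aggregate for each (gender, age bin) pair.
per_bin = source.groupby(['gender', age_bin], observed=True)['pct'].sum()

# With per-category normalisation, each gender's bars add up to 100%.
per_gender = per_bin.groupby(level='gender').sum()

print(per_bin)
print(per_gender)
```

Without the groupby, total would be the overall row count for every row, so each gender's bars would add up only to that gender's share of the whole sample.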