重塑具有重复值的 Pandas 数据框
Reshaping a Pandas data frame with duplicate values
使用 Plotly go.Table()
函数和 Pandas
,我试图创建一个 table 来汇总一些数据。我的数据如下:
import pandas as pd
test_df = pd.DataFrame({'Manufacturer':['BMW', 'Chrysler', 'Chrysler', 'Chrysler', 'Brokertec', 'DWAS', 'Ford', 'Buick'],
'Metric':['Indicator', 'Indicator', 'Indicator', 'Indicator', 'Indicator', 'Indicator', 'Indicator', 'Indicator'],
'Dimension':['Short', 'Short', 'Short', 'Long', 'Short', 'Short', 'Long', 'Long'],
'User': ['USA', 'USA', 'USA', 'USA', 'USA', 'New USA', 'USA', 'Los USA'],
'Value':[50, 3, 3, 2, 5, 7, 10, 5]
})
我想要的输出如下(将 Dimension
加 Manufacturer
):
Manufacturer Short Long
Chrysler 6 2
Buick 5 5
Mercedes 7 0
Ford 0 10
我需要稍微调整一下 Pandas 数据框(这就是我 运行 遇到麻烦的地方)。我的代码如下:
table_columns = ['Manufacturer', 'Longs', 'Shorts']
manufacturers = ['Chrysler', 'Buick', 'Mercedes', 'Ford']
df_new = (df[df['Manufacturer'].isin(manufacturers)]
.set_index(['Manufacturer', 'Dimension'])
['Value'].unstack()
.reset_index()[table_columns]
)
然后,使用 Plotly go.Table()
函数创建 table:
import plotly.graph_objects as go
direction_table = go.Figure(go.Table(
header=dict(
values=table_columns,
font=dict(size=12),
line_color='darkslategray',
fill_color='lightskyblue',
align='center'
),
cells=dict(
values=df_new.T, # using Transpose here
line_color='darkslategray',
fill_color='lightcyan',
align = 'center')
)
)
direction_table
我看到的错误是:
ValueError: Index contains duplicate entries, cannot reshape
解决此问题的最佳方法是什么?
提前致谢!
您需要将 pivot_table
与 aggfunc='sum' 一起使用,而不是 set_index.unstack
table_columns = ['Manufacturer', 'Long', 'Short']
manufacturers = ['Chrysler', 'Buick', 'Mercedes', 'Ford']
df_new = (test_df[test_df['Manufacturer'].isin(manufacturers)]
.pivot_table(index='Manufacturer', columns='Dimension',
values='Value', aggfunc='sum', fill_value=0)
.reset_index()
.rename_axis(columns=None)[table_columns]
)
print (df_new)
Manufacturer Long Short
0 Buick 5 0
1 Chrysler 2 6
2 Ford 10 0
请注意,这不是相同的输出,但我认为您的输入无法提供预期的输出
或 groupby.sum
和 unstack
相同的结果
(test_df[test_df['Manufacturer'].isin(manufacturers)]
.groupby(['Manufacturer', 'Dimension'])
['Value'].sum()
.unstack(fill_value=0)
.reset_index()
.rename_axis(columns=None)[table_columns]
)
使用 Plotly go.Table()
函数和 Pandas
,我试图创建一个 table 来汇总一些数据。我的数据如下:
import pandas as pd
test_df = pd.DataFrame({'Manufacturer':['BMW', 'Chrysler', 'Chrysler', 'Chrysler', 'Brokertec', 'DWAS', 'Ford', 'Buick'],
'Metric':['Indicator', 'Indicator', 'Indicator', 'Indicator', 'Indicator', 'Indicator', 'Indicator', 'Indicator'],
'Dimension':['Short', 'Short', 'Short', 'Long', 'Short', 'Short', 'Long', 'Long'],
'User': ['USA', 'USA', 'USA', 'USA', 'USA', 'New USA', 'USA', 'Los USA'],
'Value':[50, 3, 3, 2, 5, 7, 10, 5]
})
我想要的输出如下(将 Dimension
加 Manufacturer
):
Manufacturer Short Long
Chrysler 6 2
Buick 5 5
Mercedes 7 0
Ford 0 10
我需要稍微调整一下 Pandas 数据框(这就是我 运行 遇到麻烦的地方)。我的代码如下:
table_columns = ['Manufacturer', 'Longs', 'Shorts']
manufacturers = ['Chrysler', 'Buick', 'Mercedes', 'Ford']
df_new = (df[df['Manufacturer'].isin(manufacturers)]
.set_index(['Manufacturer', 'Dimension'])
['Value'].unstack()
.reset_index()[table_columns]
)
然后,使用 Plotly go.Table()
函数创建 table:
import plotly.graph_objects as go
direction_table = go.Figure(go.Table(
header=dict(
values=table_columns,
font=dict(size=12),
line_color='darkslategray',
fill_color='lightskyblue',
align='center'
),
cells=dict(
values=df_new.T, # using Transpose here
line_color='darkslategray',
fill_color='lightcyan',
align = 'center')
)
)
direction_table
我看到的错误是:
ValueError: Index contains duplicate entries, cannot reshape
解决此问题的最佳方法是什么?
提前致谢!
您需要将 pivot_table
与 aggfunc='sum' 一起使用,而不是 set_index.unstack
table_columns = ['Manufacturer', 'Long', 'Short']
manufacturers = ['Chrysler', 'Buick', 'Mercedes', 'Ford']
df_new = (test_df[test_df['Manufacturer'].isin(manufacturers)]
.pivot_table(index='Manufacturer', columns='Dimension',
values='Value', aggfunc='sum', fill_value=0)
.reset_index()
.rename_axis(columns=None)[table_columns]
)
print (df_new)
Manufacturer Long Short
0 Buick 5 0
1 Chrysler 2 6
2 Ford 10 0
请注意,这不是相同的输出,但我认为您的输入无法提供预期的输出
或 groupby.sum
和 unstack
(test_df[test_df['Manufacturer'].isin(manufacturers)]
.groupby(['Manufacturer', 'Dimension'])
['Value'].sum()
.unstack(fill_value=0)
.reset_index()
.rename_axis(columns=None)[table_columns]
)