How to merge multiple dataframes and show them in one boxplot in Python?
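I'm working with a binary classification dataset, and I'm trying to plot the ages of all samples, of the samples where class == 1, and of the samples where class == 0.
How can I merge firstDf, secondDf, and thirdDf and show them in a single boxplot in Python?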
age | class
------------
1 | 1
2 | 1
3 | 0
4 | 1
5 | 0
6 | 1
7 | 1
8 | 0
9 | 0
10 | 1
import pandas as pd
import matplotlib.pyplot as plt
data = [['age', 'class'],
[1,1],
[2,1],
[3,0],
[4,1],
[5,0],
[6,1],
[7,1],
[8,0],
[9,0],
[10,1]]
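# build the dataframe: the first row of data holds the column names
df = pd.DataFrame(data[1:], columns=data[0])
firstDf = df['age']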
secondDf = [df[df['class'] == 0]['age']]
thirdDf = [df[df['class'] == 1]['age']]
Expected plot:
# subset dataframes
firstDf = df
secondDf = df[df['class'] == 0]
thirdDf = df[df['class'] == 1]
# combine dataframes and reset index
combined_df = pd.concat([firstDf, secondDf, thirdDf],
keys=['All', 'Class0', 'Class1']).reset_index(level=0)
# drop column 'class'
combined_df = combined_df.drop('class', axis=1)
# rename columns
combined_df.columns = ['category', 'age']
# fix datatype
combined_df['age'] = combined_df['age'].astype('int')
# import seaborn
import seaborn as sns
# plot boxplot
sns.boxplot(data=combined_df, x='category', y='age')
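If seaborn is not available, the same three boxes can probably be drawn with matplotlib alone, without concatenating anything. The following is a minimal sketch under some assumptions: the sample data from the question is rebuilt inline, the `groups` dictionary and the output file name `age_boxplot.png` are hypothetical names chosen for illustration, and the `Agg` backend is selected only so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend (assumption); remove to show the figure in a window
import matplotlib.pyplot as plt
import pandas as pd

# sample data from the question
df = pd.DataFrame({
    "age":   [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "class": [1, 1, 0, 1, 0, 1, 1, 0, 0, 1],
})

# one Series of ages per box; the dictionary keys become the x-axis labels
groups = {
    "All": df["age"],
    "Class0": df.loc[df["class"] == 0, "age"],
    "Class1": df.loc[df["class"] == 1, "age"],
}

fig, ax = plt.subplots()
# boxplot accepts a list of arrays and draws one box per array at positions 1, 2, 3
ax.boxplot([ages.to_numpy() for ages in groups.values()])
ax.set_xticks(range(1, len(groups) + 1))
ax.set_xticklabels(list(groups.keys()))
ax.set_ylabel("age")
fig.savefig("age_boxplot.png")  # hypothetical output file name
```

Passing a list of arrays avoids the reshaping step, at the cost of setting the tick labels by hand; the seaborn version above is likely the better fit once more grouping columns are involved.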