填写数据框中缺失日期的缺失值
Fill in missing values for missing dates in dataframe
我有以下数据框:
df = pd.DataFrame(
{
'status': ['open', 'closed', 'open', 'closed', 'open', 'closed', 'open', 'closed'],
'month': ['January 2020', 'January 2020', 'February 2020', 'February 2020', 'April 2020', 'April 2020', 'August 2020', 'August 2020'],
'counts': [10, 12, 32, 12, 19, 40, 10, 11]
}
)
status month counts
0 open January 2020 10
1 closed January 2020 12
2 open February 2020 32
3 closed February 2020 12
4 open April 2020 19
5 closed April 2020 40
6 open August 2020 10
7 closed August 2020 11
我正在尝试使用 seaborn 获取堆积条形图:
sns.histplot(df, x='month', weights='counts', hue='status', multiple='stack')
目的是得到一个时间序列连续且不漏月的图。我如何用值填充缺失的行,以便数据框如下所示?
status month counts
open January 2020 10
closed January 2020 12
open February 2020 32
closed February 2020 12
open March 2020 0
closed March 2020 0
open April 2020 19
closed April 2020 40
open May 2020 0
closed May 2020 0
open June 2020 0
closed June 2020 0
open July 2020 0
closed July 2020 0
open August 2020 10
closed August 2020 11
您可以旋转数据框,然后用所需的月份重新编制索引。
import pandas as pd
df = pd.DataFrame({'status': ['open', 'closed', 'open', 'closed', 'open', 'closed', 'open', 'closed'],
'month': ['January 2020', 'January 2020', 'February 2020', 'February 2020', 'April 2020', 'April 2020', 'August 2020', 'August 2020'],
'counts': [10, 12, 32, 12, 19, 40, 10, 11]})
months = [f'{m} 2020' for m in ['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August']]
df_pivoted = df.pivot(values='counts', index='month', columns='status').reindex(months).fillna(0)
ax = df_pivoted.plot.bar(stacked=True, width=1, ec='black', rot=0, figsize=(12, 5))
一个 seaborn 解决方案,可以使用 order=
。这不适用于 histplot
,仅适用于 barplot
,它不会堆叠条形图。
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
df = pd.DataFrame({'status': ['open', 'closed', 'open', 'closed', 'open', 'closed', 'open', 'closed'],
'month': ['January 2020', 'January 2020', 'February 2020', 'February 2020', 'April 2020', 'April 2020', 'August 2020', 'August 2020'],
'counts': [10, 12, 32, 12, 19, 40, 10, 11]})
plt.figure(figsize=(12, 5))
months = [f'{m} 2020' for m in ['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August']]
ax = sns.barplot(data=df, x='month', y='counts', hue='status', order=months)
plt.tight_layout()
plt.show()
我有以下数据框:
df = pd.DataFrame(
{
'status': ['open', 'closed', 'open', 'closed', 'open', 'closed', 'open', 'closed'],
'month': ['January 2020', 'January 2020', 'February 2020', 'February 2020', 'April 2020', 'April 2020', 'August 2020', 'August 2020'],
'counts': [10, 12, 32, 12, 19, 40, 10, 11]
}
)
status month counts
0 open January 2020 10
1 closed January 2020 12
2 open February 2020 32
3 closed February 2020 12
4 open April 2020 19
5 closed April 2020 40
6 open August 2020 10
7 closed August 2020 11
我正在尝试使用 seaborn 获取堆积条形图:
sns.histplot(df, x='month', weights='counts', hue='status', multiple='stack')
目的是得到一个时间序列连续且不漏月的图。我如何用值填充缺失的行,以便数据框如下所示?
status month counts
open January 2020 10
closed January 2020 12
open February 2020 32
closed February 2020 12
open March 2020 0
closed March 2020 0
open April 2020 19
closed April 2020 40
open May 2020 0
closed May 2020 0
open June 2020 0
closed June 2020 0
open July 2020 0
closed July 2020 0
open August 2020 10
closed August 2020 11
您可以旋转数据框,然后用所需的月份重新编制索引。
import pandas as pd
df = pd.DataFrame({'status': ['open', 'closed', 'open', 'closed', 'open', 'closed', 'open', 'closed'],
'month': ['January 2020', 'January 2020', 'February 2020', 'February 2020', 'April 2020', 'April 2020', 'August 2020', 'August 2020'],
'counts': [10, 12, 32, 12, 19, 40, 10, 11]})
months = [f'{m} 2020' for m in ['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August']]
df_pivoted = df.pivot(values='counts', index='month', columns='status').reindex(months).fillna(0)
ax = df_pivoted.plot.bar(stacked=True, width=1, ec='black', rot=0, figsize=(12, 5))
一个 seaborn 解决方案,可以使用 order=
。这不适用于 histplot
,仅适用于 barplot
,它不会堆叠条形图。
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
df = pd.DataFrame({'status': ['open', 'closed', 'open', 'closed', 'open', 'closed', 'open', 'closed'],
'month': ['January 2020', 'January 2020', 'February 2020', 'February 2020', 'April 2020', 'April 2020', 'August 2020', 'August 2020'],
'counts': [10, 12, 32, 12, 19, 40, 10, 11]})
plt.figure(figsize=(12, 5))
months = [f'{m} 2020' for m in ['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August']]
ax = sns.barplot(data=df, x='month', y='counts', hue='status', order=months)
plt.tight_layout()
plt.show()