Sorting the y-axis of a seaborn lmplot based on duration

I have the following excerpt of a dataset, which I am trying to plot with a seaborn lmplot.

Case_ID Activity    Timestamp   Cum_Duration
0   1   a   2016-04-15 08:41:28 0.0
1   1   b   2016-04-18 12:55:01 3.0
2   1   d   2016-04-19 07:22:59 4.0
3   1   e   2016-04-23 15:06:58 8.0
4   1   f   2016-04-24 19:18:32 9.0
5   1   g   2016-04-25 14:56:42 10.0
6   1   h   2016-04-26 10:00:36 11.0
7   2   a   2016-04-18 20:40:14 0.0
8   2   b   2016-04-21 22:42:39 3.0
9   2   d   2016-04-24 01:29:27 5.0
10  2   g   2016-04-25 22:36:27 7.0
11  2   e   2016-04-27 16:12:28 9.0
12  2   f   2016-04-28 15:00:35 10.0
13  2   h   2016-05-01 18:32:18 13.0
14  3   a   2016-04-27 01:45:07 0.0
15  3   b   2016-04-27 21:50:32 1.0
16  3   d   2016-04-29 00:12:15 2.0
17  3   g   2016-04-29 16:24:46 3.0
18  3   e   2016-04-30 22:57:03 4.0
19  3   f   2016-05-02 01:33:30 5.0
20  3   h   2016-05-02 11:06:53 5.0
21  4   a   2016-05-02 08:38:34 0.0
22  4   b   2016-05-06 00:50:31 4.0
23  4   d   2016-05-06 17:56:11 4.0
24  4   g   2016-05-13 10:34:23 11.0
25  4   e   2016-05-13 13:56:10 11.0
26  4   f   2016-05-14 23:42:03 13.0
27  4   h   2016-05-17 14:02:28 15.0
28  5   a   2016-05-09 07:17:12 0.0
29  5   b   2016-05-10 06:29:42 1.0
30  5   c   2016-05-11 05:04:34 2.0

I produced the plot below with the following code.

sns.set_style('whitegrid')
sns.set_context('talk')
relactivity_plot = sns.lmplot(x='Cum_Duration',y='Case_ID', data=rdoa_plot, hue='Activity',height=10, aspect=1.5,fit_reg=False, scatter_kws={'s':150, 'alpha':1.0})
relactivity_plot.set(ylim=(max(rdoa_plot['Case_ID'])+1,0), yticks=(rdoa_plot['Case_ID']).unique(), xlim=(0, max(rdoa_plot['Cum_Duration'])+1))
relactivity_plot.fig.suptitle('Analyzing events timeline for the first 20 events')

Seaborn plot

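However, I would like to sort the y-axis by cumulative duration, so that the case with the shortest duration is at the top and cases with longer durations follow below it, as in the following figure.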

Expected output

Thank you for your help.

You can convert the 'Case_ID' column to strings, compute the order of the cases with pandas groupby(), and then use that order to make 'Case_ID' categorical.
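As a rough illustration of that ordering step in isolation, here is a minimal sketch on a tiny made-up frame. The name toy_df and its values are hypothetical and are not taken from the question's data.

```python
import pandas as pd

# Hypothetical toy data: three cases with different total durations.
toy_df = pd.DataFrame({
    'Case_ID': ['1', '1', '2', '2', '3', '3'],
    'Cum_Duration': [0.0, 11.0, 0.0, 13.0, 0.0, 5.0],
})

# Largest cumulative duration per case, sorted so the shortest case comes first.
max_dur = toy_df.groupby('Case_ID')['Cum_Duration'].max().sort_values()
case_order = list(max_dur.index)

# Store the column as a categorical whose categories follow that order.
toy_df['Case_ID'] = pd.Categorical(toy_df['Case_ID'], categories=case_order)

print(case_order)  # ['3', '1', '2']
```

The category order, not the sort order of the labels themselves, is what a categorical axis is expected to follow.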

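Here is some example code using your data. (I renamed rdoa_plot to rdoa_df because the original name confused me. I also used scatterplot directly, since the lmplot in the example seems to be reduced to just the scatter points.)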

import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
from io import StringIO

data_str = '''Case_ID Activity    Timestamp   Cum_Duration
0   1   a   "2016-04-15 08:41:28" 0.0
1   1   b   "2016-04-18 12:55:01" 3.0
2   1   d   "2016-04-19 07:22:59" 4.0
3   1   e   "2016-04-23 15:06:58" 8.0
4   1   f   "2016-04-24 19:18:32" 9.0
5   1   g   "2016-04-25 14:56:42" 10.0
6   1   h   "2016-04-26 10:00:36" 11.0
7   2   a   "2016-04-18 20:40:14" 0.0
8   2   b   "2016-04-21 22:42:39" 3.0
9   2   d   "2016-04-24 01:29:27" 5.0
10  2   g   "2016-04-25 22:36:27" 7.0
11  2   e   "2016-04-27 16:12:28" 9.0
12  2   f   "2016-04-28 15:00:35" 10.0
13  2   h   "2016-05-01 18:32:18" 13.0
14  3   a   "2016-04-27 01:45:07" 0.0
15  3   b   "2016-04-27 21:50:32" 1.0
16  3   d   "2016-04-29 00:12:15" 2.0
17  3   g   "2016-04-29 16:24:46" 3.0
18  3   e   "2016-04-30 22:57:03" 4.0
19  3   f   "2016-05-02 01:33:30" 5.0
20  3   h   "2016-05-02 11:06:53" 5.0
21  4   a   "2016-05-02 08:38:34" 0.0
22  4   b   "2016-05-06 00:50:31" 4.0
23  4   d   "2016-05-06 17:56:11" 4.0
24  4   g   "2016-05-13 10:34:23" 11.0
25  4   e   "2016-05-13 13:56:10" 11.0
26  4   f   "2016-05-14 23:42:03" 13.0
27  4   h   "2016-05-17 14:02:28" 15.0
28  5   a   "2016-05-09 07:17:12" 0.0
29  5   b   "2016-05-10 06:29:42" 1.0
30  5   c   "2016-05-11 05:04:34" 2.0'''
# sep=r'\s+' splits on whitespace; delim_whitespace=True is deprecated in recent pandas.
rdoa_df = pd.read_csv(StringIO(data_str), sep=r'\s+')
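# Treat the case IDs as strings so they can become categories.
rdoa_df['Case_ID'] = rdoa_df['Case_ID'].astype(str)
# Order the cases by their maximum cumulative duration, shortest first.
df_max_dur = rdoa_df.groupby('Case_ID')['Cum_Duration'].max().sort_values()
case_id_order = df_max_dur.index.astype(str)
rdoa_df['Case_ID'] = pd.Categorical(rdoa_df['Case_ID'], categories=case_id_order)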

sns.set_style('whitegrid')
sns.set_context('talk')
fig, ax = plt.subplots(figsize=(15, 10))
sns.scatterplot(x='Cum_Duration', y='Case_ID', data=rdoa_df, hue='Activity', s=500, alpha=1, ax=ax)
ax.set_xlim(-0.5, max(rdoa_df['Cum_Duration']) + 0.5)
ax.set_ylim(len(case_id_order) - 0.5, -0.5)
for s in ax.spines:
    ax.spines[s].set_visible(False)
plt.tight_layout()
plt.show()

To sort the activities alphabetically, you can add hue_order=np.unique(rdoa_df['Activity']) to the sns.scatterplot() call.
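If you would rather keep lmplot as in the question, one possible alternative is to plot a numeric row position per case and relabel the ticks afterwards. The sketch below is only an assumption about how that could look: the helper names add_rank_column and plot_with_lmplot and the column name y_pos are made up for illustration, and it assumes 'Case_ID' is still a plain integer or string column (not yet converted to a categorical).

```python
import pandas as pd


def add_rank_column(df, case_col='Case_ID', dur_col='Cum_Duration', rank_col='y_pos'):
    """Return a copy of df with a numeric row position per case, plus the case order.

    Cases are ranked by their maximum cumulative duration, shortest first.
    """
    order = df.groupby(case_col)[dur_col].max().sort_values().index
    positions = {case: pos for pos, case in enumerate(order)}
    out = df.copy()
    out[rank_col] = out[case_col].map(positions)
    return out, list(order)


def plot_with_lmplot(df):
    """Scatter-only lmplot with the y-axis relabelled by case (requires seaborn)."""
    import seaborn as sns  # imported lazily so the helper above works without it

    ranked, order = add_rank_column(df)
    g = sns.lmplot(x='Cum_Duration', y='y_pos', data=ranked, hue='Activity',
                   fit_reg=False, height=10, aspect=1.5)
    # Replace the numeric positions with the case labels they stand for.
    g.ax.set_yticks(range(len(order)))
    g.ax.set_yticklabels([str(case) for case in order])
    g.ax.set_ylim(len(order) - 0.5, -0.5)  # inverted axis: shortest case on top
    g.ax.set_ylabel('Case_ID')
    return g
```

Calling plot_with_lmplot on the freshly read data frame (before the categorical conversion) should then give the same vertical ordering as the scatterplot version.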