Sorting the y-axis of a seaborn lmplot based on duration
I have the following extract of a dataset, which I am trying to use to draw a seaborn lmplot.
Case_ID Activity Timestamp Cum_Duration
0 1 a 2016-04-15 08:41:28 0.0
1 1 b 2016-04-18 12:55:01 3.0
2 1 d 2016-04-19 07:22:59 4.0
3 1 e 2016-04-23 15:06:58 8.0
4 1 f 2016-04-24 19:18:32 9.0
5 1 g 2016-04-25 14:56:42 10.0
6 1 h 2016-04-26 10:00:36 11.0
7 2 a 2016-04-18 20:40:14 0.0
8 2 b 2016-04-21 22:42:39 3.0
9 2 d 2016-04-24 01:29:27 5.0
10 2 g 2016-04-25 22:36:27 7.0
11 2 e 2016-04-27 16:12:28 9.0
12 2 f 2016-04-28 15:00:35 10.0
13 2 h 2016-05-01 18:32:18 13.0
14 3 a 2016-04-27 01:45:07 0.0
15 3 b 2016-04-27 21:50:32 1.0
16 3 d 2016-04-29 00:12:15 2.0
17 3 g 2016-04-29 16:24:46 3.0
18 3 e 2016-04-30 22:57:03 4.0
19 3 f 2016-05-02 01:33:30 5.0
20 3 h 2016-05-02 11:06:53 5.0
21 4 a 2016-05-02 08:38:34 0.0
22 4 b 2016-05-06 00:50:31 4.0
23 4 d 2016-05-06 17:56:11 4.0
24 4 g 2016-05-13 10:34:23 11.0
25 4 e 2016-05-13 13:56:10 11.0
26 4 f 2016-05-14 23:42:03 13.0
27 4 h 2016-05-17 14:02:28 15.0
28 5 a 2016-05-09 07:17:12 0.0
29 5 b 2016-05-10 06:29:42 1.0
30 5 c 2016-05-11 05:04:34 2.0
I plotted the figure below using the following code.
sns.set_style('whitegrid')
sns.set_context('talk')
relactivity_plot = sns.lmplot(x='Cum_Duration',y='Case_ID', data=rdoa_plot, hue='Activity',height=10, aspect=1.5,fit_reg=False, scatter_kws={'s':150, 'alpha':1.0})
relactivity_plot.set(ylim=(max(rdoa_plot['Case_ID'])+1,0), yticks=(rdoa_plot['Case_ID']).unique(), xlim=(0, max(rdoa_plot['Cum_Duration'])+1))
relactivity_plot.fig.suptitle('Analyzing events timeline for the first 20 events')
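[Figure: Seaborn plot]
However, I would like to sort the y-axis by cumulative duration, so that the case with the shortest duration is at the top and the cases with longer durations follow below it, as in the figure below.
[Figure: Expected output]
Thanks for your help.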
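You can convert the 'Case_ID' column to strings, then compute the order of the cases with a pandas groupby() and use that order to make 'Case_ID' categorical.
Here is some example code. (I renamed rdoa_plot to rdoa_df because the name confused me. I also used scatterplot directly, because the lmplot in the example seems to be reduced to just a scatter plot.)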
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
from io import StringIO
data_str = '''Case_ID Activity Timestamp Cum_Duration
0 1 a "2016-04-15 08:41:28" 0.0
1 1 b "2016-04-18 12:55:01" 3.0
2 1 d "2016-04-19 07:22:59" 4.0
3 1 e "2016-04-23 15:06:58" 8.0
4 1 f "2016-04-24 19:18:32" 9.0
5 1 g "2016-04-25 14:56:42" 10.0
6 1 h "2016-04-26 10:00:36" 11.0
7 2 a "2016-04-18 20:40:14" 0.0
8 2 b "2016-04-21 22:42:39" 3.0
9 2 d "2016-04-24 01:29:27" 5.0
10 2 g "2016-04-25 22:36:27" 7.0
11 2 e "2016-04-27 16:12:28" 9.0
12 2 f "2016-04-28 15:00:35" 10.0
13 2 h "2016-05-01 18:32:18" 13.0
14 3 a "2016-04-27 01:45:07" 0.0
15 3 b "2016-04-27 21:50:32" 1.0
16 3 d "2016-04-29 00:12:15" 2.0
17 3 g "2016-04-29 16:24:46" 3.0
18 3 e "2016-04-30 22:57:03" 4.0
19 3 f "2016-05-02 01:33:30" 5.0
20 3 h "2016-05-02 11:06:53" 5.0
21 4 a "2016-05-02 08:38:34" 0.0
22 4 b "2016-05-06 00:50:31" 4.0
23 4 d "2016-05-06 17:56:11" 4.0
24 4 g "2016-05-13 10:34:23" 11.0
25 4 e "2016-05-13 13:56:10" 11.0
26 4 f "2016-05-14 23:42:03" 13.0
27 4 h "2016-05-17 14:02:28" 15.0
28 5 a "2016-05-09 07:17:12" 0.0
29 5 b "2016-05-10 06:29:42" 1.0
30 5 c "2016-05-11 05:04:34" 2.0'''
# sep=r'\s+' splits on whitespace; the quoted timestamps stay in one field
rdoa_df = pd.read_csv(StringIO(data_str), sep=r'\s+')
rdoa_df['Case_ID'] = rdoa_df['Case_ID'].astype(str)
# order the cases by their total (maximum cumulative) duration, shortest first
df_max_dur = rdoa_df.groupby('Case_ID')['Cum_Duration'].max().sort_values()
case_id_order = df_max_dur.index.astype(str)
rdoa_df['Case_ID'] = pd.Categorical(rdoa_df['Case_ID'], categories=case_id_order)
sns.set_style('whitegrid')
sns.set_context('talk')
fig, ax = plt.subplots(figsize=(15, 10))
sns.scatterplot(x='Cum_Duration', y='Case_ID', data=rdoa_df, hue='Activity', s=500, alpha=1, ax=ax)
ax.set_xlim(-0.5, max(rdoa_df['Cum_Duration']) + 0.5)
ax.set_ylim(len(case_id_order) - 0.5, -0.5)
for s in ax.spines:
    ax.spines[s].set_visible(False)
plt.tight_layout()
plt.show()
To order the activities alphabetically, you can pass hue_order=np.unique(rdoa_df['Activity']) to scatterplot.
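To check the ordering before plotting, the groupby step and the hue_order value can be inspected on their own. The following is only a minimal sketch: the `events` frame is a hypothetical miniature of the question's data (first and last event of four cases), and the names `events`, `max_duration` and `hue_order` are illustrative, not from the original code.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the question's data: first and last event per case.
events = pd.DataFrame({
    "Case_ID": [1, 1, 2, 2, 3, 3, 5, 5],
    "Activity": ["a", "h", "a", "h", "a", "h", "a", "c"],
    "Cum_Duration": [0.0, 11.0, 0.0, 13.0, 0.0, 5.0, 0.0, 2.0],
})

# Total duration of a case = its largest cumulative duration; sort ascending.
max_duration = events.groupby("Case_ID")["Cum_Duration"].max().sort_values()

# Case IDs from shortest to longest, as strings so the axis is categorical.
case_id_order = max_duration.index.astype(str).tolist()

# The category order is what seaborn uses for the positions on the y-axis.
events["Case_ID"] = pd.Categorical(
    events["Case_ID"].astype(str), categories=case_id_order
)

# np.unique returns the sorted unique values, giving an alphabetical legend.
hue_order = np.unique(events["Activity"]).tolist()

print(case_id_order)
print(hue_order)
```

With this sample, case 5 (duration 2) comes first and case 2 (duration 13) last, which matches the top-to-bottom order wanted on the inverted y-axis.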