Seaborn heatmap colorbar: how to ensure the correct order of classes and the correct colors

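I have a dataframe containing the results of a calculation, and I want to plot it as a seaborn heatmap with a colorbar. I am using the following code to do this (mostly adapted from an existing example):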

import pandas as pd
import seaborn as sns
from matplotlib import pyplot as plt

# input data
results = [['equal','equal','smaller','smaller or equal','greater or equal'],   
           ['equal','equal','smaller','smaller','greater or equal'],                                      
           ['greater','equal','smaller or equal','smaller','smaller'],
           ['equal','smaller or equal','greater or equal','greater or equal','equal'],
           ['equal','equal','smaller','equal','equal']]

index = ['axc', 'org', 'cf5', 'cm1', 'ext']
columns = ['axc', 'org', 'cf5', 'cm1', 'ext']

# create a dataframe
res_df = pd.DataFrame(results, index=index, columns=columns)

value_to_int = {j:i for i,j in enumerate(['greater','greater or equal','equal','smaller or equal','smaller'])}

n = len(value_to_int)     

# discrete colormap (n samples from a given cmap)
cmap = sns.color_palette("viridis", n) 
ax = sns.heatmap(res_df.replace(value_to_int), cmap=cmap) 

# modify colorbar:
colorbar = ax.collections[0].colorbar 
r = colorbar.vmax - colorbar.vmin 
colorbar.set_ticks([colorbar.vmin + r / n * (0.5 + i) for i in range(n)])
colorbar.set_ticklabels(list(value_to_int.keys()))                                          
plt.show()

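It works like a charm most of the time, but a problem arises if one of the classes in the value_to_int list does not occur in the data. To demonstrate, change the dataframe like this: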

results_changed = [['equal','equal','smaller','smaller or equal','greater or equal'],
              ['equal','equal','smaller','smaller','greater or equal'],
              ['greater or equal','equal','smaller or equal','smaller','smaller'],
              ['equal','smaller or equal','greater or equal','greater or equal','equal'],
              ['equal','equal','smaller','equal','equal']]

index = ['axc', 'org', 'cf5', 'cm1', 'ext']
columns = ['axc', 'org', 'cf5', 'cm1', 'ext']

# create a dataframe
res_df = pd.DataFrame(results_changed, index=index, columns=columns)

value_to_int = {j:i for i,j in enumerate(['greater','greater or equal','equal','smaller or equal','smaller'])}

n = len(value_to_int)  

# discrete colormap (n samples from a given cmap)
cmap = sns.color_palette("viridis", n) 
ax = sns.heatmap(res_df.replace(value_to_int), cmap=cmap) 

# modify colorbar:
colorbar = ax.collections[0].colorbar 
r = colorbar.vmax - colorbar.vmin 
colorbar.set_ticks([colorbar.vmin + r / n * (0.5 + i) for i in range(n)])
colorbar.set_ticklabels(list(value_to_int.keys()))                                          
plt.show()  

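If you then plot it, the resulting heatmap assigns the wrong colors to the classes: because no cell is 'greater' anymore, the palette "shifts" and 'equal' no longer gets the same color as before.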
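To see why the palette shifts, here is a minimal sketch that mimics in plain Python how a discrete colormap picks a color when the color limits are inferred from the smallest and largest values in the data. The helper `color_slot` is hypothetical and written only for illustration; it is not a seaborn or matplotlib function.

```python
# Mimics how a listed (discrete) colormap picks a color: the value is
# scaled to [0, 1] using vmin/vmax, then cut into n equally wide bins.
# color_slot is a hypothetical helper, not part of seaborn or matplotlib.

category_order = ['greater', 'greater or equal', 'equal', 'smaller or equal', 'smaller']
value_to_int = {name: code for code, name in enumerate(category_order)}
n = len(category_order)


def color_slot(code, vmin, vmax, n_colors):
    """Index of the palette color used for `code` given the color limits."""
    fraction = (code - vmin) / (vmax - vmin)
    return min(int(fraction * n_colors), n_colors - 1)


# All five classes present: the limits inferred from the data are 0 and 4.
all_present = {name: color_slot(code, 0, 4, n)
               for name, code in value_to_int.items()}

# 'greater' (code 0) absent: the inferred limits become 1 and 4.
without_greater = {name: color_slot(code, 1, 4, n)
                   for name, code in value_to_int.items() if name != 'greater'}

print(all_present)      # 'equal' uses palette slot 2
print(without_greater)  # 'equal' now uses palette slot 1
```

With 'greater' missing, five colors are stretched over the range 1 to 4 instead of 0 to 4, so 'equal' drops from the third color to the second and the third color is never used.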

I tried to solve the problem by changing this line of the code:

value_to_int = {j:i for i,j in enumerate(pd.unique(res_df.values.ravel()))}

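This fixes the color assignment, but it creates another problem: the colorbar no longer lists the classes in the intended order, which I want to avoid.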
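The ordering problem comes from `pd.unique`, which returns values in order of first appearance rather than in any meaningful order. The sketch below contrasts that with filtering a fixed, hand-written order, and also shows `pd.Categorical` as one possible way to get integer codes that do not depend on which classes happen to occur. The variable names are chosen for illustration only.

```python
import pandas as pd

results_changed = [['equal', 'equal', 'smaller', 'smaller or equal', 'greater or equal'],
                   ['equal', 'equal', 'smaller', 'smaller', 'greater or equal'],
                   ['greater or equal', 'equal', 'smaller or equal', 'smaller', 'smaller'],
                   ['equal', 'smaller or equal', 'greater or equal', 'greater or equal', 'equal'],
                   ['equal', 'equal', 'smaller', 'equal', 'equal']]
res_df = pd.DataFrame(results_changed)

category_order = ['greater', 'greater or equal', 'equal', 'smaller or equal', 'smaller']

# Order of first appearance: depends on the data, not on the meaning.
appearance_order = list(pd.unique(res_df.to_numpy().ravel()))

# Fixed order, restricted to the classes that actually occur.
present = set(appearance_order)
fixed_order_present = [name for name in category_order if name in present]

# Codes taken from the fixed order: 'equal' is always 2, whether or not
# 'greater' occurs anywhere in the data.
codes = pd.Categorical(res_df.to_numpy().ravel(), categories=category_order).codes

print(appearance_order)
print(fixed_order_present)
print(codes[:5])
```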

Can anyone suggest how to solve this? Any suggestions would be much appreciated.

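The best way to keep plots comparable across different inputs is to always pin the colorbar to the same range by passing vmin and vmax explicitly: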

import pandas as pd
from matplotlib import pyplot as plt
import seaborn as sns

results_changed = [['equal','equal','smaller','smaller or equal','greater or equal'],
              ['equal','equal','smaller','smaller','greater or equal'],
              ['greater or equal','equal','smaller or equal','smaller','smaller'],
              ['equal','smaller or equal','greater or equal','greater or equal','equal'],
              ['equal','equal','smaller','equal','equal']]

index = ['axc', 'org', 'cf5', 'cm1', 'ext']
columns = ['axc', 'org', 'cf5', 'cm1', 'ext']

# create a dataframe
res_df = pd.DataFrame(results_changed, index=index, columns=columns)

#construct dictionary from ordered list
category_order = ['greater', 'greater or equal', 'equal', 'smaller or equal', 'smaller']    
value_to_int = {j:i for i,j in enumerate(category_order)}    
n = len(value_to_int)  

# discrete colormap (n samples from a given cmap)
cmap = sns.color_palette("viridis", n) 
ax = sns.heatmap(res_df.replace(value_to_int), cmap=cmap, vmin=0, vmax=n) 

#modify colorbar:
colorbar = ax.collections[0].colorbar 
colorbar.set_ticks([0.5 + i for i in range(n)])
colorbar.set_ticklabels(category_order)                                          
plt.show()  
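The same idea can be wrapped in a reusable function. The sketch below is a variant, not the code above: it uses the limits -0.5 and n - 0.5 so that each integer code sits in the middle of its color bin and the ticks fall on the integers themselves. It also maps the labels column by column with `Series.map` instead of `DataFrame.replace`, since recent pandas versions may warn about the implicit downcast in `replace`. The function name `plot_category_heatmap` is hypothetical.

```python
import pandas as pd
import seaborn as sns
from matplotlib import pyplot as plt


def plot_category_heatmap(df, category_order, palette="viridis", ax=None):
    """Heatmap of class labels with a colorbar pinned to category_order."""
    value_to_int = {name: code for code, name in enumerate(category_order)}
    n = len(category_order)

    # Labels missing from category_order become NaN; fail loudly in that case.
    codes = df.apply(lambda col: col.map(value_to_int))
    if codes.isna().any().any():
        raise ValueError("data contains a class missing from category_order")

    # Fixed limits: code i always falls in the middle of bin i.
    ax = sns.heatmap(codes.astype(int), cmap=sns.color_palette(palette, n),
                     vmin=-0.5, vmax=n - 0.5, ax=ax)

    colorbar = ax.collections[0].colorbar
    colorbar.set_ticks(list(range(n)))
    colorbar.set_ticklabels(category_order)
    return ax


category_order = ['greater', 'greater or equal', 'equal', 'smaller or equal', 'smaller']
labels = ['axc', 'org', 'cf5']
df = pd.DataFrame([['equal', 'smaller', 'greater or equal'],
                   ['equal', 'equal', 'smaller or equal'],
                   ['smaller', 'equal', 'equal']],
                  index=labels, columns=labels)

fig, ax = plt.subplots()
ax = plot_category_heatmap(df, category_order, ax=ax)
# call plt.show() to display the figure
```

Here 'greater' never occurs in `df`, yet the colorbar still shows all five classes in the fixed order and 'equal' keeps the third color.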


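If you only want the colorbar to show the classes that actually occur, you can pre-filter the category list down to those present. Note, however, that the color scheme then changes between input arrays, which makes them hard to compare.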

import pandas as pd
from matplotlib import pyplot as plt
import seaborn as sns
import numpy as np

results_changed = [['equal','equal','smaller','smaller or equal','greater'],
              ['equal','equal','smaller','smaller','greater'],
              ['greater','equal','smaller','smaller','smaller'],
              ['equal','smaller','greater','greater','equal'],
              ['equal','equal','smaller','equal','equal']]

index = ['axc', 'org', 'cf5', 'cm1', 'ext']
columns = ['axc', 'org', 'cf5', 'cm1', 'ext']

# create a dataframe
res_df = pd.DataFrame(results_changed, index=index, columns=columns)

unique_results = np.unique(results_changed)
unique_categories = [cat for cat in ['greater','greater or equal','equal','smaller or equal','smaller'] if cat in unique_results]

value_to_int = {j:i for i,j in enumerate(unique_categories)}

n = len(value_to_int)  

# discrete colormap (n samples from a given cmap)
cmap = sns.color_palette("viridis", n) 
ax = sns.heatmap(res_df.replace(value_to_int), cmap=cmap) 

#modify colorbar:
colorbar = ax.collections[0].colorbar 
r = colorbar.vmax - colorbar.vmin 
colorbar.set_ticks([colorbar.vmin + r / n * (0.5 + i) for i in range(n)])
colorbar.set_ticklabels(unique_categories)
plt.show()  
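If the changing color scheme is a concern, one possible middle ground is to filter the colors together with the categories, so that the colorbar lists only the classes present while each class keeps the color it has in the full palette. This is a sketch of that idea under the same assumptions as above; `plot_present_categories` is a hypothetical name, not a seaborn function.

```python
import pandas as pd
import seaborn as sns
from matplotlib import pyplot as plt


def plot_present_categories(df, category_order, palette="viridis", ax=None):
    """Show only the classes present, each with its color from the full palette."""
    full_palette = sns.color_palette(palette, len(category_order))
    present = set(df.to_numpy().ravel())

    # Keep name and color together so the assignment never shifts.
    kept = [(name, color) for name, color in zip(category_order, full_palette)
            if name in present]
    kept_names = [name for name, _ in kept]
    kept_colors = [color for _, color in kept]

    # Compact codes 0..k-1 in the fixed order; astype(int) raises if a
    # label is missing from category_order.
    compact = {name: code for code, name in enumerate(kept_names)}
    codes = df.apply(lambda col: col.map(compact)).astype(int)

    k = len(kept_names)
    ax = sns.heatmap(codes, cmap=kept_colors, vmin=-0.5, vmax=k - 0.5, ax=ax)
    colorbar = ax.collections[0].colorbar
    colorbar.set_ticks(list(range(k)))
    colorbar.set_ticklabels(kept_names)
    return ax, kept_names


category_order = ['greater', 'greater or equal', 'equal', 'smaller or equal', 'smaller']
labels = ['axc', 'org', 'cf5', 'cm1', 'ext']
res_df = pd.DataFrame([['equal', 'equal', 'smaller', 'smaller or equal', 'greater'],
                       ['equal', 'equal', 'smaller', 'smaller', 'greater'],
                       ['greater', 'equal', 'smaller', 'smaller', 'smaller'],
                       ['equal', 'smaller', 'greater', 'greater', 'equal'],
                       ['equal', 'equal', 'smaller', 'equal', 'equal']],
                      index=labels, columns=labels)

fig, ax = plt.subplots()
ax, kept_names = plot_present_categories(res_df, category_order, ax=ax)
# call plt.show() to display the figure
```

The trade-off is that the colorbar has a different length for different inputs, but a given class always has the same color.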
