Add a bar plot along a particular axis of a clustermap with index-specific data

I am trying to add a bar plot (stacked or otherwise) for each row in a seaborn clustermap.

Let's say that I have a dataframe like this:

import pandas as pd
import numpy as np
import random
import seaborn as sns

# Five heatmap columns followed by three distinctly named bar-plot columns
df = pd.DataFrame(
    np.random.randint(0, 100, size=(100, 8)),
    columns=["heatMap_1", "heatMap_2", "heatMap_3", "heatMap_4", "heatMap_5",
             "barPlot_1", "barPlot_2", "barPlot_3"],
)

# Use random integers as row labels
df['index'] = [random.randint(1, 10000000) for k in df.index]
df.set_index('index', inplace=True)
df.head()
         heatMap_1  heatMap_2  heatMap_3  heatMap_4  heatMap_5  barPlot_1  barPlot_2  barPlot_3
index
4552288          9          3         54         37         23         42         94         31
6915023          7         47         59         92         70         96         39         59
2988122         91         29         59         79         68         64         55          5
5060540         68         80         25         95         80         58         72         57
2901025         86         63         36          8         33         17         79         86

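I can use the first five columns (in this example, the ones prefixed with heatMap_) to create a seaborn clustermap using this (or the seaborn equivalent):

sns.clustermap(df.iloc[:, 0:5])

and a stacked bar plot of the last three columns (in this example, the ones prefixed with barPlot_) using: df.iloc[:,5:8].plot(kind='bar', stacked=True)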

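But I am a bit confused about how to merge the two plot types. I understand that clustermap creates its own figure, and I am not sure whether I can extract just the heatmap from the clustermap and then use it with subfigures (discussed here:). This produces strange output.

Edit: Using this: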

import pandas as pd
import numpy as np
import random
import seaborn as sns; sns.set(color_codes=True)
import matplotlib.pyplot as plt
import matplotlib.gridspec


df = pd.DataFrame(np.random.randint(0,100,size=(100, 8)), columns=["heatMap_1","heatMap_2","heatMap_3","heatMap_4","heatMap_5", "barPlot_1","barPlot_2","barPlot_3"])
df['index'] = [ random.randint(1,10000000)  for k in df.index]
df.set_index('index', inplace=True)
g = sns.clustermap(df.iloc[:,0:5], )
g.gs.update(left=0.05, right=0.45)
gs2 = matplotlib.gridspec.GridSpec(1,1, left=0.6)
ax2 = g.fig.add_subplot(gs2[0])
df.iloc[:,5:8].plot(kind='barh', stacked=True, ax=ax2)

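This produces a figure in which the two plots do not quite line up (there is an offset because of the dendrogram).

Another option is to perform the clustering manually, create a matplotlib heatmap, and then add associated subfigures such as bar plots (discussed here: How to get flat clustering corresponding to color clusters in the dendrogram created by scipy).

Is there a way to use clustermap as a subplot alongside other plots?

This is the result I am looking for [1].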

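While not a proper answer, I decided to break it down and do everything manually. Taking inspiration from the answer here, I decided to cluster and reorder the heatmap separately: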

import scipy.cluster.hierarchy as sch
import scipy.spatial.distance as dist


def heatMapCluter(df):
    """Return df with its columns and rows reordered by hierarchical clustering."""
    row_method = "ward"
    column_method = "ward"
    row_metric = "euclidean"
    column_metric = "euclidean"

    if column_method:
        # Condensed pairwise distances between columns. linkage() expects this
        # condensed form; a square matrix would be read as raw observations.
        d2 = dist.pdist(df.transpose(), metric=column_metric)
        Y2 = sch.linkage(d2, method=column_method)
        Z2 = sch.dendrogram(Y2, no_plot=True)
        idx2 = Z2["leaves"]  # column order along the dendrogram
        df = df.iloc[:, idx2]

    if row_method:
        # Same procedure for the rows
        d1 = dist.pdist(df, metric=row_metric)
        Y1 = sch.linkage(d1, method=row_method)
        Z1 = sch.dendrogram(Y1, orientation="right", no_plot=True)
        idx1 = Z1["leaves"]  # row order along the dendrogram
        df = df.iloc[idx1, :]
    return df
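
For reference, the reordering idea can be reduced to a few lines. This is only a minimal sketch and not the code from the notebook: cluster_order and demo are hypothetical names, and it assumes it is acceptable to pass the raw rows straight to sch.linkage (which then computes the Euclidean distances itself) and to read the order with sch.leaves_list.

```python
import numpy as np
import pandas as pd
import scipy.cluster.hierarchy as sch


def cluster_order(values, method="ward"):
    """Return the dendrogram leaf order for the rows of a 2-D array."""
    linkage = sch.linkage(values, method=method, metric="euclidean")
    return sch.leaves_list(linkage)


# Small random stand-in for the heatmap columns (hypothetical data)
rng = np.random.default_rng(0)
demo = pd.DataFrame(
    rng.integers(0, 100, size=(10, 5)),
    columns=[f"heatMap_{i}" for i in range(1, 6)],
)

values = demo.to_numpy(dtype=float)
row_order = cluster_order(values)    # order of the rows
col_order = cluster_order(values.T)  # order of the columns
reordered = demo.iloc[row_order, col_order]
```

Both orders are permutations of the original positions, so reordered holds exactly the same values as demo, only rearranged.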

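Then I rearranged the original dataframe:

clusteredHeatmap = heatMapCluter(df.iloc[:, 0:5].copy())
# Reorder the rows to match the clustered heatmap, then put the
# clustered heatmap columns first and the "barplot" columns after them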
clusteredDataframe = df.reindex(list(clusteredHeatmap.index.values))
clusteredDataframe = clusteredDataframe.reindex(
    list(clusteredHeatmap.columns.values)
    + list(df.iloc[:, 5:8].columns.values),
    axis=1,
)
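
To make the double reindex easier to follow, here is a tiny illustration. The frame small and the hand-picked order are made up for the example; the hand-picked order stands in for whatever the clustering step returns.

```python
import pandas as pd

# Hypothetical three-row frame: two heatmap columns and one bar-plot column
small = pd.DataFrame(
    {
        "heatMap_1": [3, 1, 2],
        "heatMap_2": [30, 10, 20],
        "barPlot_1": [300, 100, 200],
    },
    index=[11, 22, 33],
)

# Stand-in for the clustered heatmap block: rows and columns in a new order
clustered = small[["heatMap_2", "heatMap_1"]].loc[[22, 33, 11]]

# Step 1: put the rows of the full frame in the clustered row order.
# Step 2: put the clustered heatmap columns first, then the bar-plot column.
aligned = small.reindex(clustered.index).reindex(
    list(clustered.columns) + ["barPlot_1"], axis=1
)

print(aligned)
# Rows come out as 22, 33, 11 and barPlot_1 as 100, 200, 300,
# so every bar value stays attached to its own row label.
```

Because reindex matches on labels rather than positions, the bar-plot values follow their rows automatically. This relies on the index labels being unique.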

Then I plotted the two "subfigures" (the clustered heatmap and the bar plot) using gridspec:

import matplotlib.pyplot as plt
from matplotlib.gridspec import GridSpec

# Now let's plot this - first the heatmap and then the barplot.
# Since it is a "two" part plot which shares the same axis, it is
# better to use gridspec
fig = plt.figure(figsize=(12, 12))
gs = GridSpec(3, 3)
gs.update(wspace=0.015, hspace=0.05)
ax_main = plt.subplot(gs[0:3, :2])  # heatmap: left two thirds
ax_yDist = plt.subplot(gs[0:3, 2], sharey=ax_main)  # bars: right third
im = ax_main.imshow(
    clusteredDataframe.iloc[:, 0:5],
    cmap="Greens",
    interpolation="nearest",
    aspect="auto",
)
clusteredDataframe.iloc[:, 5:8].plot(
    kind="barh", stacked=True, ax=ax_yDist, sharey=True
)

# Strip the frame around the bar plot
ax_yDist.spines["right"].set_color("none")
ax_yDist.spines["top"].set_color("none")
ax_yDist.spines["left"].set_visible(False)
ax_yDist.xaxis.set_ticks_position("bottom")

# Stacked bars add up three columns, so size the x axis from the row totals
ax_yDist.set_xlim([0, clusteredDataframe.iloc[:, 5:8].sum(axis=1).max()])
ax_yDist.set_yticks([])
ax_yDist.xaxis.grid(False)
ax_yDist.yaxis.grid(False)

Jupyter notebook: https://gist.github.com/siddharthst/2a8b7028d18935860062ac7379b9279f
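
For comparison, it may also be possible to keep sns.clustermap and line the bars up with the heatmap directly. The following is a rough sketch of that idea, not something taken from the notebook: demo, ax_bar, and the 0.1 / 0.35 offsets are made up for illustration. It assumes that g.dendrogram_row.reordered_ind gives the row order shown in the heatmap and that seaborn centres heatmap row i at y = i + 0.5.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical stand-in data with the same column layout as in the question
rng = np.random.default_rng(0)
columns = [f"heatMap_{i}" for i in range(1, 6)] + [f"barPlot_{i}" for i in range(1, 4)]
demo = pd.DataFrame(rng.integers(0, 100, size=(20, 8)), columns=columns)

g = sns.clustermap(demo.iloc[:, 0:5])
g.gs.update(left=0.05, right=0.45)  # squeeze the clustermap to the left

# Row order chosen by the clustering: heatmap row i shows demo.iloc[row_order[i]]
row_order = g.dendrogram_row.reordered_ind
bars = demo.iloc[row_order, 5:8]

# New axes with the vertical extent of the heatmap only (not the dendrogram)
pos = g.ax_heatmap.get_position()
fig = g.ax_heatmap.figure
ax_bar = fig.add_axes([pos.x1 + 0.1, pos.y0, 0.35, pos.height])

# Draw the stacked horizontal bars at the heatmap row centres
y = np.arange(len(bars)) + 0.5
left = np.zeros(len(bars))
for name in bars.columns:
    widths = bars[name].to_numpy()
    ax_bar.barh(y, widths, left=left, height=0.8, label=name)
    left = left + widths

ax_bar.set_ylim(g.ax_heatmap.get_ylim())  # same (inverted) y range as the heatmap
ax_bar.set_yticks([])
ax_bar.legend()
# fig.savefig("clustermap_with_bars.png")  # uncomment to write the figure
```

Taking the position from g.ax_heatmap rather than from the whole grid is what avoids the dendrogram offset. The gap of 0.1 leaves room for the row labels that seaborn draws to the right of the heatmap.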


1 - http://code.activestate.com/recipes/578175-hierarchical-clustering-heatmap-python/