How to fix Seaborn clustermap matrix?
I have a three-column csv file that I am trying to turn into a clustered heatmap. My code looks like this:
sum_mets = pd.read_csv('sum159_localization_met_magma.csv')
df5 = sum_mets[['Phenotype','Gene','P']]
clustermap5 = sns.clustermap(df5, cmap= 'inferno', figsize=(40, 40), pivot_kws={'index': 'Phenotype',
'columns' : 'Gene',
'values' : 'P'})
Then I get this ValueError:
ValueError: The condensed distance matrix must contain only finite values.
For context, all of my values are non-zero. I am not sure which values it is unable to process.
Thanks in advance to anyone who can help.
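Although you have no NaN in the raw data, you need to check that your observations are complete, because clustermap pivots the table underneath. For example: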
import numpy as np, pandas as pd, seaborn as sns
df = pd.DataFrame({'Phenotype': np.repeat(['very not cool', 'not cool', 'very cool', 'super cool'], 4),
                   'Gene': ["Gene" + str(i) for i in range(4)] * 4,
                   'P': np.random.uniform(0, 1, 16)})
pd.pivot(df, columns="Gene", values="P", index="Phenotype")
Gene              Gene0     Gene1     Gene2     Gene3
Phenotype
not cool       0.567653  0.984555  0.634450  0.406642
super cool     0.820595  0.072393  0.774895  0.185072
very cool      0.231772  0.448938  0.951706  0.893692
very not cool  0.227209  0.684660  0.013394  0.711890
The pivot above has no NaN, and it plots fine:
sns.clustermap(df,figsize=(5, 5),pivot_kws={'index': 'Phenotype','columns' : 'Gene','values' : 'P'})
But suppose we are missing one observation:
df1 = df[:15]
pd.pivot(df1,columns="Gene",values="P",index="Phenotype")
Gene              Gene0     Gene1     Gene2     Gene3
Phenotype
not cool       0.106681  0.415873  0.480102  0.721195
super cool     0.961991  0.261710  0.329859       NaN
very cool      0.069925  0.718771  0.200431  0.196573
very not cool  0.631423  0.403604  0.043415  0.373299
If you try to call clustermap on it, it fails:
sns.clustermap(df1, pivot_kws={'index': 'Phenotype','columns' : 'Gene','values' : 'P'})
The condensed distance matrix must contain only finite values.
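To see exactly which Phenotype/Gene combinations are absent, one option is to run the pivot yourself and list the NaN cells. This is only a minimal sketch on the toy data above; the names `wide`, `n_missing` and `missing_pairs` are my own and not part of seaborn:

```python
import numpy as np
import pandas as pd

# Toy long-format table with the same three columns as in the question.
df = pd.DataFrame({
    "Phenotype": np.repeat(["very not cool", "not cool", "very cool", "super cool"], 4),
    "Gene": ["Gene" + str(i) for i in range(4)] * 4,
    "P": np.random.uniform(0, 1, 16),
})
df1 = df[:15]  # drop the last observation, as in the example above

# Build the wide matrix with the same index/columns/values as in pivot_kws.
wide = df1.pivot(index="Phenotype", columns="Gene", values="P")

# Count the empty cells, then list the (Phenotype, Gene) pairs without a value.
n_missing = int(wide.isna().sum().sum())
rows, cols = np.where(wide.isna().to_numpy())
missing_pairs = [(wide.index[r], wide.columns[c]) for r, c in zip(rows, cols)]

print(n_missing)
print(missing_pairs)
```

On your real data you would replace `df1` with `df5`; any pair that shows up in `missing_pairs` is a combination that never occurs in the csv.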
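I would suggest checking whether the missing values are intentional or a mistake. If you really do have some missing values, you can bypass clustermap's own clustering by pre-computing the linkage and passing it to the function, for example using correlation as below: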
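import scipy.spatial as sp, scipy.cluster.hierarchy as hc
# corr() needs the wide numeric matrix, so pivot the long table first
wide1 = pd.pivot(df1, columns="Gene", values="P", index="Phenotype")
# corr() skips NaN pairwise, so 1 - correlation stays finite here
row_dism = 1 - wide1.T.corr()
row_linkage = hc.linkage(sp.distance.squareform(row_dism), method='complete')
col_dism = 1 - wide1.corr()
col_linkage = hc.linkage(sp.distance.squareform(col_dism), method='complete')
sns.clustermap(wide1, figsize=(5, 5), row_linkage=row_linkage, col_linkage=col_linkage)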
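One caveat with the correlation approach: a pairwise correlation can itself come out as NaN, for instance when two rows share fewer than two observed values or one of them is constant, and `hc.linkage` would then raise the same error again. A small guard along these lines may help before plotting. The helper name `correlation_linkage` is hypothetical, the input is assumed to be the already pivoted matrix, and the plotting call is left out so the snippet runs without a display:

```python
import numpy as np
import pandas as pd
import scipy.cluster.hierarchy as hc
import scipy.spatial.distance as ssd


def correlation_linkage(wide, method="complete"):
    """Row linkage from 1 - Pearson correlation (NaNs are skipped pairwise).

    `wide` is the pivoted matrix; pass `wide.T` to get the column linkage.
    """
    dism = 1 - wide.T.corr()
    if not np.isfinite(dism.to_numpy()).all():
        raise ValueError("correlation distance contains non-finite values")
    # checks=False skips squareform's exact symmetry and zero-diagonal test.
    condensed = ssd.squareform(dism.to_numpy(), checks=False)
    return hc.linkage(condensed, method=method)


# Toy pivoted matrix with one missing cell, mirroring the example above.
rng = np.random.default_rng(0)
wide = pd.DataFrame(
    rng.uniform(0, 1, (4, 4)),
    index=["not cool", "super cool", "very cool", "very not cool"],
    columns=["Gene" + str(i) for i in range(4)],
)
wide.loc["super cool", "Gene3"] = np.nan

row_linkage = correlation_linkage(wide)
col_linkage = correlation_linkage(wide.T)
print(row_linkage.shape, col_linkage.shape)
```

The two results can then be passed to `sns.clustermap` through `row_linkage` and `col_linkage`, the same way as in the snippet above.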