How to create a heatmap in R with proportions and numeric values

Question

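I have a data frame that contains the number of national job postings for several areas in tech/biotech, as well as the number of postings that coincide with other areas. I would like to create a heatmap that shows the intersection of these fields (as a number of postings) together with the proportion of postings in each field that are "duplicates." The data frame itself looks similar to: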

df <- data.frame(matrix(nrow=4, byrow=TRUE, data=c(14000, 3300, 
2500, 1000, 3300, 3300, 700, 300, 2500, 700, 95000,7500, 1000, 300, 7500, 108000)))

colnames(df) <- rownames(df) <- c("ML & Image", "Software Dev", "Cloud Dev", "Bioinformatics & Health")

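So, for example, the first row starts with the total number of ML & Image postings, followed by the number of ML & Image postings that also meet the criteria for Software Dev, then the number of ML & Image postings that also meet the criteria for Cloud Dev, and so on.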

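I would like to make a heatmap that looks somewhat like the df table as you would see it in the R console: it keeps the numeric posting counts but colors each cell by the proportion of overlap between the fields. So if there is very little overlap, the cell would be colored red(ish); if the overlap is around 30-60%, yellow(ish); and if there is a lot of overlap, green(ish), with a color bar on the side for reference.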

Any help with this would be greatly appreciated. Thanks!

Answer 1

I'm not sure I fully understand your question, but the following might give you some ideas.

> library(ggplot2)
> library(reshape2)

# Set up the data

> df <- data.frame(matrix(nrow=4, byrow=TRUE, data=c(14000, 3300, 2500, 1000, 3300, 3300, 700, 300, 2500, 700, 95000,7500, 1000, 300, 7500, 108000)))
> colnames(df) <- rownames(df) <- c("ML & Image", "Software Dev", "Cloud Dev", "Bioinformatics & Health")

> df
                        ML & Image Software Dev Cloud Dev Bioinformatics & Health
ML & Image                   14000         3300      2500                    1000
Software Dev                  3300         3300       700                     300
Cloud Dev                     2500          700     95000                    7500
Bioinformatics & Health       1000          300      7500                  108000

# Convert df to matrix and divide each column by the diagonal value

> m <- data.matrix(df)
> m <- m / matrix(t(colSums(diag(4) * m)), nrow=4, ncol=4, byrow=TRUE)

> m
                        ML & Image Software Dev   Cloud Dev Bioinformatics & Health
ML & Image              1.00000000   1.00000000 0.026315789             0.009259259
Software Dev            0.23571429   1.00000000 0.007368421             0.002777778
Cloud Dev               0.17857143   0.21212121 1.000000000             0.069444444
Bioinformatics & Health 0.07142857   0.09090909 0.078947368             1.000000000
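
The column-wise division above can also be written with base R's sweep(), which some readers may find easier to follow. This is only a sketch of an equivalent formulation, not the answer's own code; the names counts, by_col and by_row are introduced here for illustration. Because the count matrix is symmetric, dividing by the row diagonal instead simply gives the transpose, which matches the row-wise reading described in the question.

```r
# Rebuild the count matrix from the question (names are illustrative).
fields <- c("ML & Image", "Software Dev", "Cloud Dev", "Bioinformatics & Health")
counts <- matrix(
  c(14000, 3300,  2500,   1000,
     3300, 3300,   700,    300,
     2500,  700, 95000,   7500,
     1000,  300,  7500, 108000),
  nrow = 4, byrow = TRUE, dimnames = list(fields, fields)
)

# Divide every column j by its diagonal entry counts[j, j].
by_col <- sweep(counts, 2, diag(counts), "/")

# Divide every row i by its diagonal entry counts[i, i] instead.
by_row <- sweep(counts, 1, diag(counts), "/")

print(round(by_col, 4))
```

With MARGIN = 2 each cell is the share of the column field's postings that also fall in the row field; with MARGIN = 1 it is the share of the row field's postings that also fall in the column field.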

# Prepare data for ggplot2 by melting the matrix into long format and
# adding the posting counts back in to be used as labels

> hm <- melt(m)
> hm$postings <- c(df[,1],df[,2],df[,3],df[,4])

> hm
                      Var1                    Var2       value postings
1               ML & Image              ML & Image 1.000000000    14000
2             Software Dev              ML & Image 0.235714286     3300
3                Cloud Dev              ML & Image 0.178571429     2500
4  Bioinformatics & Health              ML & Image 0.071428571     1000
5               ML & Image            Software Dev 1.000000000     3300
6             Software Dev            Software Dev 1.000000000     3300
7                Cloud Dev            Software Dev 0.212121212      700
8  Bioinformatics & Health            Software Dev 0.090909091      300
9               ML & Image               Cloud Dev 0.026315789     2500
10            Software Dev               Cloud Dev 0.007368421      700
11               Cloud Dev               Cloud Dev 1.000000000    95000
12 Bioinformatics & Health               Cloud Dev 0.078947368     7500
13              ML & Image Bioinformatics & Health 0.009259259     1000
14            Software Dev Bioinformatics & Health 0.002777778      300
15               Cloud Dev Bioinformatics & Health 0.069444444     7500
16 Bioinformatics & Health Bioinformatics & Health 1.000000000   108000
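
The long table above depends on melt() and the manually concatenated columns both running in column-major order. As a hedged alternative, the same table can be assembled in base R so that the proportions and the counts are flattened from matrices of identical shape in a single step. The names counts and props are assumptions made for this sketch; the Var1, Var2, value and postings columns mirror the ones shown above.

```r
# Count matrix from the question (variable names are illustrative).
fields <- c("ML & Image", "Software Dev", "Cloud Dev", "Bioinformatics & Health")
counts <- matrix(
  c(14000, 3300,  2500,   1000,
     3300, 3300,   700,    300,
     2500,  700, 95000,   7500,
     1000,  300,  7500, 108000),
  nrow = 4, byrow = TRUE, dimnames = list(fields, fields)
)
props <- sweep(counts, 2, diag(counts), "/")

# row() and col() give the row/column index of every cell; as.vector()
# flattens all four pieces in the same column-major order.
hm <- data.frame(
  Var1     = fields[as.vector(row(counts))],
  Var2     = fields[as.vector(col(counts))],
  value    = as.vector(props),
  postings = as.vector(counts)
)

print(head(hm, 4))
```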

# Plot it

> ggplot(hm, aes(x=Var1, y=Var2)) +
        geom_tile(aes(fill=value)) +
        scale_fill_gradientn(colours=c("red","yellow","green")) +
        geom_text(aes(label=postings))

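This produces a heatmap in which each tile is colored by its proportion on a red-yellow-green gradient and labeled with its posting count, with a color bar legend alongside.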
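
To get closer to the layout requested in the question (a plot that reads like the printed table, with yellow centered on the 30-60% band), one possible variation is sketched below. It is an assumption-laden sketch rather than part of the original answer: the column names row_field and col_field, the gradient position values = c(0, 0.45, 1), and the fixed limits of 0 to 1 are all choices made here for illustration.

```r
library(ggplot2)

fields <- c("ML & Image", "Software Dev", "Cloud Dev", "Bioinformatics & Health")
counts <- matrix(
  c(14000, 3300,  2500,   1000,
     3300, 3300,   700,    300,
     2500,  700, 95000,   7500,
     1000,  300,  7500, 108000),
  nrow = 4, byrow = TRUE, dimnames = list(fields, fields)
)
props <- sweep(counts, 2, diag(counts), "/")

# Factor levels fix the axis order: rows are reversed so the first table
# row is drawn at the top, as it appears when df is printed.
hm <- data.frame(
  row_field = factor(fields[as.vector(row(counts))], levels = rev(fields)),
  col_field = factor(fields[as.vector(col(counts))], levels = fields),
  value     = as.vector(props),
  postings  = as.vector(counts)
)

p <- ggplot(hm, aes(x = col_field, y = row_field)) +
  geom_tile(aes(fill = value)) +
  geom_text(aes(label = postings)) +
  # Assumed colour positions: yellow sits at 0.45, the middle of 30-60%.
  scale_fill_gradientn(colours = c("red", "yellow", "green"),
                       values = c(0, 0.45, 1), limits = c(0, 1),
                       name = "Proportion") +
  scale_x_discrete(position = "top") +
  labs(x = NULL, y = NULL)

p
```

Fixing the limits to 0-1 keeps the color bar comparable across data sets; without it the gradient would be stretched over whatever range of proportions happens to be present.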
