Long form using colnames

Question

Suppose I have the following data:

A <- c(4,4,4,4,4)
B <- c(1,2,3,4,4)
C <- c(1,2,4,4,4)
D <- c(3,2,4,1,4)
E <- c(4,4,4,4,5)

data <- data.frame(A,B,C,D,E)
data <- t(data)
colnames(data) <- c("num1","freq1","freq2","freq3","totfreq")

> data
  num1 freq1 freq2 freq3 totfreq
A    4     4     4     4       4
B    1     2     3     4       4
C    1     2     4     4       4
D    3     2     4     1       4
E    4     4     4     4       5

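I'm trying to draw a grouped bar chart. The x-axis should be my variables A:E, and y should be the values of freq1, freq2, and freq3 for each letter. I also need to keep the ability to plot the variables A:E by their value in totfreq.

I know I need to convert to long format, but I'm having trouble setting up the data. Somehow I need A, B, C, D, E stacked into one column, another column stacking freq1, freq2, freq3, totfreq, and then a final column containing the values. Any suggestions on how to achieve this?

I'd prefer to plot in plotly, but ggplot would also be fine.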

Answer 1

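First, you have a matrix, but you probably want a data frame. Converting it to a tibble would drop the row names, which is where the letters are stored, so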

as.data.frame(data) %>% rownames_to_column("id")

will give you a data frame with the letters in an id column.
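As a quick check of the point above, here is a minimal sketch (the object names `m`, `df`, and `tb` are made up for illustration). It shows that `t()` on a data frame returns a matrix, that `as_tibble()` drops the row names by default, and that `rownames_to_column()` keeps the letters.

```r
library(tibble)

# Same data as in the question
m <- t(data.frame(A = c(4,4,4,4,4), B = c(1,2,3,4,4), C = c(1,2,4,4,4),
                  D = c(3,2,4,1,4), E = c(4,4,4,4,5)))
colnames(m) <- c("num1", "freq1", "freq2", "freq3", "totfreq")

is.matrix(m)   # t() on a data frame gives back a matrix, not a data frame

# Going straight to a tibble loses the letters (row names are dropped by default)
tb <- as_tibble(m)

# Converting to a data frame first keeps them, and they can become a column
df <- rownames_to_column(as.data.frame(m), "id")
```

If you prefer a tibble, `as_tibble(m, rownames = "id")` should give a similar result in one step.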

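You want to put this data into long format by gathering all the freq columns. I then added a column giving the type of observation. It isn't required, but since you said you want to easily filter for one of the two types (the grouped freq1, etc., or totfreq), this is a convenient setup that I use often.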

library(tidyverse)

A <- c(4,4,4,4,4)
B <- c(1,2,3,4,4)
C <- c(1,2,4,4,4)
D <- c(3,2,4,1,4)
E <- c(4,4,4,4,5)

data <- data.frame(A,B,C,D,E)
data<- t(data)
colnames(data) = c("num1","freq1","freq2","freq3","totfreq")

data_long <- as.data.frame(data) %>%
  rownames_to_column("id") %>%
  gather(key = var, value = value, freq1:totfreq) %>%
  mutate(type = ifelse(var == "totfreq", "total", "by_group"))

head(data_long)
#>   id num1   var value     type
#> 1  A    4 freq1     4 by_group
#> 2  B    1 freq1     2 by_group
#> 3  C    1 freq1     2 by_group
#> 4  D    3 freq1     2 by_group
#> 5  E    4 freq1     4 by_group
#> 6  A    4 freq2     4 by_group
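In current versions of tidyr, `gather()` is superseded by `pivot_longer()`. The following is a sketch of what the equivalent reshaping might look like, not part of the original answer; `data_long2` is a name chosen here for illustration. The result is a tibble, and the rows come out ordered by id rather than by variable, so the printed output will not match the one above line for line.

```r
library(dplyr)
library(tidyr)
library(tibble)

data <- t(data.frame(A = c(4,4,4,4,4), B = c(1,2,3,4,4), C = c(1,2,4,4,4),
                     D = c(3,2,4,1,4), E = c(4,4,4,4,5)))
colnames(data) <- c("num1", "freq1", "freq2", "freq3", "totfreq")

data_long2 <- as.data.frame(data) %>%
  rownames_to_column("id") %>%
  # stack freq1..totfreq into a name column and a value column
  pivot_longer(freq1:totfreq, names_to = "var", values_to = "value") %>%
  # flag totals separately from the per-group frequencies
  mutate(type = if_else(var == "totfreq", "total", "by_group"))
```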

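With the type column, it's easy to filter by type for plotting. This lets you pipe the filtered data frame into something like ggplot, or gives you a column to use for faceting or mapping onto an aesthetic.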

# for grouped bar chart
data_long %>% filter(type == "by_group")
#>    id num1   var value     type
#> 1   A    4 freq1     4 by_group
#> 2   B    1 freq1     2 by_group
#> 3   C    1 freq1     2 by_group
#> 4   D    3 freq1     2 by_group
#> 5   E    4 freq1     4 by_group
#> 6   A    4 freq2     4 by_group
#> 7   B    1 freq2     3 by_group
#> 8   C    1 freq2     4 by_group
#> 9   D    3 freq2     4 by_group
#> 10  E    4 freq2     4 by_group
#> 11  A    4 freq3     4 by_group
#> 12  B    1 freq3     4 by_group
#> 13  C    1 freq3     4 by_group
#> 14  D    3 freq3     1 by_group
#> 15  E    4 freq3     4 by_group

# for total freqs
data_long %>% filter(type == "total")
#>   id num1     var value  type
#> 1  A    4 totfreq     4 total
#> 2  B    1 totfreq     4 total
#> 3  C    1 totfreq     4 total
#> 4  D    3 totfreq     4 total
#> 5  E    4 totfreq     5 total

Created on 2018-05-17 by the reprex package (v0.2.0).
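To show how the filtered long data might feed into a plot, here is one possible sketch. It was not part of the reprex above; the plot object names `p_grouped` and `p_total` are assumptions made for illustration. It rebuilds `data_long` so that it runs on its own.

```r
library(dplyr)
library(tidyr)
library(tibble)
library(ggplot2)

data <- t(data.frame(A = c(4,4,4,4,4), B = c(1,2,3,4,4), C = c(1,2,4,4,4),
                     D = c(3,2,4,1,4), E = c(4,4,4,4,5)))
colnames(data) <- c("num1", "freq1", "freq2", "freq3", "totfreq")

data_long <- as.data.frame(data) %>%
  rownames_to_column("id") %>%
  gather(key = var, value = value, freq1:totfreq) %>%
  mutate(type = ifelse(var == "totfreq", "total", "by_group"))

# Grouped bars: one cluster per letter, one bar per freq column
p_grouped <- data_long %>%
  filter(type == "by_group") %>%
  ggplot(aes(x = id, y = value, fill = var)) +
  geom_col(position = "dodge")

# Totals only: one bar per letter
p_total <- data_long %>%
  filter(type == "total") %>%
  ggplot(aes(x = id, y = value)) +
  geom_col()
```

Printing `p_grouped` or `p_total` draws the chart. If plotly is installed, `plotly::ggplotly(p_grouped)` should convert it to an interactive version.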

Answer 2

First, you have to format your data so you can work with it, and then let ggplot2 do its magic.

Find the code below, along with the graphs it outputs:

library(dplyr)         #To use mutate
library(ggplot2)
library(reshape2)      #To use melt
library(plotly)
A <- c(4,4,4,4,4)
B <- c(1,2,3,4,4)
C <- c(1,2,4,4,4)
D <- c(3,2,4,1,4)
E <- c(4,4,4,4,5)

data <- data.frame(A,B,C,D,E)
data2=names(data)
data<- t(data)
colnames(data) = c("num1","freq1","freq2","freq3","totfreq")

data <- data.frame(data)
# Because mutate only works on a data.frame, not a matrix

data <- mutate(data, names = data2) %>% select("freq1", "freq2", "freq3", "totfreq", "names")
# Adding names and removing num1

meltdata=melt(data,id.vars="names")        
#Because we need melted data to perform 

#Graph 1 (colourless and boring)
Graph1=ggplot(meltdata,aes(x=names,y=value))+geom_col()+facet_wrap(~variable)
#Graph 2 (Cool one)
Graph2=ggplot(meltdata,aes(x=names,y=value,fill=variable))+geom_col()+geom_text(label=meltdata$value,position="stack")

#Graph 3 is the best I guess
meltdata=mutate(meltdata,xval=1)
Graph3=ggplot(meltdata,aes(x=xval,y=value,fill=variable))+geom_col()+geom_text(label=meltdata$value,position = position_stack(vjust = 0.5))+
  facet_grid(~names)+theme(panel.background = element_blank(),axis.text.x = element_blank(),
                           axis.ticks.x = element_blank())
Graph3
#If you like plotly so much then just use it by passing ggplot variable, But ggplot is better if you ask me
ggplotly(Graph1)
ggplotly(Graph2)
ggplotly(Graph3)
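Graph2 above stacks the bars. Since the question asks for a grouped bar chart of freq1 to freq3, a dodged variant might look like the sketch below. The names `wide`, `grouped_gg`, and `grouped_plotly` are made up for this example, and totfreq is left out on the assumption that it would be plotted separately.

```r
library(ggplot2)
library(reshape2)
library(plotly)

data <- data.frame(A = c(4,4,4,4,4), B = c(1,2,3,4,4), C = c(1,2,4,4,4),
                   D = c(3,2,4,1,4), E = c(4,4,4,4,5))
wide <- data.frame(t(data))
colnames(wide) <- c("num1", "freq1", "freq2", "freq3", "totfreq")
wide$names <- rownames(wide)   # keep the letters as a regular column

# Long format with only the per-group frequencies
meltdata <- melt(wide[, c("names", "freq1", "freq2", "freq3")], id.vars = "names")

# ggplot2: position = "dodge" places the bars side by side instead of stacking
grouped_gg <- ggplot(meltdata, aes(x = names, y = value, fill = variable)) +
  geom_col(position = "dodge")

# plotly directly: barmode = "group" has the same effect
grouped_plotly <- plot_ly(meltdata, x = ~names, y = ~value,
                          color = ~variable, type = "bar") %>%
  layout(barmode = "group")
```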
