Reformatting bar graph in R

Question

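For an assignment I need to visualize the market value of companies, which are divided into groups representing industries. I created the following graph: Market Value of Equity graph. However, colours like these are not allowed in graphs in academic articles. The code I used is: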

ggplot(data = g, aes(x=g$MarketCap, group = g$SIC, fill=SIC)) +
  geom_histogram(position = "dodge", binwidth = 1000) + theme_bw() + xlim(0,5000) +
  labs(x = "Market Value (in Millions $)", title = "Market Value per Industry")

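I tried to find an alternative way to display this, but came up with nothing. Another option would be to change the colour of all bars to grey, but then they become indistinguishable. Does anyone know how to solve this? Many thanks.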

Answer 1

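Patubd, there is a lot going on here, and I am afraid the comments will not be enough to get you going. So let me point out a few things.

You did not provide a reproducible example, so I "simulated" some data up front. Adapt it to your liking.

In your ggplot() call you refer to the g data frame. There is then no need to use the explicit g$variable notation.

You do the same in your MeanMarketCap pipe. I guess this is part of the problem you are facing.

Data: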

library(dplyr)
library(ggplot2)   # needed for the plots below
set.seed(666)   # set seed for random generator
# ------------------- data frame with 60 examples of industry group SIC and MarketCap
df <- data.frame(
   SIC        = rep(c("0","1","2"), 20)
  , MarketCap = c(rep(50, 30), rep(1000, 15), rep(2000, 10), rep(3000, 5))
)
# ------------------- add 15 random picks to make it less homogeneous
df <- df %>% 
   bind_rows(df %>% sample_n(15))

(I) "Less colourful" and/or faceting

fig1 <- ggplot(data = df, aes(x=MarketCap, group = SIC, fill=SIC)) +
    geom_histogram(position = "dodge") + 
#------------- as proposed to make graph less colourful / shades of grey ---------
    scale_fill_grey() + 
#---------------------------------------------------------------------------------
    theme_bw() + xlim(0,5000) +
    labs(x = "Market Value (in Millions $)", title = "Market Value per Industry")


# make a 2nd plot by facetting above
# If the plot is stored in an object, i.e. fig1, you do not have to "repeat" the code
# and just add the facet-layer
fig2 <- fig1 + facet_grid(. ~ SIC)

library(patchwork)   # cool package to combine plots
fig1 / fig2          # puts one plot above the other

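With a facet you can split up the groups. This supports a side-by-side analysis, and the colouring of the groups matters less because the group is now part of the facet. But you can combine both, as shown.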
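
Because the facet strips already identify each group, the fill can also be dropped altogether. A minimal sketch of that idea, assuming a small made-up data frame `toy` (hypothetical, not the simulated data above) and a bin width of 500 picked only for illustration:

```r
library(ggplot2)

# Hypothetical toy data: three industry groups with four market caps each
toy <- data.frame(
  SIC       = rep(c("0", "1", "2"), each = 4),
  MarketCap = c(50, 50, 1000, 2000,
                50, 1000, 1000, 3000,
                50, 50, 50, 2000)
)

# One uniform grey fill; the facet strips, not the colour, identify the groups
fig_facet <- ggplot(toy, aes(x = MarketCap)) +
  geom_histogram(binwidth = 500, fill = "grey40", colour = "white") +
  facet_grid(. ~ SIC) +
  theme_bw() +
  labs(x = "Market Value (in Millions $)", title = "Market Value per Industry")

fig_facet
```

Whether this reads better than the shaded version depends on how many groups you have; with many industries the panels get narrow.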

(II) Summarising the means

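Your code will work if you do not use the df$variable notation. That notation breaks the group_by() call, because it refers to the full data frame rather than to the rows of the current group.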

df %>% 
   group_by(SIC) %>% 
   summarise(MeanMarketCap = mean(MarketCap))

For the simple simulated data, this yields:

# A tibble: 3 x 2
  SIC   MeanMarketCap
  <chr>         <dbl>
1 0              858.
2 1              876.
3 2              858.
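
To see why the `$` notation defeats the grouping, here is a small contrast on made-up numbers. The data frame `toy` is hypothetical and chosen so the means are easy to check by hand:

```r
library(dplyr)

# Hypothetical toy data, not the simulated data above
toy <- data.frame(
  SIC       = c("0", "0", "1", "1"),
  MarketCap = c(100, 300, 1000, 3000)
)

# Bare column name: mean() only sees the rows of the current group
by_group <- toy %>%
  group_by(SIC) %>%
  summarise(MeanMarketCap = mean(MarketCap))

# toy$MarketCap: mean() sees the whole column, whatever the grouping
overall <- toy %>%
  group_by(SIC) %>%
  summarise(MeanMarketCap = mean(toy$MarketCap))

by_group$MeanMarketCap   # 200 2000
overall$MeanMarketCap    # 1100 1100
```

The second pipe still returns one row per group, which makes the mistake easy to miss: every row simply carries the overall mean.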

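To show the distribution, you can use a boxplot. A boxplot is based on the quartiles of the data: the 25th and 75th percentiles and the median (50th percentile).
You can use geom_boxplot() for this. ggplot takes care of the statistical computation.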

df %>%
   ggplot() +
   geom_boxplot(aes(x = SIC, y = MarketCap))   # closing parenthesis was missing

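With your data (more varied data points) the plot will look more impressive. But you can already clearly see the differences in the medians of the example industries, SIC.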

If you like, you can add the individual data points with geom_jitter().
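
A minimal sketch of that combination, again on a hypothetical `toy` data frame. The jitter width, the transparency and hiding the boxplot's own outlier markers are illustrative choices, not something the original plot requires:

```r
library(ggplot2)

# Hypothetical toy data: three industry groups with five market caps each
toy <- data.frame(
  SIC       = rep(c("0", "1", "2"), each = 5),
  MarketCap = c(50, 50, 1000, 2000, 3000,
                50, 1000, 1000, 2000, 2000,
                50, 50, 50, 1000, 3000)
)

fig_box <- ggplot(toy, aes(x = SIC, y = MarketCap)) +
  # hide the boxplot's outlier markers so no point is drawn twice
  geom_boxplot(outlier.shape = NA) +
  # spread the points horizontally only, so the y values stay exact
  geom_jitter(width = 0.15, height = 0, alpha = 0.6) +
  theme_bw() +
  labs(x = "Industry (SIC)", y = "Market Value (in Millions $)")

fig_box
```

The plot needs no fill colours at all, which sidesteps the original restriction on colour in academic figures.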

Hope this gets you started. Good luck!
