ggplot syntax for data distribution

Question

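I am trying to plot the distribution of the beforeMinWageLaw and afterMinWageLaw variables, but when I store the data in df instead of seattleData, R reports "Error: Aesthetics must be either length 1 or the same as the data (43): x". How can I fix this? Also, how can I draw a normal probability plot to check the normality of the data? Thanks.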

# Import data
# seattleData <- read.table(file = file.choose(),
#                           header = TRUE, sep = ",")

library(ggplot2)

# Define variables
food_drink_workers <- seattleData$food_drink_workers
MinWage <- seattleData$washington_state_minwage
afterMinWageLaw <- food_drink_workers[304:346]
beforeMinWageLaw <- food_drink_workers[1:303]
df <- data.frame(seattleData)

# Display the data distribution with ggplot
x <- ggplot(df, aes(x = food_drink_workers)) +
  geom_histogram(mapping = aes(y = ..density..), color = "black", fill = "white") +
  geom_density(alpha = .2, fill = "blue")
x + geom_vline(xintercept = c(108.8636), linetype = "dashed", color = "red") +
  ggtitle("Distribution of the Data") +
  xlab("Seattle MSA Food and Drink Workers") +
  ylab("Density")

# Conduct a two-sample t-test
options(scipen = 100)
tTest <- t.test(beforeMinWageLaw, afterMinWageLaw, mu = 0, alternative = "less",
                conf.level = .95, var.equal = FALSE, paired = FALSE)

You can download the data here: https://fred.stlouisfed.org/series/SMU53426607072200001SA

Screenshot

Answer 1

You get the error message "Error: Aesthetics must be either length 1 or the same as the data (43): x" because the vector afterMinWageLaw has 43 values while beforeMinWageLaw has 303 values. That is why you cannot reference both of them in the same aes(), I guess.
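To make the length mismatch concrete, here is a minimal sketch using only base R. The seattleData below is a simulated stand-in (made-up values, not the FRED series), assumed to have 346 rows like the original.

```r
# Simulated stand-in for seattleData: 346 rows of made-up values
set.seed(1)
seattleData <- data.frame(food_drink_workers = rnorm(346, mean = 110, sd = 15))

beforeMinWageLaw <- seattleData$food_drink_workers[1:303]
afterMinWageLaw <- seattleData$food_drink_workers[304:346]

# ggplot2 expects every aesthetic to have length 1 or one value per row
# of the data frame used by that layer
nrow(seattleData)         # 346
length(beforeMinWageLaw)  # 303
length(afterMinWageLaw)   # 43
```

Neither vector has the same length as the other, so they cannot share one layer's data; each needs its own layer or its own data frame.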

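I would use separate layers within one plot, so that each layer can have its own aesthetics with a different data length or number of rows. First, I would split your data into two data frames, one before the law and one after. With ggplot you can reference different data frames in a single plot; in your case it looks like this: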

# Set row index ranges for before and after the law
row_index_range_before <- 1:303
row_index_range_after <- 304:346

# Define two data frames, one per period
df_before <- data.frame(seattleData)[row_index_range_before, ]
df_after <- data.frame(seattleData)[row_index_range_after, ]

# Display the distributions of both data frames in one plot.
# after_stat(density) replaces the deprecated ..density.. notation
# (needs ggplot2 >= 3.3.0; older versions use ..density..).
x <- ggplot() +
  geom_histogram(
    data = df_before,
    mapping = aes(
      x = food_drink_workers,
      y = after_stat(density),
      color = "blue"),
    fill = "white") +
  geom_histogram(
    data = df_after,
    mapping = aes(
      x = food_drink_workers,
      y = after_stat(density),
      color = "red"),
    fill = "white") +
  geom_density(
    data = df_before,
    mapping = aes(
      x = food_drink_workers,
      y = after_stat(density),
      fill = "blue"),
    alpha = .2) +
  geom_density(
    data = df_after,
    mapping = aes(
      x = food_drink_workers,
      y = after_stat(density),
      fill = "red"),
    alpha = .2) +
  # The strings "blue" and "red" inside aes() are only labels;
  # the manual scales map them to real colors and legend text
  scale_colour_manual(
    name = "Color",
    values = c("blue" = "blue", "red" = "red"),
    labels = c("blue" = "Before Law", "red" = "After Law")) +
  scale_fill_manual(
    name = "Fill",
    values = c("blue" = "blue", "red" = "red"),
    labels = c("blue" = "Before Law", "red" = "After Law"))

x + geom_vline(
  xintercept = c(108.8636),
  linetype = "dashed",
  color = "red") +
  ggtitle("Distribution of the Data") +
  xlab("Seattle MSA Food and Drink Workers") +
  ylab("Density")

Alternatively, you could also reference afterMinWageLaw and beforeMinWageLaw directly as x in aes() and drop the data argument that points to the data frames, I think.
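A rough sketch of that vector-based variant, assuming the two vectors are defined as in the question. The values here are simulated placeholders, not the real series, and only the density layers are shown to keep it short.

```r
library(ggplot2)

# Simulated placeholder values standing in for seattleData$food_drink_workers
set.seed(1)
food_drink_workers <- rnorm(346, mean = 110, sd = 15)
beforeMinWageLaw <- food_drink_workers[1:303]
afterMinWageLaw <- food_drink_workers[304:346]

# No data argument: each layer takes its x values from the vector in aes(),
# so the two layers may have different lengths (303 and 43)
p <- ggplot() +
  geom_density(aes(x = beforeMinWageLaw, fill = "blue"), alpha = .2) +
  geom_density(aes(x = afterMinWageLaw, fill = "red"), alpha = .2) +
  scale_fill_manual(
    name = "Fill",
    values = c("blue" = "blue", "red" = "red"),
    labels = c("blue" = "Before Law", "red" = "After Law")) +
  xlab("Seattle MSA Food and Drink Workers") +
  ylab("Density")
p
```

This keeps the code shorter, but passing data frames is usually easier to maintain once more columns are involved.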

To draw a legend as well, you need to set color or fill inside aes() and add scale_colour_manual() or scale_fill_manual() to your plot.
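As for the normal probability plot asked about in the question, one possible sketch uses ggplot2's stat_qq() and stat_qq_line(). The data frame qq_df and its period column are hypothetical names introduced for illustration, and the values are simulated rather than taken from the FRED series.

```r
library(ggplot2)

# Simulated placeholder values; replace with seattleData$food_drink_workers
set.seed(1)
food_drink_workers <- rnorm(346, mean = 110, sd = 15)

# Hypothetical helper data frame with one row per observation and a period label
qq_df <- data.frame(
  food_drink_workers = food_drink_workers,
  period = factor(
    c(rep("Before Law", 303), rep("After Law", 43)),
    levels = c("Before Law", "After Law")))

# Sample quantiles against theoretical normal quantiles, one panel per period
qq <- ggplot(qq_df, aes(sample = food_drink_workers)) +
  stat_qq() +
  stat_qq_line(linetype = "dashed", color = "red") +
  facet_wrap(~ period, scales = "free") +
  ggtitle("Normal Q-Q Plot") +
  xlab("Theoretical Quantiles") +
  ylab("Sample Quantiles")
qq
```

Points that stay close to the dashed line suggest approximately normal data; systematic curvature away from it suggests skew or heavy tails.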

r

data-visualization

distribution

histogram

ggplot2