Subsetting data for ggplot2

Question

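I have my data stored in several data sets, each containing four variables. Imagine something like a data.table dt consisting of the variables Country, Male/Female, Birthyear, and Weighted Average Income. I would like to create a graph that shows only one country's weighted average income by birth year, split by male/female. I have used the facet_grid() function to get a grid of graphs for all countries, as shown below.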

ggplot() + 
 geom_line(data = dt,
           aes(x = Birthyear, 
               y = Weighted Average Income,
               colour = 'Weighted Average Income'))+
 facet_grid(Country ~ Male/Female)

However, I have tried to isolate the graph for just one country, and the code below does not seem to work. How can I subset the data correctly?

ggplot() + 
 geom_line(data = dt[Country == 'Germany'],
           aes(x = Birthyear, 
               y = Weighted Average Income,
               colour = 'Weighted Average Income'))+
 facet_grid(Country ~ Male/Female)

Answer 1

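For your specific case, the problem is that you are not quoting Male/Female and Weighted Average Income. Names that contain spaces or a slash have to be wrapped in backticks, otherwise R parses them as separate symbols or as a division. Also, your data and basic aesthetics should probably be part of ggplot() rather than geom_line(). Putting them in geom_line() isolates them to that single layer, so if you were to add, for example, geom_smooth(), you would have to repeat the code in every layer of your plot.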

So to fix your problem, you could do:

library(tidyverse)
library(data.table) # provides the dt[Country == 'Germany'] subsetting syntax
plot <- ggplot(data = dt[Country == 'Germany'], 
       aes(x = Birthyear, 
           y = `Weighted Average Income`,
           col = `Weighted Average Income`
       )) + # Could use !!sym("x") instead of `x` inside aes()
  geom_line() + 
  facet_grid(Country ~ `Male/Female`) # Backticks quote names containing spaces or '/'
plot
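If you want to try this without the original data, here is a minimal sketch using made-up numbers. The values in dt below are invented purely for illustration; only the column names come from the question. It shows the backtick quoting and how a second layer (geom_point() here) inherits the data and aesthetics set in ggplot().

```r
library(ggplot2)
library(data.table)

# Toy data, invented for illustration only; column names follow the question
dt <- data.table(
  Country = rep(c("Germany", "France"), each = 6),
  `Male/Female` = rep(rep(c("Male", "Female"), each = 3), times = 2),
  Birthyear = rep(c(1950, 1960, 1970), times = 4),
  `Weighted Average Income` = c(30, 34, 38, 25, 29, 33, 28, 31, 35, 24, 27, 30)
)

# data.table row subset: keep only the rows for one country
germany <- dt[Country == "Germany"]

p <- ggplot(germany,
            aes(x = Birthyear, y = `Weighted Average Income`)) +
  geom_line() +
  geom_point() +  # inherits data and aes() from ggplot(), nothing to repeat
  facet_grid(Country ~ `Male/Female`)
p
```

Because the subset is done before plotting, the facet grid contains a single row (Germany) with one panel per value of Male/Female.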

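Now, ggplot2 actually has a (lesser-known) built-in feature for changing the data of a plot, so if you want to compare this to the plot with all of your countries included, you could do: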

plot %+% dt # `%+%` is used to change the data used by one or more layers. See help("+.gg")
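As a rough illustration of `%+%`, again with invented toy data, the sketch below builds the Germany-only plot and then swaps in the full table. The object names one_country and all_countries are hypothetical; the layers and facet specification are reused unchanged.

```r
library(ggplot2)
library(data.table)

# Toy data, invented for illustration only
dt <- data.table(
  Country = rep(c("Germany", "France"), each = 4),
  `Male/Female` = rep(rep(c("Male", "Female"), each = 2), times = 2),
  Birthyear = rep(c(1950, 1960), times = 4),
  `Weighted Average Income` = c(30, 34, 25, 29, 28, 31, 24, 27)
)

# Plot built on the subset for a single country
one_country <- ggplot(dt[Country == "Germany"],
                      aes(x = Birthyear, y = `Weighted Average Income`)) +
  geom_line() +
  facet_grid(Country ~ `Male/Female`)

# Same plot specification, but with the default data replaced by the full table
all_countries <- one_country %+% dt
all_countries
```

The original one_country object is not modified, so both versions can be printed side by side for comparison.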

Tags: r, subset, ggplot2