Accessing data in R using multiple column identifiers

Question

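I'm having a hard time finding a simple way to access data based on multiple columns that identify it.

For example, I have several years of data. Each year has multiple plots, each plot has multiple samples, and each sample has a pair of measured values (x, y). So my data headers look like this:

Year, Plot, Sample ID

Each of these columns has many levels, and I want to be able to quickly and easily plot the values for, say, Year = 2015, Plot = 3 and Sample ID = C.

I have tried code like the following, but it becomes unwieldy once I start using it inside functions. Is there a more elegant way?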

# a column name containing a space must be wrapped in backticks
plot( x[Year=="2015" & Plot=="3" & `Sample ID`=="C"],
      y[Year=="2015" & Plot=="3" & `Sample ID`=="C"]
)

Answer 1

Well, let's say your data looks like this:

base_data = expand.grid(Year = 2013:2015,
                        Plot = 1:3,
                        SampleID = LETTERS[1:3],
                        ObsID = 1:4)
n = nrow(base_data)
base_data$x = runif(n)
base_data$y = rnorm(n)
head(base_data)
#   Year Plot SampleID ObsID         x           y
# 1 2013    1        A     1 0.5504904  0.64624816
# 2 2014    1        A     1 0.5337804  0.08473398
# 3 2015    1        A     1 0.9584508  0.31683347
# 4 2013    2        A     1 0.0854122  0.61898020
# 5 2014    2        A     1 0.8061409 -0.46255868
# 6 2015    2        A     1 0.8764612  0.24384120

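Next time you ask a question, you should include code that creates the data, or share your own data with dput(). That way nobody has to guess what the data looks like, what classes the columns have, and so on.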
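As a rough illustration of the dput() suggestion, the sketch below prints a tiny data frame as R code and checks that evaluating that code gives the same object back. The data frame `small` and its values are made up for the example; they are not the asker's data.

```r
# Hypothetical small data frame standing in for real data
small <- data.frame(Year = c(2014L, 2015L),
                    Plot = c(1L, 3L),
                    SampleID = c("A", "C"),
                    x = c(0.25, 0.75),
                    y = c(-1.5, 0.5))

# dput() prints R code that recreates the object;
# this printed text is what you would paste into a question
dput(small)

# Round trip: capture the printed code and evaluate it again
recreated <- eval(parse(text = capture.output(dput(small))))
same <- isTRUE(all.equal(small, recreated))
```

For a large data set, sharing `dput(head(your_data, 20))` (with `your_data` standing for your own object) is usually enough to show the structure.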

Anyway, to plot y versus x for each combination of Year, Plot, and SampleID, I like to use dplyr:

library(dplyr)
# first we group our data
group_by(base_data, Year, Plot, SampleID) %>%
    # then we "do" a function that saves a plot
    # within this function, `.` is an abbreviation for
    # the piece of the data frame that is one group
    do({
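       # create a filename. Within a group, there is only
       #     one value each for year, plot, and sample ID
       #     so I'll just use the first value of each in the filename.
       #     Extracting the column with $ keeps factor labels ("A"),
       #     whereas pasting a one-cell data frame can give integer codes
       group_name = paste0("y", .$Year[1],
                           "-p", .$Plot[1],
                           "-s", .$SampleID[1])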
       # initialize the file
       png(filename = paste0(group_name, ".png"))
       # draw the plot
       plot(.[["x"]], .[["y"]])
       # close the file
       dev.off()
       # dplyr will be happy if we return a data frame
       return(.)
    })

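This should create a plot for each combination of Year, Plot, and SampleID and save it in your working directory. I commented the code very thoroughly, but I'll repeat it below without comments to emphasize that it is actually a very short set of commands: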

group_by(base_data, Year, Plot, SampleID) %>%
    do({
       group_name = paste0("y", .$Year[1],
                           "-p", .$Plot[1],
                           "-s", .$SampleID[1])
       png(filename = paste0(group_name, ".png"))
       plot(.[["x"]], .[["y"]])
       dev.off()
       return(.)
    })
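If you only want one combination at a time, as in the question (Year 2015, Plot 3, Sample ID C), one possible approach in base R is to compute the row filter once and plot from the resulting subset, so the condition is not repeated for x and for y. This is only a sketch: the helper name `plot_group`, its argument names, and the data frame `dat` are hypothetical, and the column names are assumed to match the example data above.

```r
# Hypothetical example data with the same identifier columns as above
set.seed(1)
dat <- expand.grid(Year = 2013:2015,
                   Plot = 1:3,
                   SampleID = LETTERS[1:3],
                   ObsID = 1:4)
dat$x <- runif(nrow(dat))
dat$y <- rnorm(nrow(dat))

# Hypothetical helper: build the row filter once, then plot y against x
plot_group <- function(data, year, plot_id, sample_id) {
  keep <- data$Year == year &
    data$Plot == plot_id &
    data$SampleID == sample_id
  sub <- data[keep, ]
  plot(sub$x, sub$y,
       xlab = "x", ylab = "y",
       main = paste("Year", year, "Plot", plot_id, "Sample", sample_id))
  # return the subset invisibly so it can be inspected or reused
  invisible(sub)
}

one_group <- plot_group(dat, 2015, 3, "C")
```

Because the filter lives in one place, the helper is easy to call from other functions or from a loop over the combinations you care about.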


r

multiple-columns