Creating multiple datasets with an R function

Question

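I created the following function, which takes three numeric arguments: the longitude size (in degrees), the latitude size (in degrees), and a year. The function creates squares (grids) whose size is given by the first two arguments, then assigns the observations in the dataset to those grids, separated by year (the third argument). The function works as expected.

To build the 2x2 assemblage for 2009 (the grid containing all the observations), I call: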

assemblage_2009 <- CreateAssembleage(2, 2, 2009)

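However, I would like to create the assemblages iteratively for 2009 through 2018.

I tried a for loop with i in 2009:2018, but it did not work. I also tried lapply, again without success.

Do any more experienced R users have ideas?

The function: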

library(dplyr) # provides %>%, mutate, relocate, group_by, select, summarise, across

CreateAssembleage <- function(size_long, size_lat, year){
  
  # create a dataset to hold only values with the chosen year
  data_grid_year <- dplyr::filter(data_grid, Year == year)

  
  # Create vectors to hold the columns (easier to work with)
  Longitude <- data_grid_year$Longitude 
  Latitude <- data_grid_year$Latitude
  
  dx <- size_long # set up the dimensions (easier to change here than inside the code)
  dy <- size_lat 
  
  # construct the grids
  gridx <- seq(min(Longitude), max(Longitude), by = dx) # the values we discussed for the big square
  gridy <- seq(min(Latitude), max(Latitude), by = dy)
  
  # take the data and create 3 new columns (x, y, cell) by finding the specified data inside the constructed grids
  grid_year <- data_grid_year %>% 
    mutate(
      x = findInterval(Longitude, gridx),
      y = findInterval(Latitude, gridy),
      cell = paste(x, y, sep = ",")) %>% 
    relocate(Sample_Id, Latitude, Longitude, x, y, cell) # bring forward the new columns

  ### Create the assemblage  
  data_temp <- grid_year %>% 
    group_by(cell) %>% # group observations by grid cell
    select(-c(Sample_Id, Latitude, Longitude, Midpoint_Date_Local,
              Year, Month, Chlorophyll_Index, x, y)) %>%  # remove unneeded columns
    summarise(across(everything(), sum)) # calculate the sum
  
  return(data_temp) #return the result
}
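
As a small illustration of the binning step only (a sketch with made-up coordinates, not the real data_grid), findInterval returns, for each value, the index of the grid interval it falls into:

```r
# Made-up longitudes, for illustration only (not taken from data_grid)
Longitude <- c(-10.0, -9.5, -7.2, -5.9, -4.0)
dx <- 2

# Breakpoints of the grid: -10, -8, -6, -4
gridx <- seq(min(Longitude), max(Longitude), by = dx)

# Index of the interval each longitude falls into
x <- findInterval(Longitude, gridx)
print(x)  # 1 1 2 3 4
```

Note that a value equal to the last breakpoint (here -4) gets an index of its own, because findInterval treats intervals as closed on the left by default.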

Thank you all for your ideas.

Answer 1

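I cannot check whether your function works, because I do not have any of your data. That said, there are several ways to call a function n times and save the output. Since you did not say what exactly went wrong, I have to assume you are having trouble running the function in a loop and saving the output. I also have to assume, first, that your function works and, second, that size_long and size_lat are always set to 2. If you have something different in mind, you will need to be clearer about what you want.

Some options: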

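Use lapply to create a list holding the outputs. Because year is the third argument of CreateAssembleage, either pass size_long and size_lat by name so that each element of years fills the remaining year argument (as below), or redefine the function with year as the first argument and size_long = 2, size_lat = 2 as defaults.

years <- 2009:2018
results <- lapply(years, CreateAssembleage, size_long = 2, size_lat = 2)
names(results) <- paste0("assemblage_", years) # optional: name each list element by year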

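Use a for loop to create a list holding the outputs:

results <- list()
for(i in 2009:2018){
  # store each year's assemblage under a name such as "assemblage_2009"
  results[[paste0("assemblage_", i)]] <- CreateAssembleage(size_long = 2, size_lat = 2, year = i)
}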

If you need to, create multiple variables, one per year:

for(i in 2009:2018){
  do.call("<-", list(paste0("assemblage_", i),
                     CreateAssembleage(size_long = 2, size_lat = 2, year = i)))
}

Same as the previous option, but using assign:

for(i in 2009:2018){
  assign(paste0("assemblage_", i), CreateAssembleage(size_long = 2, size_lat = 2, year = i))
}

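Note that if you want to change not only year on each call but other arguments as well, for example a different size_lat on every iteration, you have to use mapply instead of lapply. With a loop, you would also need to create vectors (or a data frame) holding the other arguments and adjust the loop accordingly.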
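
A minimal sketch of the mapply variant. Since I do not have data_grid, make_assemblage is a hypothetical stand-in with the same three arguments as CreateAssembleage, and the parameter values are made up:

```r
# Hypothetical stand-in for CreateAssembleage so the sketch runs without data_grid
make_assemblage <- function(size_long, size_lat, year) {
  data.frame(size_long = size_long, size_lat = size_lat, year = year)
}

# One row per call; the values are invented for illustration
params <- data.frame(
  size_long = c(2, 2, 4),
  size_lat  = c(2, 3, 4),
  year      = c(2009, 2010, 2011)
)

# mapply walks the three vectors in parallel, one call per position
results <- mapply(
  make_assemblage,
  size_long = params$size_long,
  size_lat  = params$size_lat,
  year      = params$year,
  SIMPLIFY  = FALSE  # keep a list of data frames instead of simplifying
)
names(results) <- paste0("assemblage_", params$year)
```

With the real function you would replace make_assemblage with CreateAssembleage; results[["assemblage_2010"]] then holds the output for the second row of params.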

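Edit: following MrFlick's suggestion, I changed the order of the options and added the assign option. Loops are easier for most beginners to understand, but they can be annoyingly slow for large datasets, so it is better to get used to lapply.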
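
To make the loop and lapply versions easier to compare, here is a self-contained sketch. make_assemblage is again a hypothetical stand-in for CreateAssembleage (I do not have the data), so only the calling pattern matters here:

```r
# Hypothetical stand-in for CreateAssembleage; returns a tiny data frame
make_assemblage <- function(size_long, size_lat, year) {
  data.frame(year = year, cell_area = size_long * size_lat)
}

years <- 2009:2018

# Loop version: fill a named list one year at a time
loop_results <- list()
for (i in years) {
  loop_results[[paste0("assemblage_", i)]] <- make_assemblage(2, 2, i)
}

# lapply version: fixed arguments are passed by name, each year fills `year`
lapply_results <- lapply(years, make_assemblage, size_long = 2, size_lat = 2)
names(lapply_results) <- paste0("assemblage_", years)

# Both approaches should hold the same ten data frames
same <- isTRUE(all.equal(loop_results, lapply_results))
print(same)
```

Either list can then be indexed by name, for example lapply_results[["assemblage_2015"]].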
