Current R code takes a long time to extract climate data from WorldClim, how can I make it faster?

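My code currently takes far too long to extract climate variables from the [worldclim][1] dataset. I want to download the climate data from the link, find the maximum temperature over my species distribution polygons, and save the result as a CSV file in a directory.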

The code works, but it takes too long (e.g. 3-4 days on my PC). Can anyone suggest how to improve the performance of my code?

Here is my code:

# Download the climate dataset and unzip it. This part already works on my PC; please suggest improvements for the main code below.
download.file("http://biogeo.ucdavis.edu/data/climate/cmip5/30s/mi85tx50.zip", destfile = "E://ClimateDataOutputs//MIROC-ESM-CHEM_rcp85TX", mode="wb")
unzip("E://ClimateDataOutputs//MIROC-ESM-CHEM_rcp85TX")
# Code to improve
# load required packages
require(sp)
require(rgdal)
require(raster)
require(lsr)
require(maptools)
# For Bioclim - Need to project species polygons
projection <- CRS ("+proj=longlat +ellps=WGS84 +towgs84=0,0,0,0,0,0,0 +no_defs")
polygons <- readShapePoly("F:/9. Other projects/All projected maps/AllP.shp", proj4string = projection)
polygons$BINOMIAL <- as.character(polygons$BINOMIAL)
names=c(polygons$BINOMIAL)
stats_out<- data.frame(matrix(NA, ncol = 4, nrow = 579))
colnames(stats_out)<-c("BINOMIAL", "AAD", "mean", "obs")
stats_out[,1]<-names
# iterate over species polygons


for (i in 1:579) {
    poly<-polygons[i,]
    print(poly$BINOMIAL)
    data_out<-data.frame(matrix(NA, ncol = 1))
    colnames(data_out)<-c("MaxTemp2050rcp85_MIROC_ESM_CHEM")



    for (j in 1:12) {
        filename<-c(paste("E:/ClimateDataOutputs/mi85tx50",j,".tif", sep=""))
        ##print(filename)
        grid<-raster(filename)
        ##plot(grid)
        ##plot(poly, add=TRUE)
        data<-extract(grid, poly)
        data1<-as.data.frame(data)
        colnames(data1)<-c("MaxTemp2050rcp85_MIROC_ESM_CHEM")
        data_out= rbind(data_out,data1)
        }



    M<-mean(data_out$MaxTemp2050rcp85_MIROC_ESM_CHEM, na.rm=TRUE)
    AAD<-aad(data_out$MaxTemp2050rcp85_MIROC_ESM_CHEM, na.rm=TRUE)
    stats_out$AAD[i]<-AAD
    stats_out$mean[i]<-M
    stats_out$obs[i]<-nrow(data_out)
  }


print(stats_out)
write.csv(stats_out, "E://ClimateDataOutputs//MaxTemp2050rcp85_MIROC_ESM_CHEM_AAD.csv")

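Why don't you stack the rasters? Something like

grid <- stack(list.files("E:/ClimateDataOutputs", "mi85tx50", full.names=TRUE))
data <- extract(grid, poly)

might help, because all 12 monthly layers are then extracted in a single call per polygon.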

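Another point: the line

data_out= rbind(data_out,data1)

is very inefficient, because the object is copied and regrown on every iteration. It is always better to prepare the object first and then fill it inside the loop, along the lines of data_out[j,] <- data1.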
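As a rough sketch of the preallocation idea: since each month returns a different number of cell values, one option is to allocate a list with 12 slots, fill it in the loop, and combine once at the end. Note that `fake_extract` is a made-up stand-in for `extract(grid, poly)` so the snippet runs without any raster files; it is not part of the raster package.

```r
# Hypothetical stand-in for extract(grid, poly): returns a numeric vector of
# cell values for month j (assumption, for illustration only).
fake_extract <- function(j) seq_len(5) + j

n_months <- 12
monthly <- vector("list", n_months)   # allocate the container once

for (j in seq_len(n_months)) {
  monthly[[j]] <- fake_extract(j)     # fill slot j, no rbind() in the loop
}

all_values <- unlist(monthly)         # combine once, after the loop
mean_value <- mean(all_values, na.rm = TRUE)
n_obs <- length(all_values)
```

This also avoids the placeholder `NA` row that the original `data_out` starts with, which otherwise gets counted in `nrow(data_out)`.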

Finally, and this is a bit harder: find a way to turn your "j" loop into a function, then use parLapply to run the analysis over all polygons in parallel.
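A minimal sketch of what that could look like with the `parallel` package that ships with R. The worker `summarise_polygon` and its placeholder values are hypothetical; in the real script its body would hold the raster extraction and statistics for polygon `i`. The cluster size of 2 is also just an assumption for the example.

```r
library(parallel)

# Hypothetical per-polygon worker (assumption): the real version would read
# the rasters, call extract() for polygon i and compute the statistics.
summarise_polygon <- function(i) {
  values <- (1:12) * i                # placeholder for extracted cell values
  c(mean = mean(values), obs = length(values))
}

cl <- makeCluster(2)                  # start 2 worker processes
results <- parLapply(cl, 1:4, summarise_polygon)
stopCluster(cl)                       # always shut the workers down

# One row per polygon, one column per statistic
stats_out <- as.data.frame(do.call(rbind, results))
```

In the real case the workers start as fresh R sessions, so the packages and objects they need would probably have to be set up first, e.g. with `clusterEvalQ(cl, library(raster))` and `clusterExport(cl, "polygons")`.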

Also, for this kind of question it is always better to add system.time statements, so that we can learn more about where the bottleneck is.
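For example, wrapping a single step in `system.time` shows how long that step takes. Here `slow_step` is a made-up placeholder for one call such as `extract(grid, poly)`:

```r
# Hypothetical slow step (assumption), standing in for one extract() call
slow_step <- function() {
  Sys.sleep(0.2)
  "done"
}

# system.time() evaluates the expression and returns user, system and
# elapsed times in seconds
timing <- system.time(result <- slow_step())
print(timing["elapsed"])
```

Timing one polygon and one month this way, and then multiplying by 579 polygons and 12 months, gives a quick estimate of whether the extraction itself or the surrounding loop dominates the run time.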