Using xarray to open a multi-file dataset when both the files and dataset have a "time" component

Question

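I'm not sure how to phrase this question, but I hope this example explains it.

I have a series of netCDF files, one per day. Each file has its own time dimension, holding a 30-day forecast.

If I read in a year of data using: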

data=xarray.open_mfdataset(files, concat_dim='None', autoclose='True')

then I get:

Dimensions:   (None: 365, lat: 110, lon: 100, time: 395)

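I am only interested in the value at time = 0 from each file, i.e. for file 0 I want time = 0, for file 360 I want time = 360, and so on.

Basically, I think all I want to do is read the first element of the time component from each file, but I can't figure out how to do that with open_mfdataset.

Even dropping the unwanted values after reading everything in would be fine, but I can't work that out either because of the way open_mfdataset concatenates the datasets.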

Answer 1

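A preprocess function lets you do what you want. It is applied to each file's dataset before concatenation, so you can use it to reshape the datasets during the open_mfdataset step.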

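import xarray as xr

def preprocess(ds):
    '''keep only the first timestep for each file'''
    return ds.isel(time=0)


# preprocess runs on each file's dataset before they are concatenated along 'time'
data = xr.open_mfdataset(files, preprocess=preprocess, concat_dim='time', ...)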

Depending on how your files are formatted, you may need to clean up the dataset further inside preprocess.
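To check the idea without touching any files, here is a minimal sketch that builds a few synthetic per-day forecasts in memory, applies a preprocess function to each, and concatenates the results with xr.concat. This only approximates the preprocess-then-concatenate step described above. The helper make_forecast, the variable name t2m, and the grid sizes are made up for illustration. This variant selects time=[0] (a list) so that each piece keeps a length-1 time dimension, which is one possible form of the "further clean up" mentioned.

```python
import numpy as np
import pandas as pd
import xarray as xr


def make_forecast(start, n_lead=30):
    """Hypothetical stand-in for one daily file: a 30-step forecast."""
    time = pd.date_range(start, periods=n_lead, freq="D")
    values = np.random.default_rng(0).random((n_lead, 2, 3))
    return xr.Dataset(
        {"t2m": (("time", "lat", "lon"), values)},  # "t2m" is an assumed name
        coords={"time": time, "lat": [10.0, 20.0], "lon": [0.0, 1.0, 2.0]},
    )


def preprocess(ds):
    """Keep only the first timestep, as a length-1 time dimension."""
    return ds.isel(time=[0])


# One synthetic "file" per day; consecutive forecasts overlap in time.
starts = pd.date_range("2020-01-01", periods=5, freq="D")
per_file = [preprocess(make_forecast(start)) for start in starts]

# After preprocessing, each piece holds a single time step, so the
# concatenated result has one time value per file.
combined = xr.concat(per_file, dim="time")
print(combined.sizes)
```

With real files, the same preprocess function would be passed to open_mfdataset instead. Note that recent xarray releases only accept concat_dim together with combine='nested'.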

