Convert time series data from CSV to netCDF in Python

Question

The main problem in this process is the following line:

precip[:] = orig

which produces the error:

ValueError: cannot reshape array of size 5732784 into shape (39811,144,144)

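I have two CSV files. One contains all the actual data for a single variable (precipitation), with one column per station; the corresponding station coordinates are in a second, separate CSV file. My sample data is on Google Drive here.

In case you want to look at the data itself: the first CSV file has shape (39811, 144) and the second has shape (171, 10), but note that I only use a slice of the second dataframe, of shape (144, 2).

This is the code: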

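import pandas as pd
import numpy as np
import netCDF4

# File names as used in the answer below
stn_precip = 'Precip_1910-2018_stations.csv'
orig_precip = 'Precip_1910-2018_origvals.csv'

stations = pd.read_csv(stn_precip)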
stncoords = stations.iloc[:,[0,1]][:144]
orig = pd.read_csv(orig_precip, skiprows = 1, names = stations['Code'][:144])

lons = stncoords['X']
lats = stncoords['Y']

ncout = netCDF4.Dataset('Precip_1910-2018_homomod.nc', 'w')

ncout.createDimension('longitude',lons.shape[0])
ncout.createDimension('latitude',lats.shape[0])
ncout.createDimension('precip',orig.shape[1])
ncout.createDimension('time',orig.shape[0])

lons_out = lons.tolist()
lats_out = lats.tolist()
time_out = orig.index.tolist()

lats = ncout.createVariable('latitude',np.dtype('float32').char,('latitude',))
lons = ncout.createVariable('longitude',np.dtype('float32').char,('longitude',))
time = ncout.createVariable('time',np.dtype('float32').char,('time',))
precip = ncout.createVariable('precip',np.dtype('float32').char,('time', 'longitude','latitude'))

lats[:] = lats_out
lons[:] = lons_out
time[:] = time_out
precip[:] = orig  # this line raises the ValueError shown above
ncout.close()

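My code is largely based on this post: convert-csv-to-netcdf, but that post does not include the variable 'TIME' as a third dimension, so that is where I am failing. I thought I should expect the precipitation variable to have shape (39811, 144, 144), but the error says otherwise.

I'm not sure how to handle this; any input is welcome.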

Answer 1

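Since your data come from individual stations, I suggest using a single station dimension in the netCDF file rather than separate lon and lat dimensions. Each station has exactly one (longitude, latitude) pair, so the data form a (time, station) table and not a regular longitude-latitude grid. You can still store each station's longitude and latitude in separate variables.

Here is one possible solution, based on your code: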
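To see why the original assignment fails, it helps to compare element counts. The short sketch below uses only the shapes quoted in the question; the variable names are illustrative and not part of the original code.

```python
from math import prod

# Shape of the precipitation CSV: one row per time step, one column per station.
data_shape = (39811, 144)
# Shape implied by the dimensions ('time', 'longitude', 'latitude') in the question.
target_shape = (39811, 144, 144)

data_size = prod(data_shape)
target_size = prod(target_shape)

print(data_size)                 # 5732784, the size reported in the ValueError
print(target_size)               # 825520896
print(target_size // data_size)  # 144
```

The variable expects 144 times more values than the CSV provides, so no reshape can succeed. With the dimensions ('time', 'station'), the variable shape is (39811, 144) and matches the data exactly.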

#!/usr/bin/env ipython
import pandas as pd
import numpy as np
import netCDF4

stn_precip = 'Precip_1910-2018_stations.csv'
orig_precip = 'Precip_1910-2018_origvals.csv'

# Station metadata: the first two columns hold the coordinates (X, Y).
# Only the first 144 rows are used, matching the 144 data columns.
stations = pd.read_csv(stn_precip)
stncoords = stations.iloc[:, [0, 1]][:144]
# Precipitation values: one row per time step, one column per station.
orig = pd.read_csv(orig_precip, skiprows=1, names=stations['Code'][:144])

lons = stncoords['X']
lats = stncoords['Y']
nstations = np.size(lons)

ncout = netCDF4.Dataset('Precip_1910-2018_homomod.nc', 'w')

# A single 'station' dimension replaces the separate longitude/latitude dimensions.
ncout.createDimension('station', nstations)
ncout.createDimension('time', orig.shape[0])

lons_out = lons.tolist()
lats_out = lats.tolist()
time_out = orig.index.tolist()

# Coordinates are stored per station; precip has dimensions (time, station),
# which matches the (39811, 144) shape of the CSV data.
lats = ncout.createVariable('latitude', np.dtype('float32').char, ('station',))
lons = ncout.createVariable('longitude', np.dtype('float32').char, ('station',))
time = ncout.createVariable('time', np.dtype('float32').char, ('time',))
precip = ncout.createVariable('precip', np.dtype('float32').char, ('time', 'station'))

lats[:] = lats_out
lons[:] = lons_out
time[:] = time_out
precip[:] = orig.to_numpy()  # pass a plain NumPy array of shape (time, station)
ncout.close()

The header of the resulting output file can be inspected with ncdump -h Precip_1910-2018_homomod.nc.
