Efficiently apply a function over the months of interest in xarray (Python)
import pandas as pd
import numpy as np
import xarray as xr
# month-end timestamps; pandas >= 2.2 prefers the alias freq='ME'
time = pd.date_range('2010-01-01', '2018-12-31', freq='M')
lat = np.linspace(-5.175003, -4.7250023, 10)
lon = np.linspace(33.524994, 33.97499, 10)
precip = np.random.normal(0, 1, size=(len(time), len(lat), len(lon)))
ds = xr.Dataset(
    {'precip': (['time', 'lat', 'lon'], precip)},
    coords={
        'lon': lon,
        'lat': lat,
        'time': time,
    }
)
Out[]:
<xarray.Dataset>
Dimensions:  (lat: 10, lon: 10, time: 108)
Coordinates:
  * lon      (lon) float64 33.52 33.57 33.62 33.67 ... 33.82 33.87 33.92 33.97
  * lat      (lat) float64 -5.175 -5.125 -5.075 -5.025 ... -4.825 -4.775 -4.725
  * time     (time) datetime64[ns] 2010-01-31 2010-02-28 ... 2018-12-31
Data variables:
    precip   (time, lat, lon) float64 -0.7862 -0.28 1.236 ... 0.6622 -0.7682
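My current approach
I currently apply the function by:
- looping over all the months
- selecting all the timesteps in the original dataset for that month
- applying a function to those months (here, a normalised rank)
- recombining the list of monthly DataArrays into a Dataset with all timesteps
The function could be something other than a climatology, but here it is a normalised rank:
- get the rank of the variable's value compared with all other values for that month in the dataset
- rescale it to the range 0-100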
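To make the normalised rank concrete, here is a minimal NumPy sketch for one grid cell. The five values are made up for illustration, and the double-argsort ranking assumes there are no ties or NaNs.

```python
import numpy as np

# Five hypothetical values for one calendar month at one grid cell
# (no ties, no NaNs).
values = np.array([3.0, 1.0, 2.0, 5.0, 4.0])

# Double argsort gives 0-based ranks; add 1 to get the usual 1-based ranks.
ranks = values.argsort().argsort() + 1

# Rescale so the smallest value maps to 0 and the largest to 100.
rank_norm = (ranks - 1) / (values.size - 1.0) * 100.0
print(rank_norm)  # values: 50, 0, 25, 100, 75
```

The smallest value (1.0) gets 0 and the largest (5.0) gets 100, with the rest evenly spaced in between.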
variable = 'precip'
rank_norm_list = []

# loop through all the months
for mth in range(1, 13):
    # select that month
    ds_mth = (
        ds
        .where(ds['time.month'] == mth)
        .dropna(dim='time', how='all')
    )
    # apply the function to that month (here a normalised rank (0-100))
    rank_norm_mth = (
        (ds_mth.rank(dim='time') - 1) / (ds_mth.time.size - 1.0) * 100.0
    )
    rank_norm_mth = rank_norm_mth.rename({variable: 'rank_norm'})
    rank_norm_list.append(rank_norm_mth)

# after the loop re-combine the DataArrays
rank_norm = xr.merge(rank_norm_list).sortby('time')
Out[]:
<xarray.Dataset>
Dimensions:    (lat: 10, lon: 10, time: 108)
Coordinates:
  * time       (time) datetime64[ns] 2010-01-31 2010-02-28 ... 2018-12-31
  * lat        (lat) float64 -5.175 -5.125 -5.075 ... -4.825 -4.775 -4.725
  * lon        (lon) float64 33.52 33.57 33.62 33.67 ... 33.82 33.87 33.92 33.97
Data variables:
    rank_norm  (time, lat, lon) float64 75.0 75.0 12.5 100.0 ... 87.5 0.0 25.0
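As a side note on the selection step in the loop: masking with where and then dropping the all-NaN timesteps can probably be replaced by a direct boolean selection. The sketch below compares the two on a small stand-in dataset. The 2x2 grid, the seed, and freq='MS' are arbitrary choices for the illustration, not part of the original data.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Small stand-in dataset: 3 years of monthly values on a 2x2 grid.
time = pd.date_range('2010-01-01', periods=36, freq='MS')
data = np.random.default_rng(0).normal(size=(36, 2, 2))
ds = xr.Dataset(
    {'precip': (['time', 'lat', 'lon'], data)},
    coords={'time': time, 'lat': [0.0, 1.0], 'lon': [10.0, 11.0]},
)

mth = 1
# Mask-then-drop: other months become NaN and are then removed.
via_where = ds.where(ds['time.month'] == mth).dropna(dim='time', how='all')
# Direct boolean selection: keeps only that month, no NaNs created.
via_sel = ds.sel(time=ds['time.month'] == mth)

print(via_where.sizes['time'], via_sel.sizes['time'])
```

The two agree as long as the data has no timestep that is entirely NaN. If one existed, the where/dropna version would silently drop it, while the boolean selection would keep it.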
Is there a cleverer / more efficient method that doesn't involve looping and selecting?
Thanks for the nice example. There is indeed a simpler way, using groupby and apply:
def rank_norm(ds, dim):
    return (ds.rank(dim=dim) - 1) / (ds.sizes[dim] - 1.0) * 100.0

result = ds.groupby('time.month').apply(rank_norm, args=('time',))
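Two caveats may matter when running this. Recent xarray versions have renamed the groupby method apply to map (apply is kept as a deprecated alias), and rank relies on the optional bottleneck package. The sketch below is a self-contained variant under those assumptions: it uses map and swaps in a NumPy double-argsort rank, which is only valid without ties or NaNs. The helper rank_norm_da, the 2x2 grid, and freq='MS' are illustrative choices of mine, not part of the answer above.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Small stand-in dataset: 3 years of monthly values on a 2x2 grid.
time = pd.date_range('2010-01-01', periods=36, freq='MS')
data = np.random.default_rng(0).normal(size=(36, 2, 2))
ds = xr.Dataset(
    {'precip': (['time', 'lat', 'lon'], data)},
    coords={'time': time, 'lat': [0.0, 1.0], 'lon': [10.0, 11.0]},
)

def rank_norm_da(da, dim):
    """Normalised rank (0-100) along `dim`; assumes no ties or NaNs."""
    axis = da.get_axis_num(dim)
    # argsort of argsort yields the 0-based rank of each value along `axis`
    ranks = da.values.argsort(axis=axis).argsort(axis=axis)
    return da.copy(data=ranks / (da.sizes[dim] - 1.0) * 100.0)

def rank_norm(group, dim):
    # apply the DataArray-level function to every data variable in the group
    return group.map(rank_norm_da, dim=dim)

# One call per calendar month; xarray stitches the groups back together.
result = ds.groupby('time.month').map(rank_norm, args=('time',))
print(result['precip'].shape)
```

Unlike the loop version, the data variable keeps its original name here (precip), so add a rename if rank_norm is wanted.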