Best practice for multidimensional arrays in Python

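Suppose I have a collection of objects that I want to store in Python, say, lists of numbers: [0.12, 0.85, 0.11, 0.12], [0.23, 0.52, 0.10, 0.19], and so on. Suppose further that these objects are indexed by three attributes, e.g. "origin", "destination", and "month". I would like to store them in an array-like object that can be sliced easily, preferably by either numeric index or name.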

For example,

obj[2,1,7] # might return: [0.23, 0.52, 0.10, 0.19]

or,

obj['chicago','new york','jan'] # might return: [0.12, 0.85, 0.11, 0.12]

and additionally,

obj[:,'new york','jan'] # would return data with first index = any.

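I am looking for the best practice for achieving this in Python. I did find a post that seems quite suitable, but the approach seemed to require some overhead and there was little discussion of alternatives. I also found a package called xarray, although it does not seem to be very popular. I am transitioning from R, where I would do this with the array() function, which adds multidimensional indexing to any vector-like structure.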

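After some digging, xarray seems to fit my needs. Unfortunately, given my lack of experience, I cannot speak to its performance or its compatibility with other packages.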

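import numpy as np
import xarray as xr

# Coordinate labels for the three named dimensions
cityOrig = ['chicago','new york', 'boston']
cityDest = ['chicago','new york', 'boston']
month = ['jan','feb','mar','apr']

# 4 values for each (orig, dest, month) combination
data = np.random.rand(4,3,3,4)

myArray = xr.DataArray(data,
                       dims=['dat','orig','dest','month'],
                       coords = {'orig':cityOrig,'dest':cityDest,'month':month})

# Positional indexing: orig=0 ('chicago'), dest=1 ('new york'), month=0 ('jan')
print(myArray[:,0,1,0].data)
# [0.64  0.605 0.445 0.059]   (example output; values vary because the data is random)

# Label-based indexing selects the same elements
print(myArray.loc[:,'chicago','new york','jan'].data)
# [0.64  0.605 0.445 0.059]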
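
For the "first index = any" case from the question, a minimal sketch along the same lines is below. It assumes the same dimension names as above ('dat', 'orig', 'dest', 'month') and uses deterministic data from np.arange rather than random values, purely so the result is easy to check. The variable names (arr, by_label, by_position) are made up for illustration. DataArray.sel selects by label and DataArray.isel by integer position, both keyed by dimension name, so a dimension that is not mentioned is kept whole.

```python
import numpy as np
import xarray as xr

cities = ['chicago', 'new york', 'boston']
months = ['jan', 'feb', 'mar', 'apr']

# Deterministic data so the result is easy to verify; shape is (dat, orig, dest, month).
data = np.arange(4 * 3 * 3 * 4, dtype=float).reshape(4, 3, 3, 4)

arr = xr.DataArray(
    data,
    dims=['dat', 'orig', 'dest', 'month'],
    coords={'orig': cities, 'dest': cities, 'month': months},
)

# Label-based selection by dimension name; 'orig' is not mentioned, so every origin is kept.
by_label = arr.sel(dest='new york', month='jan')

# Position-based equivalent: 'new york' is index 1 along 'dest', 'jan' is index 0 along 'month'.
by_position = arr.isel(dest=1, month=0)

print(by_label.dims)   # ('dat', 'orig')
print(by_label.shape)  # (4, 3)
print(np.array_equal(by_label.values, by_position.values))  # True
```

Selecting by dimension name means the order of the axes does not have to be remembered, and .values returns a plain NumPy array for code that expects one.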