Sparse DataArray Xarray search

Question

What is the best way to find all cells whose value is != 0 in an xarray DataArray object?

For example, in pandas I would do:

df.loc[df.col1 > 0]

I'm trying to work through a concrete example with 3-dimensional brain imaging data.

first_image_xarray.shape
(140, 140, 96)
dims = ['x','y','z']

Looking at the documentation for xarray.DataArray.where, it seems I want something like this:

first_image_xarray.where(first_image_xarray.y + first_image_xarray.x  > 0,drop = True)[:,0,0]

But I still get an array with zeros.

<xarray.DataArray (x: 140)>
array([ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
        0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
        0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
        0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
        0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0., -0.,  0., -0.,
        0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
        0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
        0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
        0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
        0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.])
Dimensions without coordinates: x

Also, a side question: why are there some negative zeros? Are these values rounded, so that -0. is actually something like -0.009876?

Answer 1

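(Answer to the main question)

You were almost there. However, a subtle difference in syntax makes a big difference here. On the one hand, here is a solution that filters for values > 0 using a "value-based" mask.

# if you want to DROP values which do not satisfy the mask condition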
first_image_xarray[:,0,0].where(first_image_xarray[:,0,0] > 0, drop=True)

or

# if you want to KEEP values which do not satisfy the mask condition, as nan
first_image_xarray[:,0,0].where(first_image_xarray[:,0,0] > 0, np.nan)

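On the other hand, your attempt did not work as you expected because first_image_xarray.x refers to the array of indices of the elements (along the x direction), not to the values of the elements. As a result, only the first element of the output should be nan instead of 0, simply because it is the only element in the slice [:,0,0] that fails the mask condition. In effect, you were creating an "index-based" mask.

The following small experiment (hopefully) clarifies this key difference.

Suppose we have a DataArray that contains only 0s and 1s (with dimensions matching those in the original post (OP) of the question). First, let's mask it based on the index, as the OP did: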

import numpy as np
import xarray as xr

np.random.seed(0)
# create a DataArray which randomly contains 0 or 1 values
a = xr.DataArray(np.random.randint(0, 2, 140*140*96).reshape((140, 140, 96)), dims=('x', 'y', 'z'))


# with this "index-based" mask, only elements where index of both x and y are 0 are replaced by nan
a.where(a.x + a.y > 0, drop=True)[:,0,0]

Out:
<xarray.DataArray (x: 140)>
array([ nan,   0.,   1.,   1.,   0.,   0.,   0.,   1.,   0.,   0.,   0.,   0.,
         0.,   1.,   0.,   1.,   0.,   1.,   0.,   0.,   0.,   1.,   0.,   0.,
         1.,   1.,   0.,   1.,   1.,   1.,   1.,   1.,   1.,   1.,   1.,   1.,
         1.,   1.,   0.,   1.,   1.,   1.,   1.,   1.,   1.,   1.,   0.,   1.,
         1.,   0.,   0.,   0.,   1.,   1.,   1.,   0.,   0.,   1.,   0.,   0.,
         1.,   0.,   1.,   1.,   0.,   0.,   1.,   0.,   0.,   1.,   1.,   1.,
         0.,   0.,   0.,   1.,   1.,   0.,   1.,   0.,   1.,   1.,   0.,   0.,
         0.,   0.,   1.,   1.,   0.,   1.,   1.,   1.,   1.,   0.,   1.,   0.,
         0.,   0.,   0.,   0.,   0.,   0.,   1.,   0.,   1.,   1.,   0.,   0.,
         0.,   0.,   1.,   0.,   1.,   0.,   0.,   0.,   0.,   1.,   0.,   1.,
         0.,   0.,   1.,   0.,   0.,   0.,   0.,   0.,   1.,   1.,   0.,   0.,
         0.,   1.,   0.,   0.,   1.,   0.,   0.,   1.])
Dimensions without coordinates: x

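With the mask above, only the elements whose x and y indices are both 0 became nan; the rest were neither changed nor dropped.

In contrast, the proposed solution masks the DataArray based on the values of its elements.

# with this "value-based" mask, all the values which do not satisfy the mask condition are dropped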
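To see why the index-based mask behaves this way, it can help to look at the mask itself. The following is only a minimal sketch on a small hypothetical array (the 3 x 3 x 2 shape and the names x_positions and index_mask are made up for illustration); it assumes the dimensions have no coordinates, as in the question.

```python
import numpy as np
import xarray as xr

# Small hypothetical array with the same dimension names as in the question.
a = xr.DataArray(np.zeros((3, 3, 2)), dims=("x", "y", "z"))

# For a dimension without coordinates, a["x"] holds the positions 0, 1, 2,
# not the data values.
x_positions = a["x"].values

# The comparison broadcasts over x and y only, giving a 2-D boolean mask
# that is False solely at position (x=0, y=0).
index_mask = a["x"] + a["y"] > 0
print(x_positions)
print(index_mask.values)
```

Because the mask never looks at the data, it cannot remove zeros; it only blanks out the single position where both indices are 0.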
a[:,0,0].where(a[:,0,0] > 0, drop=True)

Out:
<xarray.DataArray (x: 65)>
array([ 1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,
        1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,
        1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,
        1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,
        1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.])
Dimensions without coordinates: x

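This successfully dropped all values that do not satisfy the mask condition, based on the values of the DataArray's elements.

(Answer to the side question)

As for the origin of -0 and 0 in the DataArray, it is possible that values were rounded toward 0 from the negative or positive side. A related discussion is here: How to eliminate the extra minus sign when rounding negative numbers towards zero in numpy? Below is a small example of this situation.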
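The same value-based idea extends to the whole 3-D volume when the condition is != 0, as in the original question. Here is a minimal sketch under stated assumptions: the array image and its three nonzero voxels are hypothetical stand-ins for the brain image, and np.argwhere is used here as one possible way to list positions, not necessarily the best one for your data.

```python
import numpy as np
import xarray as xr

# Hypothetical stand-in for the brain image: mostly zeros, three nonzero voxels.
data = np.zeros((4, 4, 3))
data[1, 2, 0] = 5.0
data[3, 0, 2] = -2.0
data[0, 0, 1] = 0.5
image = xr.DataArray(data, dims=("x", "y", "z"))

# Value-based mask over the whole volume: zeros become nan, shape is unchanged.
masked = image.where(image != 0)

# Integer positions of the nonzero voxels, one (x, y, z) row per voxel.
nonzero_idx = np.argwhere(image.values != 0)

# The nonzero values themselves, in the same row-major order as nonzero_idx.
nonzero_vals = image.values[image.values != 0]
print(nonzero_idx)
print(nonzero_vals)
```

Dropping with drop=True is less useful in 3-D, because a slice can only be removed when every element in it fails the condition; listing positions and values avoids that limitation.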

import numpy as np
import xarray as xr

xr_array = xr.DataArray([-0.1, 0.1])

# you can use either xr.DataArray.round() or np.round() for rounding values of DataArray

xr.DataArray.round(xr_array)

Out:
<xarray.DataArray (dim_0: 2)>
array([-0.,  0.])
Dimensions without coordinates: dim_0

np.round(xr_array)

Out:
<xarray.DataArray (dim_0: 2)>
array([-0.,  0.])
Dimensions without coordinates: dim_0

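As a side note, another way to see -0 in a NumPy array is numpy.set_printoptions(precision=0), which hides the digits after the decimal point, as shown below (though I know that is not the case here, since you are using a DataArray):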

import numpy as np

# default value is precision=8 in ver1.15
np.set_printoptions(precision=0)

np.array([-0.1, 0.1])

Out:
array([-0.,  0.])

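In any case, my best guess is that the conversion to -0 was done manually and intentionally during data preparation and preprocessing, rather than happening automatically.

Hope this helps.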
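If you want to check which case applies to your own data, one option is to test for an exact zero and inspect the sign bit. This is a sketch on hypothetical values (the array values and the names is_zero, is_negative_zero and normalized are made up for illustration), not a statement about what your image actually contains.

```python
import numpy as np

# Hypothetical values: a true negative zero, a small negative number, a zero.
values = np.array([-0.0, -0.009876, 0.0])

# -0.0 compares equal to 0, so an exact comparison separates real zeros
# from small numbers that are merely displayed as -0.
is_zero = values == 0

# signbit reports the sign even for zero, which identifies -0.0.
is_negative_zero = is_zero & np.signbit(values)

# Adding 0.0 turns -0.0 into +0.0 and leaves the other values unchanged.
normalized = values + 0.0
print(is_zero, is_negative_zero, normalized)
```

If is_zero is False for an element printed as -0., the value is a small negative number hidden by the display precision rather than a true negative zero.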

python

pandas

python-xarray