Why does `numpy.ndarray.view` ignore a previous call to `numpy.ndarray.newbyteorder`?

Question

I have a NumPy array with a single element of dtype uint32:

>>> import numpy as np
>>> a = np.array([123456789], dtype=np.uint32)
>>> a.dtype.byteorder
'='

Then I can choose to interpret the data as little-endian:

>>> a.newbyteorder("<").dtype.byteorder
'<'
>>> a.newbyteorder("<")
array([123456789], dtype=uint32)

or as big-endian:

>>> a.newbyteorder(">").dtype.byteorder
'>'
>>> a.newbyteorder(">")
array([365779719], dtype=uint32)

The latter returns a different number, 365779719, because my platform is little-endian, so the value was written to memory in little-endian order.

Now, what surprises me is that a subsequent call to view appears to be unaffected by this interpretation:

>>> a.newbyteorder("<").view(np.uint8)
array([ 21, 205,  91,   7], dtype=uint8)
>>> a.newbyteorder(">").view(np.uint8)
array([ 21, 205,  91,   7], dtype=uint8)

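I would have expected the bytes to come out in reverse order in the big-endian case. Why doesn't that happen? Doesn't view look at the data "through" the newbyteorder method?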

By the way: if I use byteswap instead of newbyteorder, and thereby copy and change the bytes in memory, I obviously get the desired result:

>>> a.byteswap("<").view(np.uint8)
array([ 21, 205,  91,   7], dtype=uint8)
>>> a.byteswap(">").view(np.uint8)
array([  7,  91, 205,  21], dtype=uint8)

However, I don't want to copy the data.

Answer 1

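The new byte order applied by newbyteorder is purely a property of the array's dtype; a.newbyteorder("<") returns a view of a with a little-endian dtype. It does not change the contents of memory, and it does not affect the array's shape or strides.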

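ndarray.view does not care about the original array's dtype, little-endian or big. It cares about the array's shape, strides, and actual memory contents, none of which have changed.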
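To make this concrete, here is a minimal sketch. It assumes an explicitly little-endian array (dtype "<u4") so that the output does not depend on the platform, and it uses a.view(a.dtype.newbyteorder(">")) in place of a.newbyteorder(">"), since as far as I know the ndarray method was removed in NumPy 2.0 while the dtype method remains.

```python
import numpy as np

# Explicit little-endian dtype so the result does not depend on the platform.
a = np.array([123456789], dtype="<u4")

# Stand-in for a.newbyteorder(">"): a view whose dtype is flagged big-endian.
big = a.view(a.dtype.newbyteorder(">"))

print(big.dtype.str)                  # '>u4'
print(np.shares_memory(a, big))       # True: no bytes were copied or moved
print(a.tobytes() == big.tobytes())   # True: identical buffer contents
print(a.strides == big.strides)       # True: same shape and strides

# view(np.uint8) reinterprets the buffer itself, so both give the same bytes.
bytes_from_a = a.view(np.uint8)
bytes_from_big = big.view(np.uint8)
print(bytes_from_a.tolist())          # [21, 205, 91, 7]
print(bytes_from_big.tolist())        # [21, 205, 91, 7]
```

Only the element value read through big differs (365779719 instead of 123456789); the buffer underneath is the same.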

Answer 2

Just to add, from the documentation:

As you can imagine from the introduction, there are two ways you can affect the relationship between the byte ordering of the array and the underlying memory it is looking at:

Change the byte-ordering information in the array dtype so that it interprets the underlying data as being in a different byte order. This is the role of arr.newbyteorder()

Change the byte-ordering of the underlying data, leaving the dtype interpretation as it was. This is what arr.byteswap() does.

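The emphasis in the quote above is mine.

Other thoughts collected from the comments:

Since newbyteorder() is similar to view() in that it only changes the interpretation of the underlying data without changing the data itself, a view of a view is still a view of the same (original) data. So yes, you cannot "chain" views (well, you can... but the result is always a view of the same original data).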

How do I get the uint8 chunks in big-endian order without changing the memory, then?

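Try np.sum(a.newbyteorder('<')) (or, alternatively, a.newbyteorder('<').tolist()) and change the sign/endianness. So my answer to the question above is that you cannot do it: either the memory is changed "in place" with byteswap(), or the data is copied to a new memory location when the elements are accessed through the view.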
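The two options from the documentation quote can be put side by side in a short sketch. It again assumes an explicitly little-endian starting array, and uses a.view(">u4") as the dtype-only change.

```python
import numpy as np

a = np.array([123456789], dtype="<u4")

# Option 1: change only the dtype. Reading elements applies the new byte
# order to the unchanged bytes, so the values differ but the buffer does not.
as_big = a.view(">u4")
print(a.tolist())        # [123456789]
print(as_big.tolist())   # [365779719]

# Option 2: byteswap() (default inplace=False) returns a copy whose bytes
# are physically reversed within each element; the dtype stays "<u4".
swapped = a.byteswap()
print(np.shares_memory(a, swapped))      # False
print(swapped.view(np.uint8).tolist())   # [7, 91, 205, 21]

# Doing both (swap the bytes and flip the dtype) gives the original values.
print(swapped.view(">u4").tolist())      # [123456789]
```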

Answer 3

In [280]: a = np.array([123456789, 234567891, 345678912], dtype=np.uint32)

In [282]: a.tobytes()
Out[282]: b'\x15\xcd[\x07\xd38\xfb\r@\xa4\x9a\x14'

In [284]: a.view('uint8')
Out[284]: 
array([ 21, 205,  91,   7, 211,  56, 251,  13,  64, 164, 154,  20],
      dtype=uint8)

This is the same as a.view('<u1') and a.view('>u1'), because endianness is irrelevant for single bytes.
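A quick way to check this, as a small sketch: NumPy treats one-byte dtypes as having no byte order at all, so the "<" and ">" prefixes collapse to the same dtype.

```python
import numpy as np

little = np.dtype("<u1")
big = np.dtype(">u1")

print(little == big)     # True: the prefix makes no difference
print(big.byteorder)     # '|' means "not applicable"
print(big.itemsize)      # 1
```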

In [291]: a.view('<u4')
Out[291]: array([123456789, 234567891, 345678912], dtype=uint32)
In [292]: a.view('>u4')
Out[292]: array([ 365779719, 3543726861, 1084529172], dtype=uint32)

A view depends entirely on the data, not on the current (last) view:

In [293]: a.view('<u4').view('u1')
Out[293]: 
array([ 21, 205,  91,   7, 211,  56, 251,  13,  64, 164, 154,  20],
      dtype=uint8)
In [294]: a.view('>u4').view('u1')
Out[294]: 
array([ 21, 205,  91,   7, 211,  56, 251,  13,  64, 164, 154,  20],
      dtype=uint8)

An idea using reshaping and reversing:

In [295]: a.view('u1').reshape(-1,4)
Out[295]: 
array([[ 21, 205,  91,   7],
       [211,  56, 251,  13],
       [ 64, 164, 154,  20]], dtype=uint8)
In [296]: a.view('u1').reshape(-1,4)[:,::-1]
Out[296]: 
array([[  7,  91, 205,  21],
       [ 13, 251,  56, 211],
       [ 20, 154, 164,  64]], dtype=uint8)

But I can't view this array as u4, because it isn't contiguous:

In [297]: a.view('u1').reshape(-1,4)[:,::-1].view('<u4')
....
ValueError: To change to a dtype of a different size, the array must be C-contiguous

Looking more closely at the properties of this reversed array:

In [298]: a1 = a.view('u1').reshape(-1,4)[:,::-1]
In [299]: a1.flags
Out[299]: 
  C_CONTIGUOUS : False
  F_CONTIGUOUS : False
  ....
In [300]: a1.strides             # reversing is done with strides
Out[300]: (4, -1)

The two arrays share the same data buffer; a1 just starts at a different byte:

In [301]: a.__array_interface__['data']
Out[301]: (32659520, False)
In [302]: a1.__array_interface__['data']
Out[302]: (32659523, False)

I can't reshape a1 in place:

In [304]: a1.shape = (12,)
...
AttributeError: incompatible shape for a non-contiguous array

If I call reshape, I get a copy (as shown by the completely different data buffer address):

In [305]: a2 = a1.reshape(-1)
In [306]: a2
Out[306]: 
array([  7,  91, 205,  21,  13, 251,  56, 211,  20, 154, 164,  64],
      dtype=uint8)
In [307]: a2.view('<u4')
Out[307]: array([ 365779719, 3543726861, 1084529172], dtype=uint32)
In [308]: a2.__array_interface__['data']
Out[308]: (37940512, False)

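So you can view the same data buffer with a different endianness, but you can't view the individual bytes in a different order without either making a non-contiguous array or making a copy.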
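The trade-off can be condensed into one sketch. It assumes a little-endian source array and uses np.shares_memory rather than raw buffer addresses to tell a view from a copy.

```python
import numpy as np

a = np.array([123456789, 234567891, 345678912], dtype="<u4")

# One row per element, with the columns reversed through a negative stride.
# This is still a view of a's buffer: nothing is copied.
rev = a.view(np.uint8).reshape(-1, a.itemsize)[:, ::-1]
print(np.shares_memory(a, rev))    # True
print(rev.flags["C_CONTIGUOUS"])   # False
print(rev.strides)                 # (4, -1)
print(rev[0].tolist())             # [7, 91, 205, 21]

# Packing the reversed bytes contiguously forces a copy into a new buffer,
# after which the bytes can be viewed as 4-byte integers again.
packed = np.ascontiguousarray(rev)
print(np.shares_memory(a, packed))           # False
print(packed.view("<u4").ravel().tolist())   # [365779719, 3543726861, 1084529172]
```

If the reversed bytes are only read element by element, rev is enough and no copy is needed; the copy is only required once a contiguous layout is needed, for example for a further view with a larger item size.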

The newbyteorder docs say it is equivalent to:

arr.view(arr.dtype.newbyteorder(new_order))

So a.view('<u4').newbyteorder('>') is the same as a.view('>u4'). None of these change a.
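As a small check of that equivalence, here is a sketch that chains views and compares the results. It spells out the documented equivalent form instead of calling ndarray.newbyteorder, and assumes a little-endian starting array.

```python
import numpy as np

a = np.array([123456789], dtype="<u4")
before = a.tobytes()

# Chained: view as "<u4" first, then flip the dtype's byte order to ">".
little = a.view("<u4")
chained = little.view(little.dtype.newbyteorder(">"))

# Direct: view as ">u4" in one step.
direct = a.view(">u4")

print(chained.dtype == direct.dtype)          # True: only the last dtype matters
print(chained.tolist() == direct.tolist())    # True
print(a.dtype.str, a.tobytes() == before)     # <u4 True: a itself is untouched
```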
