Converting an array to a float, how to reverse the process?

Question

Suppose we start with a numpy array of integers between 0 and 99, i.e.

x = np.array([[1,2,3,1],[10,5,0,2]],dtype=int)

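Now we want to represent each row of this array by a single unique value. A simple way is to represent it as a float. An intuitive way to do this is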

rescale = np.power(10,np.arange(0,2*x.shape[1],2)[::-1],dtype=float)
codes = np.dot(x,rescale)

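Here we exploit the fact that the integers have at most 2 digits. (I cast rescale to float to avoid exceeding the maximum int value in case the rows of x have more elements; this is not very elegant.)

This returns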

array([  1020301.,  10050002.])

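How can this process be reversed to get x back?

I was thinking of converting codes to strings and then splitting each string every 2 characters. I'm not very familiar with these string operations, especially when they have to be applied to all entries of an array at once. Another problem is that the first number has a varying number of digits, so leading zeros would have to be added somehow.

Maybe it would be simpler to use some divisions or rounding, or to represent the rows of the array in a different way. What matters is that at least the initial conversion is fast and vectorized.

Suggestions are welcome.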

Answer 1

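Since your numbers are between 0 and 99, you should pad each of them to 2 digits: 0 becomes "00", 5 becomes "05", 50 becomes "50". That way, all you need to do is repeatedly divide your number by 100, taking the remainder each time, and you will get the values back. Your encoding will also be smaller, since each number is encoded in 2 digits rather than 2-3 digits as you currently do.

If you also want to be able to detect [0,0,0] (currently indistinguishable from [0] or [0,...,0]), prepend a 1 to your number: 1000000 is [0,0,0], 100 is [0]. When your division returns 1, you know you are done.

You can easily construct a string with that information and then convert it to a number.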
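The sentinel idea can be sketched in a few lines. This is only an illustration under stated assumptions: the helper names encode_row and decode_row are hypothetical, and plain Python integers are used instead of floats so that the leading 1 is never lost to rounding.

```python
def encode_row(row):
    """Pack 2-digit values into one int, with a leading 1 as sentinel (hypothetical helper)."""
    code = 1  # the sentinel marks where the row starts
    for value in row:
        code = code * 100 + value  # append two decimal digits
    return code


def decode_row(code):
    """Undo encode_row by repeated division by 100 until only the sentinel is left."""
    values = []
    while code > 1:
        code, value = divmod(code, 100)
        values.append(value)
    return values[::-1]  # digits were extracted last-to-first


print(encode_row([0, 0, 0]))                # 1000000
print(decode_row(encode_row([1, 2, 3, 1])))  # [1, 2, 3, 1]
```

With the sentinel in place, [0] and [0, 0, 0] encode to different numbers (100 and 1000000), so the row length no longer has to be stored separately.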

Answer 2

First, you need to find the right number of columns:

import math
from math import ceil, floor

number_of_cols = max(ceil(math.log(v, 100)) for v in codes)

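Note that if the first column is always 0, your code has no way of knowing whether it exists: [[0, 1], [0, 2]] -> [1., 2.] -> [[1], [2]] or [[0, 0, 0, 1], [0, 0, 0, 2]]. This might be something to take into account.

Anyway, here is a mockup of the string approach: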
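The column count above relies on a floating-point logarithm. One edge case is worth knowing: math.log(100, 100) is exactly 1.0, so the ceil-based formula reports 1 column for the code 100, although 100 decodes to [1, 0]. A possible alternative, sketched here with a hypothetical helper count_cols, counts pairs of digits with integer division instead:

```python
import math


def count_cols(codes):
    """Number of 2-digit columns needed for the largest code (hypothetical helper)."""
    largest = max(int(c) for c in codes)
    cols = 1
    while largest >= 100:  # strip two decimal digits per step
        largest //= 100
        cols += 1
    return cols


print(count_cols([1020301., 10050002.]))  # 4
print(count_cols([100.]))                 # 2
print(math.ceil(math.log(100, 100)))      # 1, the log-based count for the same code
```

Neither variant can recover leading all-zero columns: count_cols([1., 2.]) is 1 even if the original rows were [0, 1] and [0, 2], so the true column count has to be stored alongside the codes if it matters.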

def decode_with_string(codes):
    number_of_cols = max(ceil(math.log(v, 100)) for v in codes)
    str_format = '{:0%dd}'%(2*number_of_cols) # prepare to format numbers as string
    return [[int(str_format.format(int(code))[2*i:2*i+2]) # extract the wanted digits
             for i in range(number_of_cols)] # for all columns
            for code in codes] # for all rows

But you can also compute the numbers directly:

def decode_direct(codes):
    number_of_cols = max(ceil(math.log(v, 100)) for v in codes)
    return [[floor(code/(100**index)) % 100
             for index in range(number_of_cols-1, -1, -1)]
            for code in codes]

Example:

>>> codes = [  1020301.,  10050002.]
>>> number_of_cols = max(ceil(math.log(v, 100)) for v in codes)
>>> print(number_of_cols)
4
>>> print(decode_with_string(codes))
[[1, 2, 3, 1], [10, 5, 0, 2]]
>>> print(decode_direct(codes))
[[1, 2, 3, 1], [10, 5, 0, 2]]

Here is a numpy solution:

>>> codes = np.asarray(codes, dtype=float)  # the example above defines codes as a plain list
>>> divisors = np.power(0.01, np.arange(number_of_cols-1, -1, -1))
>>> x = np.mod(np.floor(divisors*codes.reshape((codes.shape[0], 1))), 100)
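For comparison, the same round trip can be sketched entirely in integer arithmetic, which avoids multiplying by the inexact value 0.01. This is a minimal sketch, not the approach used above: the names encode_rows and decode_rows are hypothetical, and int64 is assumed, which limits rows to at most 9 columns (100**9 still fits in an int64, 100**10 does not).

```python
import numpy as np


def encode_rows(x):
    """Pack each row of values in 0..99 into one int64 code (hypothetical helper)."""
    x = np.asarray(x, dtype=np.int64)
    exponents = np.arange(x.shape[1] - 1, -1, -1, dtype=np.int64)
    weights = np.int64(100) ** exponents  # e.g. [1000000, 10000, 100, 1]
    return x @ weights


def decode_rows(codes, number_of_cols):
    """Recover the rows; the column count must be known or stored separately."""
    codes = np.asarray(codes, dtype=np.int64)
    exponents = np.arange(number_of_cols - 1, -1, -1, dtype=np.int64)
    weights = np.int64(100) ** exponents
    # Broadcasting: (rows, 1) // (cols,) -> (rows, cols), then keep two digits.
    return (codes[:, None] // weights) % 100


x = np.array([[1, 2, 3, 1], [10, 5, 0, 2]])
codes = encode_rows(x)
print(codes)                  # [ 1020301 10050002]
print(decode_rows(codes, 4))  # recovers the rows of x
```

Both directions are vectorized, and the decoded array has an integer dtype rather than float.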

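Finally, you say you use float in case int overflows. First, the mantissa of a floating-point number is also limited (53 bits for a float64), so you do not eliminate the risk: codes above 2**53 silently lose precision instead of overflowing. Second, in Python 3, integers actually have unlimited precision (NumPy integer dtypes such as int64 remain fixed-width, though).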
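A small illustration of both points, assuming a row of twelve values (more digits than a float64 can hold exactly). It uses plain Python integers only; the variable names are made up for this sketch.

```python
row = [99] * 12  # 24 decimal digits in total

# Pack with plain Python ints, which never overflow.
code = 0
for value in row:
    code = code * 100 + value

# Unpack by repeated divmod, last column first.
decoded = []
remaining = code
for _ in row:
    remaining, value = divmod(remaining, 100)
    decoded.append(value)
decoded.reverse()

exact_roundtrip = decoded == row           # True: integers keep every digit
float_is_lossy = int(float(code)) != code  # True: float64 rounds the code

print(exact_roundtrip, float_is_lossy)
```

The float version does not raise any error here; it simply returns a nearby number, which is why the loss is easy to miss.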

Answer 3

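You can take advantage of the fact that Numpy stores its arrays as contiguous blocks in memory. So it is enough to store the memory block as a byte string and remember the shape of the array: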

import numpy as np

x = np.array([[1,2,3,1],[10,5,0,2]], dtype=np.uint8) # 8 bits are enough for 2 digits
x_sh = x.shape
# flatten the array and convert it to a byte string
xs = x.ravel().tobytes()  # tobytes() replaces the deprecated tostring()

# convert back and reshape:
y = np.reshape(np.frombuffer(xs, dtype=np.uint8), x_sh)  # frombuffer() replaces the deprecated binary fromstring()

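The array is flattened first so that you don't have to worry about the storage order of the 2D array (C or Fortran order). Of course, you can also generate a separate string for each row: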

import numpy as np

x = np.array([[1,2,3,1],[10,5,0,2]], dtype=np.uint8) # 8 bits are enough for 2 digits

# conversion:
xss = [xr.tobytes() for xr in x]  # one byte string per row

# conversion back:
y = np.array([np.frombuffer(xs, dtype=np.uint8) for xs in xss])
