Convert string to numpy array

Question

Input:

mystr = "100110"

Desired output numpy array:

mynumpy == np.array([1, 0, 0, 1, 1, 0])

I tried:

np.fromstring(mystr, dtype=int, sep='')

The problem is that I can't split my string into its individual digits, so numpy treats it as a single number. Any idea how to convert my string to a numpy array?

Answer 1

list can help you do that.

import numpy as np

mystr = "100110"
print(np.array(list(mystr)))
# ['1' '0' '0' '1' '1' '0']

If you want numbers instead of strings:

print(np.array(list(mystr), dtype=int))
# [1 0 0 1 1 0]
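
If the input might contain something other than digits, it can help to check it before handing it to numpy. A minimal sketch that wraps the list-based conversion in a function; the name digits_to_array is hypothetical and not part of numpy, and the sketch assumes only ASCII digits are valid input:

```python
import numpy as np


def digits_to_array(s):
    """Convert a string of ASCII digits into a 1-D integer array.

    Hypothetical helper for illustration; not a NumPy function.
    """
    # isdigit() alone also accepts non-ASCII digit characters, so check both.
    if not (s.isascii() and s.isdigit()):
        raise ValueError(f"expected a non-empty string of ASCII digits, got {s!r}")
    # list(s) splits the string into single characters; dtype=int parses each one.
    return np.array(list(s), dtype=int)


print(digits_to_array("100110"))  # [1 0 0 1 1 0]
```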

Answer 2

You can read them as ASCII characters and then subtract 48 (the ASCII value of 0). For large strings, this should be the fastest approach.

>>> np.fromstring("100110", np.int8) - 48
array([1, 0, 0, 1, 1, 0], dtype=int8)

Alternatively, you can first convert the string to a list of integers:

>>> np.array(list(map(int, "100110")))
array([1, 0, 0, 1, 1, 0])

Edit: I did some quick timing, and the first method is over 100 times faster than converting it to a list first.
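
Newer numpy versions deprecate the binary mode of np.fromstring and point to np.frombuffer instead. A minimal sketch of the same ASCII-offset trick with frombuffer, assuming the string contains only the characters '0' to '9'; frombuffer reads bytes rather than str, so the string is encoded first:

```python
import numpy as np

mystr = "100110"

# Encode to bytes first: frombuffer reads raw bytes, not str.
# This assumes the string holds only the ASCII characters '0'-'9'.
raw = np.frombuffer(mystr.encode("ascii"), dtype=np.int8)

# Subtracting ord("0") == 48 maps the byte values 48..57 to 0..9.
digits = raw - ord("0")
print(digits)  # [1 0 0 1 1 0]
```

The result keeps the int8 dtype; call digits.astype(int) if a wider integer type is needed.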

Answer 3

In addition to the answers above, numpy now emits a deprecation warning when you use fromstring:
DeprecationWarning: The binary mode of fromstring is deprecated, as it behaves surprisingly on unicode inputs. Use frombuffer instead.
A better option is to use fromiter. It runs about twice as fast. This is what I got in a jupyter notebook:

import numpy as np
mystr = "100110"

np.fromiter(mystr, dtype=int)
>> array([1, 0, 0, 1, 1, 0])

# Time comparison
%timeit np.array(list(mystr), dtype=int)
>> 3.5 µs ± 627 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

%timeit np.fromstring(mystr, np.int8) - 48
>> 3.52 µs ± 508 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

%timeit np.fromiter(mystr, dtype=int)
>> 1.75 µs ± 133 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
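
The timings above were taken on a 6-character string, where fixed call overhead likely dominates, so the ranking may differ for long inputs. A minimal sketch that uses the standard timeit module instead of the %timeit magic: it first checks that the approaches agree, then times them on a longer string. The names long_str and via_* are made up for illustration, and the fromiter variant passes each character through int explicitly, which differs slightly from the call shown above. Results depend on the machine and the numpy version.

```python
import timeit

import numpy as np

# Hypothetical test input, much longer than the 6-character example.
long_str = "100110" * 10_000


def via_list(s):
    return np.array(list(s), dtype=int)


def via_frombuffer(s):
    # Assumes ASCII digits only; 48 is the ASCII code of "0".
    return np.frombuffer(s.encode("ascii"), dtype=np.int8) - 48


def via_fromiter(s):
    # Variant of the fromiter call: each character is passed through int()
    # and the length is given up front via count.
    return np.fromiter(map(int, s), dtype=int, count=len(s))


# All three should produce the same digits before their speed is compared.
reference = via_list(long_str)
for func in (via_frombuffer, via_fromiter):
    assert np.array_equal(reference, func(long_str))

# Average over 20 calls per approach and report milliseconds per call.
for func in (via_list, via_frombuffer, via_fromiter):
    seconds = timeit.timeit(lambda: func(long_str), number=20)
    print(f"{func.__name__}: {seconds / 20 * 1e3:.3f} ms per call")
```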
