Define the size of each element as the third dimension/parameter in the shape of a multi-dimensional array in Python

Question

My dataset looks like this (the first 5 rows, i.e. the head of the dataset):

  a     b     c     d
00010 01001 01001 01000
01001 00101 01001 01001
00011 00011 10000 01001
00101 01000 01001 01000
01001 00101 01001 01001

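Let the number of samples be 190, so when I check the shape of the data, the result is (190, 4).

I want my dataset to have the shape (190, 4, 5), where the third element of the shape is the number of digits in each value of a column; for example, 00011 has 5 digits.

How can I do this?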

data1 = pd.read_csv(r"F:\my work/binary_bm.csv", dtype=str)
df2 = np.array(data1)
df2
array([['00010', '01001', '01001', ..., '10000', '00110', '00101'],
   ['01001', '00101', '01001', ..., '10000', '00100', '00110'],
   ['00011', '00011', '10000', ..., '01001', '00011', '00011'],
   ...,
   ['01000', '01001', '01000', ..., '10000', '00100', '00110'],
   ['00010', '01001', '01001', ..., '10000', '00101', '00011'],
   ['00110', '00110', '01001', ..., '10000', '00101', '00101']],
  dtype=object)
data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1335 entries, 0 to 1334
Data columns (total 8 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   a       1335 non-null   object
 1   b       1335 non-null   object
 2   c       1335 non-null   object
 3   d       1335 non-null   object
 4   e       1335 non-null   object
 5   f       1335 non-null   object
 6   g       1335 non-null   object
 7   h       1335 non-null   object
dtypes: object(8)
memory usage: 83.6+ KB

x_train = df1.iloc[:930,:5]
x_test = df1.iloc[930:, :5]
y_train = df1.iloc[:930, 5:]
y_test = df1.iloc[930:, 5:]
x_train.head()
       a      b       c       d       e
1016 00010  01001   01001   01000   00110
445  01001  00101   01001   01001   00110
458  00011  00011   10000   01001   10000
251  00101  01000   01001   01000   00110
980  01001  00101   01001   01001   00110

x_tr = np.array(x_train)
print(x_tr)
[['00010' '01001' '01001' '01000' '00110']
 ['01001' '00101' '01001' '01001' '00110']
 ['00011' '00011' '10000' '01001' '10000']
 ...
 ['00100' '01000' '01001' '01000' '00111']
 ['00101' '01001' '01001' '01000' '00110']
 ['00100' '01000' '01001' '01000' '00111']]

x_tr.shape
(930, 5)

Answer 1

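You can iterate over the array, convert each string into a list of integers, then rebuild the array and reshape it into its final form.

Take the following example: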

>>> x = np.array([['00010', '01001', '01001', '01000', '00110'],
                  ['01001', '00101', '01001', '01001', '00110'],
                  ['00011', '00011', '10000', '01001', '10000'],
                  ['00100', '01000', '01001', '01000', '00111'],
                  ['00101', '01001', '01001', '01000', '00110'],
                  ['00100', '01000', '01001', '01000', '00111']])

The conversion then looks like this:

>>> np.array([int(k) for s in x.flatten() for k in s]).reshape(-1, 5, 5)
array([[[0, 0, 0, 1, 0],
        [0, 1, 0, 0, 1],
        [0, 1, 0, 0, 1],
        [0, 1, 0, 0, 0],
        [0, 0, 1, 1, 0]],

       [[0, 1, 0, 0, 1],
        [0, 0, 1, 0, 1],
        [0, 1, 0, 0, 1],
        [0, 1, 0, 0, 1],
        [0, 0, 1, 1, 0]],

       [[0, 0, 0, 1, 1],
        [0, 0, 0, 1, 1],
        [1, 0, 0, 0, 0],
        [0, 1, 0, 0, 1],
        [1, 0, 0, 0, 0]],

       [[0, 0, 1, 0, 0],
        [0, 1, 0, 0, 0],
        [0, 1, 0, 0, 1],
        [0, 1, 0, 0, 0],
        [0, 0, 1, 1, 1]],

       [[0, 0, 1, 0, 1],
        [0, 1, 0, 0, 1],
        [0, 1, 0, 0, 1],
        [0, 1, 0, 0, 0],
        [0, 0, 1, 1, 0]],

       [[0, 0, 1, 0, 0],
        [0, 1, 0, 0, 0],
        [0, 1, 0, 0, 1],
        [0, 1, 0, 0, 0],
        [0, 0, 1, 1, 1]]])
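
The one-liner above hardcodes the column count and the string width in `reshape(-1, 5, 5)`. As a rough sketch of the same idea that reads both sizes from the data, the helper below (the name `bitstrings_to_3d` is made up for illustration) assumes every cell is a string of the same length containing only digits. Instead of a Python loop it uses a NumPy view to split each fixed-width string into single characters, which is a different technique from the list comprehension but should yield the same array.

```python
import numpy as np


def bitstrings_to_3d(table):
    """Turn a 2-D table of equal-length digit strings into a 3-D integer array.

    `table` can be a nested list, an object-dtype ndarray, or a DataFrame.
    Assumption: every cell has the same length and holds only digits.
    """
    # Copy into a contiguous fixed-width unicode array, e.g. dtype '<U5'.
    values = np.array(table, dtype=str)
    # Characters per cell: bytes per item divided by bytes per single character.
    width = values.dtype.itemsize // np.dtype("U1").itemsize
    # Reinterpret each fixed-width string as `width` one-character strings.
    chars = values.view("U1").reshape(values.shape + (width,))
    # Parse each character ('0' or '1') as an integer.
    return chars.astype(int)


sample = np.array([["00010", "01001", "01001"],
                   ["00011", "00011", "10000"]], dtype=object)
result = bitstrings_to_3d(sample)
print(result.shape)  # (2, 3, 5)
```

With the question's data this would be called as `bitstrings_to_3d(x_train)`, which should give shape (930, 5, 5). If the strings differ in length, the shorter ones are padded with empty characters and the integer conversion raises a `ValueError`, so the equal-width assumption is checked implicitly.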

