Define the size of each element as the third dimension in the shape of a multi-dimensional array in Python
My dataset looks like this (the first 5 rows, i.e. the head of the dataset):
a b c d
00010 01001 01001 01000
01001 00101 01001 01001
00011 00011 10000 01001
00101 01000 01001 01000
01001 00101 01001 01001
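Let the number of samples be 190.
When I check the shape of the data, the result is (193, 4).
I want my dataset to have the shape (190, 4, 5),
where the third element of the shape is the number of digits in each value of a column, e.g. 00011 has 5 digits.
How can I achieve this?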
data1 = pd.read_csv(r"F:\my work/binary_bm.csv", dtype=str)
df2 = np.array(data1)
df2
array([['00010', '01001', '01001', ..., '10000', '00110', '00101'],
['01001', '00101', '01001', ..., '10000', '00100', '00110'],
['00011', '00011', '10000', ..., '01001', '00011', '00011'],
...,
['01000', '01001', '01000', ..., '10000', '00100', '00110'],
['00010', '01001', '01001', ..., '10000', '00101', '00011'],
['00110', '00110', '01001', ..., '10000', '00101', '00101']],
dtype=object)
data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1335 entries, 0 to 1334
Data columns (total 8 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 a 1335 non-null object
1 b 1335 non-null object
2 c 1335 non-null object
3 d 1335 non-null object
4 e 1335 non-null object
5 f 1335 non-null object
6 g 1335 non-null object
7 h 1335 non-null object
dtypes: object(8)
memory usage: 83.6+ KB
x_train = df1.iloc[:930,:5]
x_test = df1.iloc[930:, :5]
y_train = df1.iloc[:930, 5:]
y_test = df1.iloc[930:, 5:]
x_train.head()
a b c d e
1016 00010 01001 01001 01000 00110
445 01001 00101 01001 01001 00110
458 00011 00011 10000 01001 10000
251 00101 01000 01001 01000 00110
980 01001 00101 01001 01001 00110
x_tr = np.array(x_train)
print(x_tr)
[['00010' '01001' '01001' '01000' '00110']
['01001' '00101' '01001' '01001' '00110']
['00011' '00011' '10000' '01001' '10000']
...
['00100' '01000' '01001' '01000' '00111']
['00101' '01001' '01001' '01000' '00110']
['00100' '01000' '01001' '01000' '00111']]
x_tr.shape
(930, 5)
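You can iterate over the array, convert each string into a list of integers, then rebuild the array and reshape it into its final form.
Take the following example: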
>>> x = np.array([['00010', '01001', '01001', '01000', '00110'],
...               ['01001', '00101', '01001', '01001', '00110'],
...               ['00011', '00011', '10000', '01001', '10000'],
...               ['00100', '01000', '01001', '01000', '00111'],
...               ['00101', '01001', '01001', '01000', '00110'],
...               ['00100', '01000', '01001', '01000', '00111']])
The conversion then looks like this:
>>> np.array([int(k) for s in x.flatten() for k in s]).reshape(-1, 5, 5)
array([[[0, 0, 0, 1, 0],
[0, 1, 0, 0, 1],
[0, 1, 0, 0, 1],
[0, 1, 0, 0, 0],
[0, 0, 1, 1, 0]],
[[0, 1, 0, 0, 1],
[0, 0, 1, 0, 1],
[0, 1, 0, 0, 1],
[0, 1, 0, 0, 1],
[0, 0, 1, 1, 0]],
[[0, 0, 0, 1, 1],
[0, 0, 0, 1, 1],
[1, 0, 0, 0, 0],
[0, 1, 0, 0, 1],
[1, 0, 0, 0, 0]],
[[0, 0, 1, 0, 0],
[0, 1, 0, 0, 0],
[0, 1, 0, 0, 1],
[0, 1, 0, 0, 0],
[0, 0, 1, 1, 1]],
[[0, 0, 1, 0, 1],
[0, 1, 0, 0, 1],
[0, 1, 0, 0, 1],
[0, 1, 0, 0, 0],
[0, 0, 1, 1, 0]],
[[0, 0, 1, 0, 0],
[0, 1, 0, 0, 0],
[0, 1, 0, 0, 1],
[0, 1, 0, 0, 0],
[0, 0, 1, 1, 1]]])
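The one-liner above hard-codes 5 columns and 5 digits in `reshape(-1, 5, 5)`. As a rough sketch, the same idea can be wrapped in a helper that reads both sizes from the input, so it would also cover the (190, 4, 5) case from the question. The name `strings_to_bits` is hypothetical and not part of NumPy or pandas, and the sketch assumes a non-empty 2-D input in which every string has the same length.

```python
import numpy as np

def strings_to_bits(arr):
    """Expand a 2-D array of equal-length digit strings into a 3-D int array.

    Hypothetical helper: assumes a non-empty 2-D input of digit strings.
    """
    arr = np.asarray(arr, dtype=str)   # also accepts object arrays or a DataFrame
    n_rows, n_cols = arr.shape
    width = len(arr.flat[0])           # digits per string, e.g. 5 for '00011'
    if any(len(s) != width for s in arr.ravel()):
        raise ValueError("all strings must have the same length")
    # Same approach as above: flatten, split every string into digits, reshape.
    digits = [int(ch) for s in arr.ravel() for ch in s]
    return np.array(digits).reshape(n_rows, n_cols, width)

x = np.array([['00010', '01001'],
              ['00011', '10000']])
bits = strings_to_bits(x)
print(bits.shape)
```

Applied to the training data in the question, `strings_to_bits(x_train)` should return an array of shape (930, 5, 5), provided every cell really holds a 5-character string.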