I have the same number of files but still get different shapes (ANN, machine learning)
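I'm trying to build a neural network in Python: an ANN for a classification problem. The goal of the network is to classify who is speaking, me or someone else. I have the data in 2 folders.
[image: folders]
One is called me and holds audio of me speaking; the other is called other and holds audio of other people speaking.
[image: view of the wav files (audio data)]
The problem is that I can't train the network because the data are reported as not being the same length, even though they are: there are 18 files in each folder, not one more, not one less.
When I run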
print(X.shape)
print(y.shape)
I get this:
[image: result of X, y shapes]
Even though each folder contains 18 audio files, the shapes are not the same.
model.py
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
import tensorflow as tf
import numpy as np
from scipy.io import wavfile
from pathlib import Path
import os

### DATASET
pathlist = Path(os.path.abspath('Voiceclassification/Data/me/')).rglob('*.wav')
# My voice data
for path in pathlist:
    filename = str(path)
    # convert audio to numpy array and then 2D to 1D np Array
    samplerate, data = wavfile.read(filename)
    #print(f"sample rate: {samplerate}")
    data = data.flatten()
    #print(f"data: {data}")

pathlist2 = Path(os.path.abspath('Voiceclassification/Data/other/')).rglob('*.wav')
# other voice data
for path2 in pathlist2:
    filename2 = str(path2)
    samplerate2, data2 = wavfile.read(filename2)
    data2 = data2.flatten()
    #print(data2)

### ADAPTING THE DATA FOR THE MODEL
X = data # My voice
y = data2 # Other data
#print(X.shape)
#print(y.shape)

### Training the model
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=0)

# Performing feature scaling
sc = StandardScaler()
x_train = sc.fit_transform(x_train)
x_test = sc.transform(x_test)

### Creating the ANN
ann = tf.keras.models.Sequential()
# First hidden layer of the ann
ann.add(tf.keras.layers.Dense(units=6, activation="relu"))
# Second one
ann.add(tf.keras.layers.Dense(units=6, activation="relu"))
# Output layer
ann.add(tf.keras.layers.Dense(units=6, activation="sigmoid"))

# Compile our neural network
ann.compile(optimizer="adam",
            loss="binary_crossentropy",
            metrics=['accuracy'])

# Fit ANN
ann.fit(x_train, y_train, batch_size=32, epochs=100)
ann.save('train_model.model')
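Any ideas?
It's because your wav audio files may have different sizes. They can all be 10 seconds long, but if they differ by a few milliseconds, that changes the shape of your data. What you can do is trim your wav files so that they are all exactly 10.00 seconds, with no extra milliseconds.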
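Instead of editing the files by hand, the trimming can be done in code right after loading. The following is only a minimal sketch of that idea, not a tested fix for the script above. The helper names `fix_length`, `build_dataset` and `load_folder` are hypothetical, and it assumes every file uses the same sample rate. It also collects every clip into a list, because in the question's loops `data` and `data2` are overwritten on each iteration, so `X` and `y` end up holding only the last file from each folder. Here `y` holds class labels (1 = me, 0 = other) rather than audio.

```python
from pathlib import Path

import numpy as np


def fix_length(signal, n_samples):
    """Return a 1-D float32 array with exactly n_samples samples."""
    signal = np.asarray(signal, dtype=np.float32)
    if signal.ndim > 1:
        # Average the channels of a stereo clip into one mono channel.
        signal = signal.mean(axis=1)
    if signal.shape[0] >= n_samples:
        # Trim the extra samples (the stray milliseconds).
        return signal[:n_samples]
    # Pad short clips with zeros (silence) at the end.
    return np.pad(signal, (0, n_samples - signal.shape[0]))


def build_dataset(me_signals, other_signals, n_samples):
    """Stack clips into X (n_clips, n_samples) and labels y (1 = me, 0 = other)."""
    me = list(me_signals)
    other = list(other_signals)
    X = np.stack([fix_length(s, n_samples) for s in me + other])
    y = np.array([1] * len(me) + [0] * len(other))
    return X, y


def load_folder(folder):
    """Read every .wav file under folder (hypothetical helper, needs SciPy)."""
    from scipy.io import wavfile  # imported here so the rest runs without SciPy

    paths = sorted(Path(folder).rglob("*.wav"))
    # wavfile.read returns (sample_rate, data); only the samples are kept.
    return [wavfile.read(str(p))[1] for p in paths]
```

With the folders from the question, this could be called as `X, y = build_dataset(load_folder('Voiceclassification/Data/me/'), load_folder('Voiceclassification/Data/other/'), n_samples=10 * samplerate)`, where `samplerate` is the rate returned by `wavfile.read`. `X` then has one row per file and the same number of columns for every file, which is the form `train_test_split` and `StandardScaler` expect.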