How should one create a dataset using the tensorflow.data.Dataset API from a large set of wav files?

Question

I have 8,742 wav files (about 7.1GB in total) and want to get the raw data into a tf.data.Dataset.

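My first attempt is below. Note that I used the soundfile package because the wav files have different bit depths, and some are 24 bits per sample. As far as I know, many packages do not support 24-bit wav files.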

import tensorflow as tf
import soundfile

filepaths = tf.gfile.Glob('michael/dataset/wav_filepaths/*.wav')  # Get the file paths as a list

labels = get_labels()  # Pseudo function that returns the label for each audio file

raw_audio = []  # Holds the raw audio of every file. These are 2-channel wavs, so this is a 3D list

# Build a list where each element is the raw audio data of one file
for f in filepaths:
    try:
        data, sample_rate = soundfile.read(f)  # 2 channels
        raw_audio.append(data.tolist())
    except Exception as err:  # Poor practice to catch all exceptions like this, but it is just an example
        print('Exception:', err)
        print(f)

training_set = tf.data.Dataset.from_tensor_slices((raw_audio, labels))

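The problem with this solution is that it is very slow, because soundfile reads all of the raw data up front and everything is stored in one list.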

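I am now contemplating a solution where I initially store only the filenames and the corresponding labels in a tf.data.Dataset. I would then create a mapping function that calls soundfile.read (or possibly even uses tensorflow.contrib.framework.python.ops.audio_ops) and returns only the raw audio and the corresponding label. That function would be applied with tf.data.Dataset.map, so that the entire process becomes part of the graph and is parallelised.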

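My first concern with the proposed solution is that it is not ideal: it seems a bit "hacky" to store filenames in a dataset only to replace them later with the corresponding data. My second concern is that the GPU I am using (a 1080Ti with 11GB of memory) may run out of memory.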

Please suggest a better way (in particular, a faster one) to get the raw audio data from a large set of wav files into a tf.data.Dataset.

Answer 1

Although in theory you could read the files with tf.read_file and decode them with tf.contrib.ffmpeg.decode_audio, the usual approach in this kind of case is to convert the data to TFRecord format and read it with a tf.data.TFRecordDataset. This blog post shows an example of how to do that. In your case you would need a script that reads each WAV file, decodes it and writes the vector of samples to the file (storing them as 32-bit values would probably be simplest). Note that if you want to batch multiple audio files into one tensor, either they must all have the same size or you have to use tf.data.Dataset.padded_batch to form a proper tensor.

Answer 2

You could try a generator function that feeds the data to the pipeline. Have a look at https://www.tensorflow.org/api_docs/python/tf/data/Dataset#from_generator


python

audio

wav

tensorflow

tensorflow-datasets