Python: How to convert a base64 audio string to text using speech_recognition or other modules?

Question

I have a base64 audio string like data:audio/mpeg;base64,//OAxAAAAANIAAAAABhqZ3f4StN3gOAaB4NAUBYZLv......, and I tried to convert it to a WAV file with Python's base64 module:

    import base64

    decode_bytes = base64.b64decode(encoding_str)
    with open(file_name + '.wav', "wb") as wav_file:
        wav_file.write(decode_bytes)

Then I tried to convert the audio to text with the speech_recognition module, but got the following error:

ValueError: Audio file could not be read as PCM WAV, AIFF/AIFF-C, or Native FLAC; check if file is corrupted or in another format

Is there a way to fix this?
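The error lists the only formats speech_recognition's `AudioFile` can read: PCM WAV, AIFF/AIFF-C and native FLAC. Naming a file `.wav` does not change its contents, so one way to see what the decoded bytes really are is to check their leading magic bytes. This is a minimal, non-exhaustive sketch; `sniff_audio_format` and `SR_READABLE` are hypothetical names, and only a few common signatures are covered.

```python
def sniff_audio_format(raw: bytes) -> str:
    """Guess the audio container from leading magic bytes (not exhaustive)."""
    if raw[:4] == b"RIFF" and raw[8:12] == b"WAVE":
        return "wav"  # RIFF/WAVE header; the payload may still be non-PCM
    if raw[:4] == b"FORM" and raw[8:12] in (b"AIFF", b"AIFC"):
        return "aiff"
    if raw[:4] == b"fLaC":
        return "flac"
    if raw[:3] == b"ID3":
        return "mp3"  # ID3v2 tag, usually followed by MP3 frames
    # MPEG audio frame: 11 set sync bits plus a non-zero layer field.
    # The layer check excludes AAC ADTS, which shares the sync pattern.
    if len(raw) >= 2 and raw[0] == 0xFF and (raw[1] & 0xE0) == 0xE0 and (raw[1] & 0x06):
        return "mpeg-audio"
    if raw[:4] == b"OggS":
        return "ogg"
    return "unknown"


# Formats named in the speech_recognition error message.
SR_READABLE = {"wav", "aiff", "flac"}
```

For the string in the question, the first payload characters `//OAxAAA` decode to `FF F3 80 C4 00 00`, so `sniff_audio_format` returns `"mpeg-audio"`: `FF F3` is an MPEG-2 Layer III frame header (that is, MP3), not a `RIFF` header, which is why speech_recognition rejects the file.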

Answer 1

Your audio appears to be MP3 (MIME type audio/mpeg), not WAV. You need to save it as an .mp3 file:

    import base64

    # Drop any "data:audio/mpeg;base64," header before decoding.
    decode_bytes = base64.b64decode(encoding_str.split(",", 1)[-1])
    with open(file_name + '.mp3', "wb") as mp3_file:
        mp3_file.write(decode_bytes)
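Rather than hard-coding the extension, the MIME type in the data URI header can choose it. This is a sketch under assumptions: `decode_audio_data_uri`, `save_audio_data_uri` and the `MIME_TO_EXT` table are hypothetical names, and the table covers only a few types.

```python
from __future__ import annotations

import base64

# Hypothetical MIME-type-to-extension table; extend it for other formats.
MIME_TO_EXT = {
    "audio/mpeg": ".mp3",
    "audio/wav": ".wav",
    "audio/x-wav": ".wav",
    "audio/flac": ".flac",
    "audio/ogg": ".ogg",
}


def decode_audio_data_uri(data_uri: str) -> tuple[str, bytes]:
    """Split 'data:<mime>;base64,<payload>' into (mime_type, decoded bytes)."""
    header, sep, payload = data_uri.partition(",")
    if not sep or not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("expected a 'data:<mime>;base64,<payload>' string")
    # Keep only the media type, dropping parameters such as ';codecs=...'.
    mime_type = header[len("data:"):].split(";")[0]
    # validate=True rejects stray characters instead of silently skipping them.
    return mime_type, base64.b64decode(payload, validate=True)


def save_audio_data_uri(data_uri: str, file_stem: str) -> str:
    """Write the decoded audio with an extension that matches its MIME type."""
    mime_type, raw = decode_audio_data_uri(data_uri)
    path = file_stem + MIME_TO_EXT.get(mime_type, ".bin")
    with open(path, "wb") as f:
        f.write(raw)
    return path
```

With the question's input, `save_audio_data_uri(encoding_str, file_name)` would write `file_name + ".mp3"`. Unknown MIME types fall back to `.bin` so that a mismatched extension is never guessed.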

Then convert the MP3 to WAV with pydub or FFmpeg, and pass that WAV file to the speech_recognition module.
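The conversion and transcription can also be done in memory, without temporary files. This is a minimal sketch assuming `pydub` and `SpeechRecognition` are installed and FFmpeg is on `PATH` (pydub calls it to decode MP3). The function names are hypothetical, and the Google Web Speech backend is just one of the recognizers speech_recognition offers.

```python
import base64
import io


def mp3_bytes_to_wav_bytes(mp3_bytes: bytes) -> bytes:
    """Decode MP3 with pydub (which calls FFmpeg) and re-encode it as PCM WAV."""
    # Imported here so this module loads even without the optional dependency.
    from pydub import AudioSegment  # pip install pydub

    segment = AudioSegment.from_file(io.BytesIO(mp3_bytes), format="mp3")
    wav_buffer = io.BytesIO()
    segment.export(wav_buffer, format="wav")
    return wav_buffer.getvalue()


def wav_bytes_to_text(wav_bytes: bytes, language: str = "en-US") -> str:
    """Transcribe PCM WAV bytes with speech_recognition's Google Web Speech backend."""
    import speech_recognition as sr  # pip install SpeechRecognition

    recognizer = sr.Recognizer()
    # AudioFile accepts a file-like object as well as a path.
    with sr.AudioFile(io.BytesIO(wav_bytes)) as source:
        audio = recognizer.record(source)
    # Needs network access; raises sr.UnknownValueError if nothing is recognized.
    return recognizer.recognize_google(audio, language=language)


def transcribe_mp3_base64(b64_payload: str, language: str = "en-US") -> str:
    """Pipeline for a base64 MP3 payload (without the 'data:...;base64,' header)."""
    mp3_bytes = base64.b64decode(b64_payload)
    return wav_bytes_to_text(mp3_bytes_to_wav_bytes(mp3_bytes), language)
```

For speech in another language, pass a different tag, for example `language="zh-CN"`. If you prefer the file-based flow from this answer, `AudioSegment.from_mp3(mp3_path).export(wav_path, format="wav")` does the same conversion on disk, and the FFmpeg command-line equivalent is `ffmpeg -i input.mp3 output.wav`.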
