How to get smooth text-to-speech of a sentence with utterance ids

Question

My goal is to use the Android Text-to-Speech API to play a piece of text as speech while keeping track of the word currently being spoken.

To get smooth, natural playback, I use:

tts.speak("This is the sentence", TextToSpeech.QUEUE_FLUSH, null, null)

But then I cannot keep track of the word currently being spoken.

To play the sentence while tracking the word currently being spoken, I use:

val words = "This is the sentence".split(" ")
words.forEachIndexed { index, element ->
    tts.speak(element, TextToSpeech.QUEUE_ADD, null, index.toString())
}

combined with an UtteranceProgressListener, but the speech is very choppy and does not read back as a natural sentence.

Is there a way to get a naturally spoken sentence and still track the word currently being spoken?

Answer 1

If you look at the latest Android documentation, you will notice that API level 26 added a new method to UtteranceProgressListener: onRangeStart(String utteranceId, int start, int end, int frame).

https://developer.android.com/reference/android/speech/tts/UtteranceProgressListener.html#onRangeStart(java.lang.String,%20int,%20int,%20int)
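Per that reference, start and end are character offsets into the utterance text (end is exclusive), so the word being spoken is simply text.substring(start, end). If you also want the word's position in the sentence, which the per-word utterance ids in the question used to give you, a small helper can map the offset back. This is a sketch under an assumption: words are separated by single spaces, as in the question's split(" "); wordIndexAt is a hypothetical helper, not an Android API.

```kotlin
// Hypothetical helper: maps the start offset reported by onRangeStart() back to
// the index of the word being spoken, assuming words are separated by single
// spaces (the same assumption as the question's split(" ")).
fun wordIndexAt(text: String, start: Int): Int {
    // Clamp in case an engine reports an offset outside the text.
    val safeStart = start.coerceIn(0, text.length)
    // Every space before the offset terminates one earlier word.
    return text.substring(0, safeStart).count { it == ' ' }
}
```

For "This is the sentence", an engine reporting start = 8 and end = 11 refers to "the", and wordIndexAt(text, 8) returns 2.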

However, as the documentation states:

"Only called if the engine supplies timing information by calling rangeStart(int, int, int)"

That timing information is supplied by the engine through SynthesisCallback: https://developer.android.com/reference/android/speech/tts/SynthesisCallback.html#rangeStart(int,%20int,%20int)

And again, the documentation says:

"The service may call this method to provide timing information about the spoken text."

So, unfortunately, whether you receive the timing callback you need depends on the TTS engine implementation.

On my device, running Android 8.0.0 with the default TTS engine (com.google.android.tts), I did not receive the callback.
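To confirm which engine you are actually testing against, and which alternative engines are installed, you can ask TextToSpeech itself. A minimal sketch; the function name logTtsEngines is hypothetical, while getDefaultEngine() and getEngines() are existing TextToSpeech methods (accessed here through Kotlin property syntax).

```kotlin
import android.content.Context
import android.speech.tts.TextToSpeech
import android.util.Log

// Hypothetical helper: logs the default TTS engine and every installed engine,
// so you know which implementation your onRangeStart() test ran against.
fun logTtsEngines(context: Context) {
    lateinit var tts: TextToSpeech
    tts = TextToSpeech(context) { status ->
        // SUCCESS arrives only after the engine service has connected, i.e. after
        // the constructor has returned, so `tts` is initialized in this branch.
        if (status == TextToSpeech.SUCCESS) {
            Log.d("TTS", "Default engine: ${tts.defaultEngine}")
            for (engine in tts.engines) {
                Log.d("TTS", "Installed: ${engine.name} (${engine.label})")
            }
            tts.shutdown()
        }
    }
}
```

To try a different engine, pass one of the logged engine.name values as the third constructor argument: TextToSpeech(context, listener, engineName).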

To test this, you need to:

Build against SDK level 26 or higher
Implement your own UtteranceProgressListener
Register it by calling TextToSpeech.setOnUtteranceProgressListener(listener)
Override the onRangeStart(String, int, int, int) method in your UtteranceProgressListener
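Putting those steps together, here is a minimal sketch, assuming compileSdk 26 or higher. The class WordTrackingSpeaker and its onWord callback are hypothetical names. The key point is that the whole sentence is spoken as a single utterance, so playback stays natural, and word tracking comes from onRangeStart() rather than from per-word utterance ids.

```kotlin
import android.content.Context
import android.speech.tts.TextToSpeech
import android.speech.tts.UtteranceProgressListener
import android.util.Log

// Hypothetical wrapper: speaks a whole sentence as ONE utterance and reports the
// word currently being spoken via onRangeStart(), if the engine supplies timing.
class WordTrackingSpeaker(
    context: Context,
    private val onWord: (String) -> Unit // may be invoked on a background thread
) {
    @Volatile private var ready = false
    @Volatile private var text = ""

    private val tts = TextToSpeech(context) { status ->
        ready = status == TextToSpeech.SUCCESS
        if (!ready) Log.w(TAG, "TextToSpeech init failed: $status")
    }

    init {
        tts.setOnUtteranceProgressListener(object : UtteranceProgressListener() {
            override fun onStart(utteranceId: String?) {}
            override fun onDone(utteranceId: String?) {}

            @Deprecated("Deprecated in Java")
            override fun onError(utteranceId: String?) {}

            // Added in API 26; only invoked if the engine calls SynthesisCallback.rangeStart().
            override fun onRangeStart(utteranceId: String?, start: Int, end: Int, frame: Int) {
                val current = text
                // Guard against offsets outside the text we passed to speak().
                if (start in 0..end && end <= current.length) {
                    onWord(current.substring(start, end))
                }
            }
        })
    }

    // Returns false if the engine is not initialized yet or rejects the request.
    fun speak(sentence: String): Boolean {
        if (!ready) return false
        text = sentence
        // A single utterance id for the whole sentence keeps the prosody natural.
        return tts.speak(sentence, TextToSpeech.QUEUE_FLUSH, null, UTTERANCE_ID) ==
            TextToSpeech.SUCCESS
    }

    fun shutdown() = tts.shutdown()

    companion object {
        private const val TAG = "WordTrackingSpeaker"
        private const val UTTERANCE_ID = "sentence"
    }
}
```

From an Activity you would create it with something like WordTrackingSpeaker(this) { word -> runOnUiThread { /* highlight word */ } }, call speak("This is the sentence") once initialization has finished, and call shutdown() in onDestroy(). If onWord never fires, the installed engine is not calling rangeStart().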

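If your engine implementation supports it, you will receive the timing information through that callback. If it does not, your best option is to look for another engine implementation or to implement your own TextToSpeechService.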
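Writing a real engine is a large job, but the part relevant here is small: inside onSynthesizeText(), the engine calls rangeStart() on the SynthesisCallback at the audio frame where each piece of text begins. The sketch below is hypothetical and only illustrates that mechanism. SilentTimingTtsService emits half a second of silence per word instead of synthesized speech; the sample rate and language values are arbitrary choices; and the service would still need the usual TTS engine declaration in AndroidManifest.xml, which is not shown.

```kotlin
import android.media.AudioFormat
import android.os.Build
import android.speech.tts.SynthesisCallback
import android.speech.tts.SynthesisRequest
import android.speech.tts.TextToSpeech
import android.speech.tts.TextToSpeechService

// Hypothetical engine: "speaks" each word as a fixed stretch of silence and
// reports where each word starts via SynthesisCallback.rangeStart().
class SilentTimingTtsService : TextToSpeechService() {

    override fun onIsLanguageAvailable(lang: String?, country: String?, variant: String?): Int =
        TextToSpeech.LANG_AVAILABLE

    override fun onGetLanguage(): Array<String> = arrayOf("eng", "USA", "")

    override fun onLoadLanguage(lang: String?, country: String?, variant: String?): Int =
        TextToSpeech.LANG_AVAILABLE

    override fun onStop() {}

    override fun onSynthesizeText(request: SynthesisRequest, callback: SynthesisCallback) {
        val text = request.charSequenceText?.toString().orEmpty()
        if (callback.start(SAMPLE_RATE, AudioFormat.ENCODING_PCM_16BIT, 1) != TextToSpeech.SUCCESS) return

        val maxChunk = callback.maxBufferSize
        var frame = 0
        // Each match is one word; match.range is its inclusive character range.
        for (match in Regex("\\S+").findAll(text)) {
            if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.O) {
                // "At audio frame `frame`, characters [start, end) are about to be spoken."
                callback.rangeStart(frame, match.range.first, match.range.last + 1)
            }
            // Zero-filled 16-bit mono PCM is silence: 2 bytes per frame.
            var remaining = FRAMES_PER_WORD * BYTES_PER_FRAME
            while (remaining > 0) {
                val n = minOf(remaining, maxChunk)
                // A non-SUCCESS result means synthesis was stopped; bail out.
                if (callback.audioAvailable(ByteArray(n), 0, n) != TextToSpeech.SUCCESS) return
                remaining -= n
            }
            frame += FRAMES_PER_WORD
        }
        callback.done()
    }

    companion object {
        private const val SAMPLE_RATE = 16_000
        private const val BYTES_PER_FRAME = 2
        private const val FRAMES_PER_WORD = SAMPLE_RATE / 2 // 0.5 s of audio per word
    }
}
```

A real engine would call rangeStart() with the frame offsets its synthesizer actually produces; the framework then delivers onRangeStart() to the client when playback reaches that frame.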


android

text-to-speech

kotlin