SpeechRecognition quality is extremely poor, especially compared to Word

Question

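I'm using the WPF speech recognition library in a desktop app, trying to use it as an alternative to menu commands. (I want to focus on the tablet experience, where there is no keyboard.) It works, sort of, except that recognition accuracy is so poor that it is unusable. So I tried dictating to Word instead, and Word works reasonably well. I'm using my laptop's built-in microphone in both cases, and both programs can hear the same speech at the same time (provided Word keeps keyboard focus), yet Word gets it right and WPF does an awful job.

I've tried both the generic DictationGrammar() and a small specialized grammar, and I've tried both "en-US" and "en-AU". In every case Word performs well and WPF performs poorly. Even when I compare the specialized grammar in WPF against the general grammar in Word, WPF gets it wrong 50% of the time, e.g. hearing "size small" as "color small".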

    private void InitSpeechRecognition()
    {
        recognizer = new SpeechRecognitionEngine(new System.Globalization.CultureInfo("en-US"));

        // Create and load a grammar.  
        if (false)
        {
            GrammarBuilder grammarBuilder = new GrammarBuilder();
            Choices commandChoices = new Choices("weight", "color", "size");
            grammarBuilder.Append(commandChoices);
            Choices valueChoices = new Choices();
            valueChoices.Add("normal", "bold");
            valueChoices.Add("red", "green", "blue");
            valueChoices.Add("small", "medium", "large");
            grammarBuilder.Append(valueChoices);
            recognizer.LoadGrammar(new Grammar(grammarBuilder));
        }
        else
        {
            recognizer.LoadGrammar(new DictationGrammar());
        }

        // Add a handler for the speech recognized event.  
        recognizer.SpeechRecognized +=
                            new EventHandler<SpeechRecognizedEventArgs>(recognizer_SpeechRecognized);

        // Configure input to the speech recognizer.  
        recognizer.SetInputToDefaultAudioDevice();

        // Start asynchronous, continuous speech recognition.  
        recognizer.RecognizeAsync(RecognizeMode.Multiple);
    }

Sample results from Word:

Hello 
make it darker 
I want a brighter colour 
make it reader 
make it greener 
thank you 
make it bluer 
make it more blue
make it darker 
turn on debugging 
turn off debugging 
zoom in 
zoom out

The same audio in WPF, using the dictation grammar:

a lower
make it back
when Ted Brach
making reader
and he
liked the
ethanol and
act out
to be putting
it off the parking
zoom in
and out

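I got the assembly through NuGet. I'm using runtime version v4.0.30319 and assembly version 4.0.0.0. If I'm supposed to "train" it, the documentation doesn't explain how, and I don't know whether the training is shared with other programs such as Word, or where it is saved. I've been playing with it long enough that it ought to know my voice by now.

Can anyone tell me what I'm doing wrong?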

Answer 1

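Since you are really building a voice user interface, not just doing speech recognition, you should take a look at Speechly. With Speechly it is much easier to create a natural experience that doesn't require hard-coded commands and instead supports several ways of saying the same thing. Integrating it into your app should be pretty simple too. There's a small CodePen on the front page that gives you a basic idea.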

Answer 2

Your best option is to not use DictationGrammar, and instead use specific grammars containing whole phrases or key-value assignments:

using System.Globalization;
using System.Speech.Recognition;

// Builds an engine that listens only for a few small command grammars.
private static SpeechRecognitionEngine CreateRecognitionEngine()
{
    var cultureInf = new CultureInfo("en-US");

    var recoEngine = new SpeechRecognitionEngine(cultureInf);
    recoEngine.SetInputToDefaultAudioDevice();

    // One grammar per key: the key word followed by one of its values, e.g. "size small".
    recoEngine.LoadGrammar(CreateKeyValuesGrammar(cultureInf, "weight", new string[] { "normal", "bold", "demibold" }));
    recoEngine.LoadGrammar(CreateKeyValuesGrammar(cultureInf, "color", new string[] { "red", "green", "blue" }));
    recoEngine.LoadGrammar(CreateKeyValuesGrammar(cultureInf, "size", new string[] { "small", "medium", "large" }));

    // An empty key produces a grammar that matches the whole phrases on their own.
    recoEngine.LoadGrammar(CreateKeyValuesGrammar(cultureInf, "", new string[] { "Put whole phrase here", "Put whole phrase here again", "another long phrase" }));

    return recoEngine;
}

// Creates a grammar of the form "<key> <value>", or just "<value>" when the key is empty.
static Grammar CreateKeyValuesGrammar(CultureInfo cultureInf, string key, string[] values)
{
    var grBldr = string.IsNullOrWhiteSpace(key)
        ? new GrammarBuilder() { Culture = cultureInf }
        : new GrammarBuilder(key) { Culture = cultureInf };
    grBldr.Append(new Choices(values));

    return new Grammar(grBldr);
}
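
With a small grammar like this, it may also help to ignore results the engine itself is unsure about. The following is a minimal sketch, not part of the original answer: the class name, the `ShouldAccept` helper and the 0.7 threshold are assumptions chosen for illustration, and a suitable threshold would have to be found by experiment.

```csharp
using System;
using System.Globalization;
using System.Speech.Recognition;

static class CommandListener
{
    // Hypothetical cut-off; tune it against your own microphone and voice.
    const float MinConfidence = 0.7f;

    // Pure helper: decide whether a recognition result should be acted on.
    public static bool ShouldAccept(float confidence) => confidence >= MinConfidence;

    static void Main()
    {
        var culture = new CultureInfo("en-US");
        using (var engine = new SpeechRecognitionEngine(culture))
        {
            // Grammar for "size small", "size medium", "size large".
            var builder = new GrammarBuilder("size") { Culture = culture };
            builder.Append(new Choices("small", "medium", "large"));
            engine.LoadGrammar(new Grammar(builder));

            // Raised when the audio matched a loaded grammar.
            engine.SpeechRecognized += (s, e) =>
            {
                if (ShouldAccept(e.Result.Confidence))
                    Console.WriteLine($"Accepted: {e.Result.Text} ({e.Result.Confidence:F2})");
                else
                    Console.WriteLine($"Ignored low-confidence result: {e.Result.Text}");
            };

            // Raised when speech was heard but no grammar matched well enough.
            engine.SpeechRecognitionRejected += (s, e) =>
                Console.WriteLine("Rejected: no grammar matched.");

            engine.SetInputToDefaultAudioDevice();
            engine.RecognizeAsync(RecognizeMode.Multiple);

            Console.WriteLine("Listening; press Enter to stop.");
            Console.ReadLine();
            engine.RecognizeAsyncStop();
        }
    }
}
```

Filtering on `Result.Confidence` does not make the recognizer more accurate; it only trades wrong commands for ignored ones, which is often the safer failure mode when voice replaces menu commands.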

You could also try Microsoft.Speech.Recognition; see "What is the difference between System.Speech.Recognition and Microsoft.Speech.Recognition?"

Answer 3

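This is expected. Word's dictation uses a cloud-based, AI/ML-assisted speech service: Azure Cognitive Services - Speech to Text. It is continuously trained and updated for the best accuracy. You can easily test this by going offline and trying the dictation feature in Word - it won't work.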
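
If calling the cloud service directly is an option, a single recognition request could look roughly like the sketch below. It assumes the Microsoft.CognitiveServices.Speech NuGet package and an Azure Speech resource; the key and region strings are placeholders, and the phrase list is an optional hint made up from the question's sample commands.

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;

static class CloudDictationSketch
{
    static async Task Main()
    {
        // Placeholders: substitute the key and region of your own Speech resource.
        var config = SpeechConfig.FromSubscription("YOUR_SPEECH_KEY", "YOUR_REGION");
        config.SpeechRecognitionLanguage = "en-US";

        // With no audio configuration supplied, the SDK reads from the default microphone.
        using (var recognizer = new SpeechRecognizer(config))
        {
            // Optional: bias recognition toward the app's command vocabulary.
            var phrases = PhraseListGrammar.FromRecognizer(recognizer);
            phrases.AddPhrase("make it redder");
            phrases.AddPhrase("zoom in");

            // Listens for one utterance and returns when the speaker pauses.
            SpeechRecognitionResult result = await recognizer.RecognizeOnceAsync();
            if (result.Reason == ResultReason.RecognizedSpeech)
                Console.WriteLine($"Recognized: {result.Text}");
            else
                Console.WriteLine($"No usable result: {result.Reason}");
        }
    }
}
```

This route needs network access and a paid or free-tier subscription, so it suits a tablet app only if it can be assumed to be online.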

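.NET's System.Speech uses the offline SAPI5, which as far as I know has not been updated since Windows 7. The core technology itself dates from the Windows 95 era and is much older than what is available today on phones or in cloud-based services. Microsoft.Speech.Recognition uses a similar core and won't be much better - although you can give it a try.

If you want to explore other offline options, I would suggest trying Windows.Media.SpeechRecognition. As far as I can tell, it is the same technology used by Cortana and other modern speech recognition apps on Windows 8 and later, and it does not use SAPI5.

It is easy to find examples online for both Azure and Windows.Media.SpeechRecognition. The best way to use the latter is to update your app to .NET 5 and use C#/WinRT to access the UWP APIs.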
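
As a rough illustration of the Windows.Media.SpeechRecognition route, the sketch below restricts the recognizer to a fixed list of commands. It assumes a project that can see the WinRT APIs (for example a .NET 5 target such as `net5.0-windows10.0.19041.0`); the class name, the `IsUsable` helper and the command list are made up for this example.

```csharp
using System;
using System.Threading.Tasks;
using Windows.Media.SpeechRecognition;

static class ListConstraintSketch
{
    // Pure helper: treat only High and Medium confidence as usable.
    public static bool IsUsable(SpeechRecognitionConfidence c) =>
        c == SpeechRecognitionConfidence.High || c == SpeechRecognitionConfidence.Medium;

    static async Task Main()
    {
        using (var recognizer = new SpeechRecognizer())
        {
            // Restrict recognition to a fixed set of command phrases.
            var commands = new[] { "size small", "size medium", "size large", "zoom in", "zoom out" };
            recognizer.Constraints.Add(new SpeechRecognitionListConstraint(commands, "commands"));

            // Constraints must be compiled before recognition can start.
            SpeechRecognitionCompilationResult compiled = await recognizer.CompileConstraintsAsync();
            if (compiled.Status != SpeechRecognitionResultStatus.Success)
            {
                Console.WriteLine($"Constraint compilation failed: {compiled.Status}");
                return;
            }

            // Listens for a single utterance from the default microphone.
            SpeechRecognitionResult result = await recognizer.RecognizeAsync();
            Console.WriteLine(IsUsable(result.Confidence) ? $"Heard: {result.Text}" : "No confident match.");
        }
    }
}
```

A list constraint plays the same role here as the small `Choices` grammars do in System.Speech: the recognizer only has to pick between a few known phrases instead of transcribing free dictation.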

Answer 4

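Anyone who needs a speech recognition engine with roughly 90% accuracy, like Cortana's, should follow these steps.

Step 1) Download the NuGet package Microsoft.Windows.SDK.Contracts

Step 2) Migrate the project from packages.config to PackageReference for the SDK --> https://devblogs.microsoft.com/nuget/migrate-packages-config-to-package-reference/

The SDK above gives you the Windows 10 speech recognition system in Win32 apps. This is necessary because otherwise the only way to use this speech recognition engine is to build a Universal Windows Platform app. I don't recommend building an A.I. app on the Universal Windows Platform because of its sandboxing. The sandbox isolates the app in a container and won't let it communicate with any hardware; it also makes file access an absolute pain, and thread management isn't possible, only async functions.

Step 3) Add this namespace to the using directives at the top of the file. It contains all the functionality related to online speech recognition.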

using Windows.Media.SpeechRecognition;

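Step 4) Add the speech recognition implementation.

Task.Run(async () =>
{
  try
  {
    // With no constraints added, the recognizer falls back to its default dictation setup.
    var speech = new SpeechRecognizer();
    await speech.CompileConstraintsAsync();
    SpeechRecognitionResult result = await speech.RecognizeAsync();

    // Task.Run executes on a thread-pool thread, so hand the result back to the UI thread.
    // (WPF; assumes this code lives in a Window or another DispatcherObject.)
    Dispatcher.Invoke(() => TextBox1.Text = result.Text);
  }
  catch (Exception ex)
  {
    // Log failures instead of silently swallowing them.
    System.Diagnostics.Debug.WriteLine(ex);
  }
});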

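Most methods of the Windows 10 SpeechRecognizer class have to be called asynchronously, which means you must run them inside a Task.Run(async () => {}) lambda, an async method, or an async Task method.

For this to work, go to Settings -> Privacy -> Speech in the OS and check that online speech recognition is allowed.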
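
The "async Task method" alternative mentioned above could be sketched as follows. This is an illustrative variant, not the answer's own code: `DictationSketch`, `ListenOnceAsync` and the button handler in the trailing comment are hypothetical names.

```csharp
using System;
using System.Threading.Tasks;
using Windows.Media.SpeechRecognition;

static class DictationSketch
{
    // Listens for one utterance with the recognizer's default dictation setup.
    public static async Task<string> ListenOnceAsync()
    {
        using (var recognizer = new SpeechRecognizer())
        {
            await recognizer.CompileConstraintsAsync();
            SpeechRecognitionResult result = await recognizer.RecognizeAsync();
            return result.Text;
        }
    }
}

// Hypothetical WPF usage from a button click handler:
//
// private async void ListenButton_Click(object sender, RoutedEventArgs e)
// {
//     TextBox1.Text = await DictationSketch.ListenOnceAsync();
// }
```

Because the handler awaits on the UI thread, the assignment to the text box runs there as well, so no explicit dispatcher call is needed in this variant.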

c#

wpf

speech-recognition