Continuous Speech Recognition using Microsoft Cognitive Speech + Websocket - Xamarin

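I am trying to build continuous speech recognition from the microphone using Microsoft Cognitive Speech for Xamarin Android. I don't think there is a library for Xamarin, so I slightly modified the "Xamarin.Cognitive.BingSpeech" library (endpoint, etc.) to get it working. I have a few problems.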

I want to connect to the Microsoft WebSocket endpoint by following the tutorial at https://docs.microsoft.com/en-us/azure/cognitive-services/speech/api-reference-rest/websocketprotocol.

I tried sending an HTTP request with a basic HttpClient and got a 101 Switching Protocols result (so I guess that part succeeded?).

Update: my HTTP request is:

System.Net.ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls12 | SecurityProtocolType.Ssl3;

var request = new HttpWebRequest(uriBuilder.Uri);
request.Headers.Add("Authorization", new System.Net.Http.Headers.AuthenticationHeaderValue(Bearer, AuthClient.Token).ToString());
request.Accept = MimeTypes.Json;
request.Host = SpeechEndpoint.Host;
request.Connection = "Upgrade";
request.Headers.Add("Upgrade", "Websocket");
request.KeepAlive = true;
request.Method = "GET";
request.CookieContainer = new CookieContainer();
request.AllowAutoRedirect = true;
request.Date = DateTime.Now;
request.CachePolicy = new System.Net.Cache.RequestCachePolicy(System.Net.Cache.RequestCacheLevel.CacheIfAvailable);
request.Headers.Add("Sec-WebSocket-Key", "dGhlIHNhbXBsZSBub25jZQ==");
request.Headers.Add("Sec-WebSocket-Version", "13");
request.Headers.Add("Sec-WebSocket-Protocol", "chat, superchat");
request.Headers.Add("X-ConnectionId", xConnectionId = Guid.NewGuid().ToString().ToUpper());

After sending the HTTP request, I try to connect to the WebSocket (wss://xxxxxxxx), but I always get "Unable to connect to remote server" without any error code or other detail.

Uri wsuri = new Uri(AppConfig.BINGWSSURI);
await _socketclient.ConnectAsync(wsuri, CancellationToken.None);
Log.Info("WSOCKETFINISH", _socketclient.State.ToString());

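The second thing I want to achieve is streaming audio from the microphone to the WebSocket as binary messages, so I have to:

  1. Record from the microphone (I am using Plugin.AudioRecorder)
  2. Slice the recording into small chunks
  3. Send the chunks asynchronously over the WebSocket

What I want to achieve: speech-to-text from the microphone with Microsoft Cognitive Speech in dictation mode, so I need partial results instead of waiting for the recording to complete.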

I think you want to convert speech to text.
Because Xamarin.Cognitive.BingSpeech needs you to record the speech and send it to the server as a file or stream, I think you could try Android speech instead. It can also convert text to speech. Here is an example.

If you want to use Xamarin.Cognitive.BingSpeech, you can record the speech with the Audio Recorder plugin and send it to the server with BingSpeechApiClient. For example:

BingSpeechApiClient bingSpeechClient = new BingSpeechApiClient ("My Bing Speech API Subscription Key");
var audioFile = "/a/path/to/my/audio/file/in/WAV/format.wav";
var simpleResult = await bingSpeechClient.SpeechToTextSimple (audioFile);
Or, passing a stream:
var simpleResult = await bingSpeechClient.SpeechToTextSimple (stream, <sample rate>, <audio record Task>);

Here is a sample for Xamarin.Cognitive.BingSpeech.

Update:

I always get "Unable to connect to remote server" without any error code or anything.

You are missing some required values in the headers.

  1. X-ConnectionId
    You need to generate a UUID and add it to the header. For example: client.Options.SetRequestHeader("X-ConnectionId", System.Guid.NewGuid().ToString());
  2. Authorization
    You need to POST your subscription key to https://api.cognitive.microsoft.com/sts/v1.0/issueToken. You can use Postman to do this. Then add the returned value to the header:

    client.Options.SetRequestHeader("Authorization", "eyJ0eXAiOiJKV1Q....uW72PAOBRcUvqY");
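
The token request can also be made from code rather than from Postman. The following is only a minimal sketch: it assumes the issueToken endpoint accepts the subscription key in an Ocp-Apim-Subscription-Key header and returns the token as the plain response body. TokenRequestSketch and its method names are hypothetical and not part of any library.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

public static class TokenRequestSketch
{
    // Endpoint taken from the answer text.
    public const string IssueTokenUrl =
        "https://api.cognitive.microsoft.com/sts/v1.0/issueToken";

    // Builds the POST request without sending it.
    // Assumption: the key travels in the Ocp-Apim-Subscription-Key header.
    public static HttpRequestMessage BuildTokenRequest(string subscriptionKey)
    {
        var request = new HttpRequestMessage(HttpMethod.Post, IssueTokenUrl);
        request.Headers.Add("Ocp-Apim-Subscription-Key", subscriptionKey);
        request.Content = new StringContent(string.Empty); // empty body
        return request;
    }

    // Sends the request and returns the response body.
    // Assumption: the body is the token itself, as plain text.
    public static async Task<string> FetchTokenAsync(HttpClient http, string subscriptionKey)
    {
        using (var request = BuildTokenRequest(subscriptionKey))
        using (var response = await http.SendAsync(request))
        {
            response.EnsureSuccessStatusCode();
            return await response.Content.ReadAsStringAsync();
        }
    }
}
```

The returned string would then be passed to client.Options.SetRequestHeader("Authorization", ...) before connecting.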

so I need partial result instead of waiting the recording to be completed

You can use the GetAudioFileStream() method. For example:

    var audioRecordTask = await recorder.StartRecording();
    using (var stream = recorder.GetAudioFileStream ())
    {
        //this will get the recording audio data as it continues to record
    }
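
The body of that using block could read the stream in small chunks while the recording is still running. The sketch below is one possible shape for that loop, not the plugin's own API: AudioChunkSketch, PumpAsync, onChunk, and stillRecording are hypothetical names. It assumes the stream returns 0 bytes when the reader has caught up with the recorder, so the loop polls until the caller reports that recording has stopped.

```csharp
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

public static class AudioChunkSketch
{
    // Reads `source` in pieces of at most `chunkSize` bytes and passes each
    // piece to `onChunk`. Returns the total number of bytes delivered.
    // The buffer is reused, so `onChunk` must consume or copy the bytes
    // before its task completes.
    public static async Task<long> PumpAsync(
        Stream source,
        int chunkSize,
        Func<ArraySegment<byte>, Task> onChunk,
        Func<bool> stillRecording,
        CancellationToken ct)
    {
        var buffer = new byte[chunkSize];
        long total = 0;
        bool finished = false;

        while (true)
        {
            int read = await source.ReadAsync(buffer, 0, buffer.Length, ct);
            if (read > 0)
            {
                await onChunk(new ArraySegment<byte>(buffer, 0, read));
                total += read;
                continue;
            }

            if (finished) break;       // drained after recording stopped

            if (!stillRecording())
                finished = true;       // one more pass to pick up the tail
            else
                await Task.Delay(50, ct); // wait for the recorder to write more
        }
        return total;
    }
}
```

With the recorder above, stillRecording could be something like () => !audioRecordTask.IsCompleted, and onChunk would forward each chunk to the WebSocket.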

Update 2:
WebSocket code:

    var client = new ClientWebSocket();
    client.Options.UseDefaultCredentials = true;
    client.Options.SetRequestHeader("X-ConnectionId", System.Guid.NewGuid().ToString());
    client.Options.SetRequestHeader("Authorization", "eyJ0eXAiOiJKV1QiL....16pbFPOWT3VHXot8");
    // Await the connection instead of blocking with Wait(), which can deadlock on the UI thread.
    await client.ConnectAsync(new Uri("wss://speech.platform.bing.com/speech/recognition/Dictation/cognitiveservices/v1"), CancellationToken.None);

Note: keep your Authorization value up to date. The issued token expires, so request a fresh one when needed.
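
Once the socket is connected, each audio chunk has to be wrapped in a binary message. The sketch below reflects my reading of the WebSocket protocol document linked in the question, so please verify it against that page. It assumes the payload starts with a 2-byte big-endian header length, followed by ASCII headers (Path, X-RequestId, X-Timestamp, Content-Type) and then the raw audio bytes. AudioMessageSketch and its method names are hypothetical.

```csharp
using System;
using System.Net.WebSockets;
using System.Text;
using System.Threading;
using System.Threading.Tasks;

public static class AudioMessageSketch
{
    // Builds one binary message: [2-byte header length][headers][audio].
    // Assumption: header names and the audio/x-wav content type follow
    // the protocol document; check them against the current docs.
    public static byte[] BuildAudioMessage(string requestId, byte[] audio, int offset, int count)
    {
        string headers =
            "Path: audio\r\n" +
            "X-RequestId: " + requestId + "\r\n" +
            "X-Timestamp: " + DateTime.UtcNow.ToString("o") + "\r\n" +
            "Content-Type: audio/x-wav\r\n";
        byte[] headerBytes = Encoding.ASCII.GetBytes(headers);
        if (headerBytes.Length > ushort.MaxValue)
            throw new ArgumentException("Header section too large.");

        var message = new byte[2 + headerBytes.Length + count];
        message[0] = (byte)(headerBytes.Length >> 8);   // high byte first (big-endian)
        message[1] = (byte)(headerBytes.Length & 0xFF);
        Buffer.BlockCopy(headerBytes, 0, message, 2, headerBytes.Length);
        Buffer.BlockCopy(audio, offset, message, 2 + headerBytes.Length, count);
        return message;
    }

    // Sends one chunk as a single, complete binary WebSocket message.
    public static Task SendAudioAsync(
        ClientWebSocket client, string requestId,
        ArraySegment<byte> chunk, CancellationToken ct)
    {
        byte[] message = BuildAudioMessage(requestId, chunk.Array, chunk.Offset, chunk.Count);
        return client.SendAsync(
            new ArraySegment<byte>(message), WebSocketMessageType.Binary, true, ct);
    }
}
```

A request ID could be generated once per recognition turn with Guid.NewGuid().ToString("N") and reused for every chunk of that turn. As far as I can tell, the protocol also expects a configuration text message before the first audio message, which this sketch does not cover.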