Microsoft Speech Recognition with Custom Speech Protocol (Xamarin Android, Websocket)

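I am trying to use Microsoft Cognitive Speech on Xamarin Android to build continuous speech recognition from the microphone. I don't think there is a library for Xamarin. The documentation is: https://docs.microsoft.com/en-us/azure/cognitive-services/speech/api-reference-rest/websocketprotocol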

I have the websocket connection working, but I am now struggling to send messages to the websocket server. I noticed in the documentation that

We have to send headers with a specific Path every time we send a message

For example, these headers are used to send the first configuration message of the speech protocol:

Path : speech.config
X-Timestamp :   Client UTC clock time stamp in ISO 8601 format
Content-Type :  application/json; charset=utf-8

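I am using WebSocketClient, but I can't find any way to set headers or change the path. Is there any way to set the headers and/or change the path so that I can send messages to the server correctly? Or have I misunderstood something?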

My second problem is that WebSocketClient does not have any event handler for receiving messages. What I did is:

private static async Task DataReceiving(ClientWebSocket ws)
{
    while (true)
    {
        ArraySegment<byte> bytesReceived = new ArraySegment<byte>(new byte[1024]);
        WebSocketReceiveResult result = await ws.ReceiveAsync(bytesReceived, CancellationToken.None);
        Log.Info("SOCKETRECEIVED", Encoding.UTF8.GetString(bytesReceived.Array, 0, result.Count));
        if (ws.State != WebSocketState.Open)
        {
            Log.Info("SOCKETCLOSED", "CLOSED");
            break;
        }
    }
}

But I don't receive any messages at all.

Edit:

Here is my code for the headers:

//List<Tuple<string, string>> Headers <<Contains [Title] and [Content]
foreach (var item in Headers)
{
    message += item.Item1 + " : " + item.Item2 + Environment.NewLine;
}
message += Environment.NewLine; // ensure double carriage return

Edit: Here is my code for sending the WAV header:

using (MemoryStream stream = new MemoryStream())
{
    short channelCount = 1;
    int sampleRate = 1024;
    int bitsPerSample = 16;
    using (var writer = new BinaryWriter(stream, Encoding.UTF8))
    {


        writer.Write("Path: audio"+Environment.NewLine);
        writer.Write("X-Timestamp: " + DateTime.UtcNow.ToString("yyyy-MM-ddTHH:mm:ss.fffffffZ"+Environment.NewLine));
        writer.Write("Content-Type : audio/x-wav"+Environment.NewLine);
        writer.Write("X-RequestId: " + Guid.NewGuid().ToString().Replace("-",string.Empty)+Environment.NewLine);
        writer.Write(Environment.NewLine);

        //chunk ID
        writer.Write('R');
        writer.Write('I');
        writer.Write('F');
        writer.Write('F');

        writer.Write(-1); // -1 - Unknown size

        //format
        writer.Write('W');
        writer.Write('A');
        writer.Write('V');
        writer.Write('E');

        //subchunk 1 ID
        writer.Write('f');
        writer.Write('m');
        writer.Write('t');
        writer.Write(' ');

        writer.Write(16); //subchunk 1 (fmt) size
        writer.Write((short)1); //PCM audio format

        writer.Write((short)channelCount);
        writer.Write(sampleRate);
        writer.Write(sampleRate * 2);
        writer.Write((short)2); //block align
        writer.Write((short)bitsPerSample);

        //subchunk 2 ID
        writer.Write('d');
        writer.Write('a');
        writer.Write('t');
        writer.Write('a');

        //subchunk 2 (data) size
        writer.Write(-1); // -1 - Unknown size
    }
    byte[] result;
    //using (MemoryStream ms = new MemoryStream())
    //{
    //    stream.CopyTo(ms);
    //    result = ms.ToArray();
    //}
    result = stream.ToArray();
    ArraySegment<byte> byteresult = new ArraySegment<byte>(result);
    await _socketclient.SendAsync(byteresult, WebSocketMessageType.Binary, false, CancellationToken.None);
    Log.Info("SENDINGWAV", System.Text.Encoding.UTF8.GetString(result));
}

Here is my code for sending the data bytes:

public async Task SendByteHeader(byte[] data)
{
    string s = "";
    s += "Path: audio" + Environment.NewLine;
    s += "X-Timestamp: " + DateTime.UtcNow.ToString("yyyy-MM-ddTHH:mm:ss.fffffffZ" + Environment.NewLine);
    s += "Content-Type : audio/x-wav" + Environment.NewLine;
    s += "X-RequestId: " + Guid.NewGuid().ToString().Replace("-", string.Empty) + Environment.NewLine;
    s += Environment.NewLine;
    byte[] array = Encoding.UTF8.GetBytes(s);
    List<byte> endres = new List<byte>(array);
    endres.AddRange(data);

    ArraySegment<byte> byteresult = new ArraySegment<byte>(endres.ToArray());
    await _socketclient.SendAsync(byteresult, WebSocketMessageType.Binary, false, CancellationToken.None);
    Log.Info("SENDINGBYTE", Encoding.UTF8.GetString(data));
}

I run this when the connection starts:

Task.Run(()=>DataReceiving(_socketclient));

So, I first send the WAV header and then start sending the audio bytes from the recording (I am using Plugin.AudioRecording). I still have not received any message or response.

Edit:

I send some data to the server every 200 ms to make it "real time", but I noticed that after 5 or 6 sends, every SendAsync call fails on this line:

await _socketclient.SendAsync(byteresult, WebSocketMessageType.Binary, false, CancellationToken.None);

The error is "Cannot access disposable object" (the websocket). It seems the websocket has been disposed? Or was the connection terminated?

I am using WebSocketClient but I don't find any way to set up headers or change path. Is there any way to set up the headers and/or changing path so I can send message properly to the server? Or do I have a wrong perception?

If you refer to the Text WebSocket Messages section of the documentation you posted, you will find the following statement:

Text WebSocket messages carry a payload of textual information that consists of a section of headers and a body separated by the familiar double-carriage-return newline pair used for HTTP messages.

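That means the message you send to the service with client.SendAsync() can be composed of two sections, a header section and a body section, and the two sections are separated by a double carriage-return newline pair (\r\n\r\n).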
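As a minimal sketch of that header/body layout, the helper below joins the headers and the body into one text payload. The names SpeechMessage, BuildText and Timestamp are hypothetical and not part of any SDK. It writes a literal "\r\n" instead of Environment.NewLine, because Environment.NewLine is "\n" on Android.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;

// Hypothetical helper for illustration; not part of any Microsoft SDK.
public static class SpeechMessage
{
    // Builds "Name: value" header lines, each ended by CRLF, followed by an
    // empty line (which yields the double CRLF separator) and then the body.
    public static string BuildText(IEnumerable<KeyValuePair<string, string>> headers, string body)
    {
        var sb = new StringBuilder();
        foreach (var header in headers)
        {
            sb.Append(header.Key).Append(": ").Append(header.Value).Append("\r\n");
        }
        sb.Append("\r\n"); // empty line that separates headers from body
        sb.Append(body);
        return sb.ToString();
    }

    // Formats a UTC time as an ISO 8601 timestamp with millisecond precision.
    public static string Timestamp(DateTime utc)
    {
        return utc.ToString("yyyy-MM-dd'T'HH:mm:ss.fff'Z'", CultureInfo.InvariantCulture);
    }
}
```

To send it, encode the string with Encoding.UTF8.GetBytes and pass the bytes to SendAsync with WebSocketMessageType.Text. The third argument of SendAsync is endOfMessage; passing true marks the message as complete, whereas false tells the socket that more fragments of the same message will follow. The JSON body for speech.config is not shown here; take it from the documentation.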

My second problem is WebSocketClient doesnt have any event handler to receive message

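Regarding this problem, what you did is correct. Try again once your messages are sent correctly; the service will send back text messages containing what it recognized.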
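One refinement you may want, sketched here under my own assumptions: a single ReceiveAsync call can return only part of a message, so it is safer to keep reading until EndOfMessage is set and to decode the text only once the whole message has arrived. SpeechReceiver, ReceiveTextAsync and BodyOf are hypothetical names; the 4096-byte buffer size is arbitrary.

```csharp
using System;
using System.IO;
using System.Net.WebSockets;
using System.Text;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper for illustration; not part of any Microsoft SDK.
public static class SpeechReceiver
{
    // Reads fragments until one complete message has arrived.
    // Returns null if the server sends a close frame instead.
    public static async Task<string> ReceiveTextAsync(WebSocket ws, CancellationToken token)
    {
        var buffer = new ArraySegment<byte>(new byte[4096]);
        using (var assembled = new MemoryStream())
        {
            WebSocketReceiveResult result;
            do
            {
                result = await ws.ReceiveAsync(buffer, token);
                if (result.MessageType == WebSocketMessageType.Close)
                {
                    return null;
                }
                assembled.Write(buffer.Array, buffer.Offset, result.Count);
            }
            while (!result.EndOfMessage);

            // Decode after assembling so multi-byte characters are never split.
            return Encoding.UTF8.GetString(assembled.ToArray());
        }
    }

    // Returns the part after the first "\r\n\r\n", or an empty string if
    // the message has no header/body separator.
    public static string BodyOf(string message)
    {
        int separator = message.IndexOf("\r\n\r\n", StringComparison.Ordinal);
        return separator < 0 ? string.Empty : message.Substring(separator + 4);
    }
}
```

Your DataReceiving loop could call ReceiveTextAsync repeatedly, stop when it returns null, and log BodyOf(message) for each message it gets back.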