C: Cannot write Unicode braille to UTF-8 doc when compiling for Windows

Question

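I have some code that works fine on Linux, but on Windows it only works as expected when compiled with Cygwin, which emulates a Linux environment. That works on Windows but is bad for portability (you have to install Cygwin for the compiled binary to run). The program does the following:

Opens a document in read mode with ccs=UTF-8 and reads it character by character.
Writes the braille Unicode pattern (U+2800..U+28FF) corresponding to that letter, number, or punctuation mark to the 'dest' document (opened in write mode with ccs=UTF-8).

Relevant code: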

const char *brai[26] = {
    "⠁","⠃","⠉","⠙","⠑","⠋","⠛","⠓","⠊","⠚",
    "⠅","⠇","⠍","⠝","⠕","⠏","⠟","⠗","⠎","⠞",
    "⠥","⠧","⠭","⠽","⠵","⠺"
};

int main(void) {

    setlocale(LC_ALL, "es_MX.UTF-8");
    FILE *source = fopen(origen, "r, ccs=UTF-8");
    FILE *dest = fopen(destino, "w, ccs=UTF-8");

    unsigned int letra;
    while ((letra = fgetc(source)) != EOF) {

        // This next line is the problem, I guess:
        fwprintf(dest, L"%s", "⠷"); // Prints directly the braille sign as a char[]
        // OR prints it from an array that contains the exact same sign.
        fwprintf(dest, L"%s", brai[7]);

    }
}

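The code works as expected every time on Linux, but not on Windows. I have tried everything, but nothing seems to make the output correct. In the 'dest' document I get random characters such as: 甥╩极肠─猀甥iꃢ¨.

So far, the only way I have found to print braille patterns to the document on Windows is: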

fwprintf(dest, L"⠷");

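This is not very useful (I would need an 'else if' for every case). If you want to see the full code, it is on GitHub: https://github.com/oliver-almaraz/Texto_a_Braille

What I have tried so far:

Changing the file open options to UTF-16LE and UNICODE.
Changing the fwprintf() arguments in every way I could think of.
Changing the type of the array that holds the braille patterns to unsigned int.
Different compilers.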

Answer 1

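Option 1: Use wchar_t and fwprintf. Make sure to save the source code as UTF-8 with BOM, or as UTF-8 and compile with the /utf-8 switch to force the Microsoft compiler to assume UTF-8; otherwise MSVS assumes the source file is ANSI-encoded and you get mojibake.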

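#include <stdio.h>
#include <wchar.h>

const wchar_t brai[] = L"⠁⠃⠉⠙⠑⠋⠛⠓⠊⠚⠅⠇⠍⠝⠕⠏⠟⠗⠎⠞⠥⠧⠭⠽⠵⠺";

int main(void) {
    // "ccs=UTF-8" is a Microsoft CRT extension: wide output is converted to UTF-8.
    FILE *dest = fopen("out.txt", "w, ccs=UTF-8");
    if (dest == NULL) return 1;
    // %ls always means a wchar_t string; MSVC's wide printf functions also treat plain %s as wide.
    fwprintf(dest, L"%ls", brai);
    fclose(dest);
    return 0;
}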

out.txt (encoded as UTF-8 with BOM):

⠁⠃⠉⠙⠑⠋⠛⠓⠊⠚⠅⠇⠍⠝⠕⠏⠟⠗⠎⠞⠥⠧⠭⠽⠵⠺

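Option 2: Use char and fprintf, save the source code as UTF-8 or UTF-8 with BOM, and use the /utf-8 Microsoft compiler switch. The char string will be in the source encoding, so it must be UTF-8 to get UTF-8 in the output file.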

#include <stdio.h>

const char brai[] = "⠁⠃⠉⠙⠑⠋⠛⠓⠊⠚⠅⠇⠍⠝⠕⠏⠟⠗⠎⠞⠥⠧⠭⠽⠵⠺";

int main(void) {
    FILE *dest = fopen("out.csv","w");
    fprintf(dest, "%s", brai);
}

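Recent compilers can also use the u8"" syntax. The advantage here is that you can use a different source encoding: as long as you use the appropriate compiler switch to indicate the source encoding, the char string will still be UTF-8.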

const char brai[] = u8"⠁⠃⠉⠙⠑⠋⠛⠓⠊⠚⠅⠇⠍⠝⠕⠏⠟⠗⠎⠞⠥⠧⠭⠽⠵⠺";

For reference, these are the Microsoft compiler options:

/source-charset:<iana-name>|.nnnn set source character set
/execution-charset:<iana-name>|.nnnn set execution character set
/utf-8 set source and execution character set to UTF-8

Answer 2

Here is a tested (on Windows with MSVC and mingw), semi-working example.

#include <stdio.h>
#include <ctype.h>

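// Indexed by letter - 'a', so the cells must be in alphabetical order (w = ⠺ sits between v and x).
const char *brai[26] = {
    "⠁","⠃","⠉","⠙","⠑","⠋","⠛","⠓","⠊","⠚",
    "⠅","⠇","⠍","⠝","⠕","⠏","⠟","⠗","⠎","⠞",
    "⠥","⠧","⠺","⠭","⠽","⠵"
};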

int main(void) {
    
    char* origen = "a.txt";
    char* destino = "b.txt";

    FILE *source = fopen(origen, "r");
    FILE *dest = fopen(destino, "w");
    
    int letra;
    while ((letra = fgetc(source)) != EOF) {
        
        if (isupper(letra))
            fprintf(dest, "%s", brai[letra - 'A']);
        else if (islower(letra))
            fprintf(dest, "%s", brai[letra - 'a']);
        else
            fprintf (dest, "%c", letra);
    }
}

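Note the following.

No locale, wide characters, or anything of the sort in sight. None are needed.
This code translates English letters only. No punctuation or digits (I know too little about braille to add them, but it should be straightforward).
Because the code only translates English letters and leaves everything else as is, you can feed it UTF-8 encoded files. It will simply leave unrecognised characters untranslated. If you need to translate accented letters, you will need to learn more about Unicode. Here is a good place to start.
Error handling is omitted for brevity.
The source code must be saved in the correct character set. For MSVC, use UTF-8 with BOM or UTF-16, or UTF-8 without BOM together with the /utf-8 compiler switch (if your MSVC version recognises it). For mingw, just use UTF-8.
This method does not work with standard console output on Windows. That is not a big problem, since the Windows console will not output braille characters by default anyway. It does, however, work with the msys console and many other consoles.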
