How to properly add hex escapes into a string literal?
When you have a string in C, you can add hex codes directly inside it.
char str[] = "abcde"; // 'a', 'b', 'c', 'd', 'e', 0x00
char str2[] = "abc\x12\x34"; // 'a', 'b', 'c', 0x12, 0x34, 0x00
Both examples occupy 6 bytes in memory. The problem appears when you want a character from [a-fA-F0-9] to come directly after a hex escape.
//I want: 'a', 'b', 'c', 0x12, 'e', 0x00
//Error, hex is too big because last e is treated as part of hex thus becoming 0x12e
char problem[] = "abc\x12e";
A possible workaround is to replace the byte after the definition.
//This will work, bad idea
char solution[6] = "abcde";
solution[3] = 0x12;
This works, but it fails as soon as you declare the array const.
//This will not work
const char solution[6] = "abcde";
solution[3] = 0x12; //Compilation error!
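How do I properly insert e after \x12 without triggering an error?
Why am I asking? When you want to build a UTF-8 string as a constant, you have to use the hex values of any character that lies outside what the ASCII table can hold.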
Use 3 octal digits (0x12 is 022 in octal):
char problem[] = "abc\022e";
Or split your string:
char problem[] = "abc\x12" "e";
Why these work:
Unlike a hex escape, which has no length limit, the standard caps an octal escape at 3 digits.
6.4.4.4 Character constants
...
octal-escape-sequence:
\ octal-digit
\ octal-digit octal-digit
\ octal-digit octal-digit octal-digit
...
hexadecimal-escape-sequence:
\x hexadecimal-digit
hexadecimal-escape-sequence hexadecimal-digit
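The grammar above can be checked with a small sketch. The array and function names here are illustrative assumptions, not part of the original answer; the checks rely only on an octal escape ending after three digits and a hex escape consuming every hex digit that follows it.

```c
#include <assert.h>

/* Octal escape: \022 is 0x12, and the 'e' after it starts a new character. */
static const char octal_fix[] = "abc\022e";

/* A fourth digit is not part of the octal escape: 0x12, '3', 0x00. */
static const char octal_then_digit[] = "\0223";

/* A hex escape keeps consuming hex digits, so \x012 is one byte, 0x12. */
static const char hex_leading_zero[] = "\x012";

/* Hypothetical helper that asserts the layouts described in the comments. */
void check_escape_lengths(void)
{
    assert(sizeof octal_fix == 6);
    assert(octal_fix[3] == 0x12);
    assert(octal_fix[4] == 'e');

    assert(sizeof octal_then_digit == 3);
    assert(octal_then_digit[0] == 0x12);
    assert(octal_then_digit[1] == '3');

    assert(sizeof hex_leading_zero == 2);
    assert(hex_leading_zero[0] == 0x12);
}
```

Calling check_escape_lengths() from main should pass silently on a conforming compiler. Note that padding a hex escape with leading zeros does not terminate it.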
String literal concatenation is defined as a later translation phase than the conversion of escape sequences.
5.1.1.2 Translation phases
...
Each source character set member and escape sequence in character constants and string literals is converted to the corresponding member of the execution character set; if there is no corresponding member, it is converted to an implementation-defined member other than the null (wide) character.
Adjacent string literal tokens are concatenated.
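Because string literals are concatenated early in the compilation process, but only after escape sequences have been converted, you can simply use: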
char problem[] = "abc\x12" "e";
For readability, though, you may prefer to separate the parts completely:
char problem[] = "abc" "\x12" "e";
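As a rough check that both split forms give the intended bytes, here is a minimal sketch. The names expected, two_parts, three_parts and same_as_expected are made up for illustration; the const arrays also show that no assignment after the definition is needed.

```c
#include <stddef.h>
#include <string.h>

/* The layout the question asks for: 'a', 'b', 'c', 0x12, 'e', 0x00. */
static const char expected[6] = { 'a', 'b', 'c', 0x12, 'e', 0x00 };

/* The hex escape ends where the first literal ends. */
static const char two_parts[] = "abc\x12" "e";

/* Same bytes, with the escape isolated in its own literal. */
static const char three_parts[] = "abc" "\x12" "e";

/* Hypothetical helper: returns 1 if buf holds exactly the expected bytes. */
int same_as_expected(const char *buf, size_t size)
{
    return size == sizeof expected
        && memcmp(buf, expected, sizeof expected) == 0;
}
```

Both same_as_expected(two_parts, sizeof two_parts) and same_as_expected(three_parts, sizeof three_parts) should return 1.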
For the language lawyers among us, this is covered in C11 5.1.1.2 Translation phases (my emphasis):
Each source character set member and escape sequence in character constants and string literals is converted to the corresponding member of the execution character set; if there is no corresponding member, it is converted to an implementation-defined member other than the null (wide) character.
Adjacent string literal tokens are concatenated.
Why I'm asking? When you want to build UTF-8 string as constant, you have to use hex values of character is larger than ASCII table can hold.
Well, no. You don't have to. As of C11, you can prefix the string literal with u8, which tells the compiler to encode the literal as UTF-8.
char solution[] = u8"no need to use hex-codes áé§µ";
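(Incidentally, C++11 supports the same prefix. Since C++20, however, a u8 literal has type const char8_t[] and can no longer initialize a char array there.)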
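To relate the u8 prefix back to the hex-escape approach, here is a small sketch under a few assumptions: the compiler supports C11 u8 literals, and the names from_u8, from_hex and utf8_forms_match are invented for this example. It writes é as the universal character name \u00e9 so the result does not depend on the encoding of the source file.

```c
#include <string.h>

/* "née": U+00E9 is written as a universal character name (exactly 4 hex digits),
   so the 'e' that follows is an ordinary character. */
static const char from_u8[] = u8"n\u00e9e";

/* The same text with the UTF-8 bytes of U+00E9 (0xC3 0xA9) as hex escapes.
   The literal is split so the trailing 'e' is not absorbed by \xA9. */
static const char from_hex[] = "n\xC3\xA9" "e";

/* Hypothetical helper: returns 1 if both arrays hold identical bytes. */
int utf8_forms_match(void)
{
    return sizeof from_u8 == sizeof from_hex
        && memcmp(from_u8, from_hex, sizeof from_hex) == 0;
}
```

Each array should be 5 bytes long: 'n', 0xC3, 0xA9, 'e' and the terminating zero.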