What is String.Encoding.unicode?
Swift offers a range of string encodings. As I write this, none of them are documented, which makes this more confusing than it should be...
I can understand that .ascii means the string is ASCII encoded, .utf8 means the string is UTF-8 encoded, and .utf16BigEndian means the string is UTF-16 but big-endian. These obviously map to real text encodings.
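To make "map to real text encodings" concrete, here is a minimal sketch that encodes one character with each of the three cases named above and prints the resulting bytes. The sample string "A" is an arbitrary choice for illustration, not something from the original post.

```swift
import Foundation

let text = "A"  // arbitrary one-character sample (assumption, for illustration)

// Each named encoding produces a concrete byte sequence.
// data(using:) returns nil if the string cannot be represented; fall back to empty.
let asciiBytes = Array(text.data(using: .ascii) ?? Data())
let utf8Bytes = Array(text.data(using: .utf8) ?? Data())
let utf16BEBytes = Array(text.data(using: .utf16BigEndian) ?? Data())

print(asciiBytes)    // [65]
print(utf8Bytes)     // [65]
print(utf16BEBytes)  // [0, 65]
```

ASCII and UTF-8 agree for this character, while UTF-16 big-endian uses two bytes with the high byte first.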
Then there's .unicode. There is no "Unicode" encoding. The Unicode standard defines UTF-8, UTF-16, and UTF-32, which, as I said above, are already defined in Swift.
Is it something fancy that figures out which encoding is best for the system? Is it an alias for .utf8? Is it some weird Apple Unicode encoding?
It appears to be an alias for .utf16. From CFString.h:
#define kCFStringEncodingInvalidId (0xffffffffU)
typedef CF_ENUM(CFStringEncoding, CFStringBuiltInEncodings) {
    kCFStringEncodingMacRoman = 0,
    kCFStringEncodingWindowsLatin1 = 0x0500, /* ANSI codepage 1252 */
    kCFStringEncodingISOLatin1 = 0x0201, /* ISO 8859-1 */
    kCFStringEncodingNextStepLatin = 0x0B01, /* NextStep encoding*/
    kCFStringEncodingASCII = 0x0600, /* 0..127 (in creating CFString, values greater than 0x7F are treated as corresponding Unicode value) */
    kCFStringEncodingUnicode = 0x0100, /* kTextEncodingUnicodeDefault + kTextEncodingDefaultFormat (aka kUnicode16BitFormat) */
    kCFStringEncodingUTF8 = 0x08000100, /* kTextEncodingUnicodeDefault + kUnicodeUTF8Format */
    kCFStringEncodingNonLossyASCII = 0x0BFF, /* 7bit Unicode variants used by Cocoa & Java */
    kCFStringEncodingUTF16 = 0x0100, /* kTextEncodingUnicodeDefault + kUnicodeUTF16Format (alias of kCFStringEncodingUnicode) */
    kCFStringEncodingUTF16BE = 0x10000100, /* kTextEncodingUnicodeDefault + kUnicodeUTF16BEFormat */
    kCFStringEncodingUTF16LE = 0x14000100, /* kTextEncodingUnicodeDefault + kUnicodeUTF16LEFormat */
    kCFStringEncodingUTF32 = 0x0c000100, /* kTextEncodingUnicodeDefault + kUnicodeUTF32Format */
    kCFStringEncodingUTF32BE = 0x18000100, /* kTextEncodingUnicodeDefault + kUnicodeUTF32BEFormat */
    kCFStringEncodingUTF32LE = 0x1c000100 /* kTextEncodingUnicodeDefault + kUnicodeUTF32LEFormat */
};
You can confirm this with:
print(String.Encoding.unicode.rawValue, String.Encoding.utf16.rawValue)
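As a slightly fuller check, here is a sketch that compares the two encodings directly and also compares the bytes they produce for the same string. The sample text and the variable names are illustrative assumptions, not part of the original answer.

```swift
import Foundation

// String.Encoding is Equatable, so the two cases can be compared directly.
let sameEncoding = String.Encoding.unicode == String.Encoding.utf16
print("rawValues:", String.Encoding.unicode.rawValue, String.Encoding.utf16.rawValue)
print("same encoding:", sameEncoding)

// Encode a sample string both ways and compare the resulting bytes.
let sample = "héllo"  // arbitrary sample text (assumption, for illustration)
let viaUnicode = sample.data(using: .unicode)
let viaUTF16 = sample.data(using: .utf16)
let sameBytes = viaUnicode != nil && viaUnicode == viaUTF16
print("same bytes:", sameBytes)
```

If .unicode really is just another name for .utf16, both comparisons should report true. The exact bytes (byte order mark, endianness) can vary by platform, so the sketch compares the two results with each other rather than against fixed values.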