Inconsistent locale case sensitivity in Java compared to Bash
When I try to format a date in Polish, I get consistent formatting:
new SimpleDateFormat("EEEE", Locale.forLanguageTag("pl-PL")).format(new Date())
Result:
wtorek
The same result in bash:
LC_ALL=pl_PL
$ date +"%A %b %d"
wtorek maj 22
Note the lowercase w in wtorek in both results.
When I do the same for Czech, the results are inconsistent:
new SimpleDateFormat("EEEE", Locale.forLanguageTag("cs-CZ")).format(new Date())
Result:
Pondělí
When run in bash:
$ LC_ALL=cs_CZ
$ date +"%A %b %d"
pondělí kvě 21
Note the uppercase P in the Java result. How does this happen? Does it mean SimpleDateFormat doesn't use the standard locales installed on the system?
Does it mean SimpleDateFormat doesn't use standard Locales installed on the system
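Yes, it does not use the system locales; the available locales depend on the JVM/JRE vendor. For example, check lib\ext\localedata.jar in the JRE directory. After extracting it you can find the file sun\text\resources\cs\FormatData_cs_CZ.class
which decompiles to: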
public class FormatData_cs extends ParallelListResourceBundle
{
@Override
protected final Object[][] getContents() {
return new Object[][] { { "MonthNames",
{ "ledna", "\u00fanora", "b\u0159ezna", "dubna", "kv\u011btna", "\u010dervna", "\u010dervence", "srpna", "z\u00e1\u0159\u00ed", "\u0159\u00edjna", "listopadu", "prosince", "" } },
{ "standalone.MonthNames", { "leden", "\u00fanor", "b\u0159ezen", "duben", "kv\u011bten", "\u010derven", "\u010dervenec", "srpen", "z\u00e1\u0159\u00ed", "\u0159\u00edjen", "listopad", "prosinec", "" } },
{ "MonthAbbreviations", { "Led", "\u00dano", "B\u0159e", "Dub", "Kv\u011b", "\u010cer", "\u010cvc", "Srp", "Z\u00e1\u0159", "\u0158\u00edj", "Lis", "Pro", "" } },
{ "standalone.MonthAbbreviations", { "I", "II", "III", "IV", "V", "VI", "VII", "VIII", "IX", "X", "XI", "XII", "" } },
{ "MonthNarrows", { "l", "\u00fa", "b", "d", "k", "\u010d", "\u010d", "s", "z", "\u0159", "l", "p", "" } },
{ "standalone.MonthNarrows", { "l", "\u00fa", "b", "d", "k", "\u010d", "\u010d", "s", "z", "\u0159", "l", "p", "" } },
{ "DayNames", { "Ned\u011ble", "Pond\u011bl\u00ed", "\u00dater\u00fd", "St\u0159eda", "\u010ctvrtek", "P\u00e1tek", "Sobota" } },
{ "standalone.DayNames", { "ned\u011ble", "pond\u011bl\u00ed", "\u00fater\u00fd", "st\u0159eda", "\u010dtvrtek", "p\u00e1tek", "sobota" } },
{ "DayAbbreviations", { "Ne", "Po", "\u00dat", "St", "\u010ct", "P\u00e1", "So" } },
{ "standalone.DayAbbreviations", { "ne", "po", "\u00fat", "st", "\u010dt", "p\u00e1", "so" } },
{ "DayNarrows", { "N", "P", "\u00da", "S", "\u010c", "P", "S" } },
{ "standalone.DayNarrows", { "N", "P", "\u00da", "S", "\u010c", "P", "S" } },
{ "AmPmMarkers", { "dop.", "odp." } },
{ "Eras", { "p\u0159.Kr.", "po Kr." } },
{ "short.Eras", { "p\u0159. n. l.", "n. l." } },
{ "narrow.Eras", { "p\u0159.n.l.", "n. l." } },
{ "NumberElements", { ",", " ", ";", "%", "0", "#", "-", "E", "\u2030", "\u221e", "\ufffd" } },
{ "TimePatterns", { "H:mm:ss z", "H:mm:ss z", "H:mm:ss", "H:mm" } },
{ "DatePatterns", { "EEEE, d. MMMM yyyy", "d. MMMM yyyy", "d.M.yyyy", "d.M.yy" } },
{ "DateTimePatterns", { "{1} {0}" } },
{ "DateTimePatternChars", "GuMtkHmsSEDFwWahKzZ" } };
}
}
and it contains "Pond\u011bl\u00ed" in "DayNames".
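The bundle above keeps two sets of day names: "DayNames" (capitalized here) and "standalone.DayNames" (lowercase). As a minimal sketch, java.time lets you ask for either form through TextStyle.FULL and TextStyle.FULL_STANDALONE. That these two styles map onto those two bundle keys is my assumption, and the class and method names below are made up for illustration. What gets printed depends on the JDK version and the active locale provider, so no expected output is shown.

```java
import java.time.DayOfWeek;
import java.time.format.TextStyle;
import java.util.Locale;

public class DayNameStyles {
    // Day name in the form used inside a formatted date
    // (assumed to correspond to the "DayNames" entry).
    static String formatForm(DayOfWeek day, Locale locale) {
        return day.getDisplayName(TextStyle.FULL, locale);
    }

    // Day name in the standalone form
    // (assumed to correspond to the "standalone.DayNames" entry).
    static String standaloneForm(DayOfWeek day, Locale locale) {
        return day.getDisplayName(TextStyle.FULL_STANDALONE, locale);
    }

    public static void main(String[] args) {
        Locale czech = Locale.forLanguageTag("cs-CZ");
        System.out.println("format form:     " + formatForm(DayOfWeek.MONDAY, czech));
        System.out.println("standalone form: " + standaloneForm(DayOfWeek.MONDAY, czech));
    }
}
```

If the two lines differ only in the case of the first letter, the capital P comes from the locale data itself rather than from the formatter.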
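Java gets its locale data (including the names of the days of the week in different locales) from up to four sources. Yes, the host operating system is one of them, but it is not the default. Quoting the LocaleServiceProvider documentation: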
Java Runtime Environment provides the following four locale providers:
- "CLDR": A provider based on Unicode Consortium's CLDR Project.
- "COMPAT": represents the locale sensitive services that is compatible with the prior JDK releases up to JDK8 (same as JDK8's "JRE").
- "SPI": represents the locale sensitive services implementing the subclasses of this LocaleServiceProvider class.
- "HOST": A provider that reflects the user's custom settings in the underlying operating system. This provider may not be available, depending on the Java Runtime Environment implementation.
- "JRE": represents a synonym to "COMPAT". This name is deprecated and will be removed in the future release of JDK.
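To see which configuration a given JVM is running under, a small diagnostic along these lines may help. It is only a sketch: the class and method names are hypothetical, and it merely reads the java.locale.providers system property (which is null when the JDK's default order applies) and formats a fixed Monday, 21 May 2018, the same way the question does.

```java
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;

public class LocaleProviderCheck {
    // Returns the configured provider order, or a note when the property is unset.
    static String configuredProviders() {
        String value = System.getProperty("java.locale.providers");
        return value == null ? "(not set, JDK default order applies)" : value;
    }

    // Formats Monday, 21 May 2018 as a full Czech day name, as in the question.
    static String czechMonday() {
        Date monday = new GregorianCalendar(2018, Calendar.MAY, 21).getTime();
        return new SimpleDateFormat("EEEE", Locale.forLanguageTag("cs-CZ")).format(monday);
    }

    public static void main(String[] args) {
        System.out.println("java.version          = " + System.getProperty("java.version"));
        System.out.println("java.locale.providers = " + configuredProviders());
        System.out.println("cs-CZ Monday          = " + czechMonday());
    }
}
```

Running it on different JDKs, or with different -Djava.locale.providers values, shows whether the capitalization follows the provider.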
Up to Java 8, JRE is the default. I am using java.time because nobody should have to bother with the outdated SimpleDateFormat:
DateTimeFormatter dayOfWeekFormatter
= DateTimeFormatter.ofPattern("EEEE", Locale.forLanguageTag("cs-CZ"));
LocalDate date = LocalDate.now(ZoneId.of("Europe/Prague"));
System.out.println(date.format(dayOfWeekFormatter));
The output when running on my Oracle jdk1.8.0_131 agrees with your result (uppercase S):
Středa
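We can control which locale data is used through a system property. For example, to prefer CLDR, run the program with the VM command-line option -Djava.locale.providers=CLDR,COMPAT, or insert the following line at the beginning of the program: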
System.setProperty("java.locale.providers", "CLDR,COMPAT");
středa
Now we get a lowercase s.
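Put together as a complete program, the property approach might look like the sketch below. The class and helper names are hypothetical, and the date is fixed to Wednesday, 23 May 2018 so the run is repeatable. As far as I know the property is read when the JDK first initializes its locale data, so the call goes at the very top of main. If something has already touched locale-sensitive classes, the call may have no effect and the command-line option is the safer route.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

public class PreferCldr {
    // Formats the full day-of-week name of the given date for the given locale.
    static String dayOfWeek(LocalDate date, Locale locale) {
        return date.format(DateTimeFormatter.ofPattern("EEEE", locale));
    }

    public static void main(String[] args) {
        // Must run before any locale-sensitive class is used (assumption, see lead-in).
        System.setProperty("java.locale.providers", "CLDR,COMPAT");

        LocalDate wednesday = LocalDate.of(2018, 5, 23);
        System.out.println(dayOfWeek(wednesday, Locale.forLanguageTag("cs-CZ")));
    }
}
```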
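My shell on macOS Sierra 10.12.6 only gives Wednesday, so apparently my OS doesn't have Czech locale data (that sounds strange; the problem may lie elsewhere), and so this is not an option for me. You may try putting HOST at the front of the locale providers string above and see whether what you get agrees with your bash.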
In Java 9 and later, CLDR is the default. So running the same snippet on jdk9.0.4 without setting any system property also gives the lowercase day of the week:
středa