Inconsistent locale case sensitivity in Java compared to Bash

When I format a date in Polish, I get consistent output:

new SimpleDateFormat("EEEE", Locale.forLanguageTag("pl-PL")).format(new Date())

Result:

wtorek

The same result in bash:

$ LC_ALL=pl_PL
$ date +"%A %b %d"
wtorek maj 22

Note the lowercase w in wtorek in both.

When I do the same for Czech, the results are inconsistent:

new SimpleDateFormat("EEEE", Locale.forLanguageTag("cs-CZ")).format(new Date())

Result:

Pondělí

When run in bash:

$ LC_ALL=cs_CZ
$ date +"%A %b %d"
pondělí kvě 21

Note the capital P in the Java result. How does this happen? Does it mean SimpleDateFormat doesn't use the standard locales installed on the system?

Does it mean SimpleDateFormat doesn't use standard Locales installed on the system

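Yes, it does not use the system locales; the available locales depend on the JVM/JRE vendor. For example, look at lib\ext\localedata.jar in the JRE directory. After unpacking it you can find the file sun\text\resources\cs\FormatData_cs_CZ.class, which decompiles to: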

public class FormatData_cs extends ParallelListResourceBundle
{
    @Override
    protected final Object[][] getContents() {
        return new Object[][] { { "MonthNames", 
        { "ledna", "\u00fanora", "b\u0159ezna", "dubna", "kv\u011btna", "\u010dervna", "\u010dervence", "srpna", "z\u00e1\u0159\u00ed", "\u0159\u00edjna", "listopadu", "prosince", "" } }, 
        { "standalone.MonthNames", { "leden", "\u00fanor", "b\u0159ezen", "duben", "kv\u011bten", "\u010derven", "\u010dervenec", "srpen", "z\u00e1\u0159\u00ed", "\u0159\u00edjen", "listopad", "prosinec", "" } }, 
        { "MonthAbbreviations", { "Led", "\u00dano", "B\u0159e", "Dub", "Kv\u011b", "\u010cer", "\u010cvc", "Srp", "Z\u00e1\u0159", "\u0158\u00edj", "Lis", "Pro", "" } }, 
        { "standalone.MonthAbbreviations", { "I", "II", "III", "IV", "V", "VI", "VII", "VIII", "IX", "X", "XI", "XII", "" } }, 
        { "MonthNarrows", { "l", "\u00fa", "b", "d", "k", "\u010d", "\u010d", "s", "z", "\u0159", "l", "p", "" } },
        { "standalone.MonthNarrows", { "l", "\u00fa", "b", "d", "k", "\u010d", "\u010d", "s", "z", "\u0159", "l", "p", "" } }, 
        { "DayNames", { "Ned\u011ble", "Pond\u011bl\u00ed", "\u00dater\u00fd", "St\u0159eda", "\u010ctvrtek", "P\u00e1tek", "Sobota" } }, 
        { "standalone.DayNames", { "ned\u011ble", "pond\u011bl\u00ed", "\u00fater\u00fd", "st\u0159eda", "\u010dtvrtek", "p\u00e1tek", "sobota" } }, 
        { "DayAbbreviations", { "Ne", "Po", "\u00dat", "St", "\u010ct", "P\u00e1", "So" } }, 
        { "standalone.DayAbbreviations", { "ne", "po", "\u00fat", "st", "\u010dt", "p\u00e1", "so" } }, 
        { "DayNarrows", { "N", "P", "\u00da", "S", "\u010c", "P", "S" } }, 
        { "standalone.DayNarrows", { "N", "P", "\u00da", "S", "\u010c", "P", "S" } },
        { "AmPmMarkers", { "dop.", "odp." } }, 
        { "Eras", { "p\u0159.Kr.", "po Kr." } }, 
        { "short.Eras", { "p\u0159. n. l.", "n. l." } }, 
        { "narrow.Eras", { "p\u0159.n.l.", "n. l." } }, 
        { "NumberElements", { ",", " ", ";", "%", "0", "#", "-", "E", "\u2030", "\u221e", "\ufffd" } }, 
        { "TimePatterns", { "H:mm:ss z", "H:mm:ss z", "H:mm:ss", "H:mm" } }, 
        { "DatePatterns", { "EEEE, d. MMMM yyyy", "d. MMMM yyyy", "d.M.yyyy", "d.M.yy" } }, 
        { "DateTimePatterns", { "{1} {0}" } }, 
        { "DateTimePatternChars", "GuMtkHmsSEDFwWahKzZ" } };
    }
}

and "DayNames" contains "Pond\u011bl\u00ed".
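
The decompiled bundle lists two sets of day names: the capitalized "DayNames" and the lowercase "standalone.DayNames". As a minimal sketch (the class name DayNameForms and its two helper methods are made up for illustration), java.time lets you ask for either form through TextStyle.FULL and TextStyle.FULL_STANDALONE. Which spelling you actually get depends on the JDK version and the locale provider in use, so no output is shown.

```java
import java.time.DayOfWeek;
import java.time.format.TextStyle;
import java.util.Locale;

public class DayNameForms {
    // Day name in the "format" form, the one a pattern such as EEEE uses.
    static String formatForm(DayOfWeek day, Locale locale) {
        return day.getDisplayName(TextStyle.FULL, locale);
    }

    // Day name in the standalone form.
    static String standaloneForm(DayOfWeek day, Locale locale) {
        return day.getDisplayName(TextStyle.FULL_STANDALONE, locale);
    }

    public static void main(String[] args) {
        Locale czech = Locale.forLanguageTag("cs-CZ");
        System.out.println("format:     " + formatForm(DayOfWeek.MONDAY, czech));
        System.out.println("standalone: " + standaloneForm(DayOfWeek.MONDAY, czech));
    }
}
```

If the two lines differ only in the case of the first letter, the capital letter comes from the locale data itself rather than from the formatter.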

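Java gets its locale data (including the names of the days of the week in different locales) from up to four sources. Yes, the host operating system is one of them, but it is not the default. Quoting the LocaleServiceProvider documentation: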

Java Runtime Environment provides the following four locale providers:

  • "CLDR": A provider based on Unicode Consortium's CLDR Project.
  • "COMPAT": represents the locale sensitive services that is compatible with the prior JDK releases up to JDK8 (same as JDK8's "JRE").
  • "SPI": represents the locale sensitive services implementing the subclasses of this LocaleServiceProvider class.
  • "HOST": A provider that reflects the user's custom settings in the underlying operating system. This provider may not be available, depending on the Java Runtime Environment implementation.
  • "JRE": represents a synonym to "COMPAT". This name is deprecated and will be removed in the future release of JDK.

Up to and including Java 8, JRE is the default. I am using java.time since no one should bother with the long outdated SimpleDateFormat:

    DateTimeFormatter dayOfWeekFormatter
            = DateTimeFormatter.ofPattern("EEEE", Locale.forLanguageTag("cs-CZ"));
    LocalDate date = LocalDate.now(ZoneId.of("Europe/Prague"));
    System.out.println(date.format(dayOfWeekFormatter));

Running on my Oracle jdk1.8.0_131, the output agrees with your result (capital S):

Středa
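
To check that the old and the new API agree on your own JVM, here is a small sketch that formats one fixed date with both. The class name DayNameApis, the helper names, and the choice of 2018-05-23 are mine, picked only so that repeated runs are comparable. As far as I can tell, both APIs read their day names from the same locale data, so I would expect the two columns to match.

```java
import java.text.SimpleDateFormat;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

public class DayNameApis {
    // Day of week via the legacy API, evaluated in UTC to avoid time zone surprises.
    static String legacyDayName(LocalDate date, Locale locale) {
        SimpleDateFormat format = new SimpleDateFormat("EEEE", locale);
        format.setTimeZone(TimeZone.getTimeZone("UTC"));
        Date noonUtc = Date.from(date.atTime(12, 0).toInstant(ZoneOffset.UTC));
        return format.format(noonUtc);
    }

    // Day of week via java.time.
    static String modernDayName(LocalDate date, Locale locale) {
        return date.format(DateTimeFormatter.ofPattern("EEEE", locale));
    }

    public static void main(String[] args) {
        LocalDate date = LocalDate.of(2018, 5, 23); // fixed date, chosen for illustration
        for (String tag : new String[] { "pl-PL", "cs-CZ" }) {
            Locale locale = Locale.forLanguageTag(tag);
            System.out.println(tag + ": " + legacyDayName(date, locale)
                    + " / " + modernDayName(date, locale));
        }
    }
}
```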

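We can control which locale data is used through a system property. For example, to prefer CLDR, run the program with the VM command-line option -Djava.locale.providers=CLDR,COMPAT, or insert the following line at the beginning of the program: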

    System.setProperty("java.locale.providers", "CLDR,COMPAT");

Now we get a lowercase s:

středa
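
Put together as a complete program, that might look like the sketch below. The class name PreferCldr, the helper czechDayName, and the fixed date are hypothetical. I am assuming the property is read only once, when the JVM first needs locale data, which is why it is set on the first line of main; if something else has already formatted a date or number, the command-line option is the safer route.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

public class PreferCldr {
    // Formats the day of week of the given date in the Czech locale.
    static String czechDayName(LocalDate date) {
        DateTimeFormatter dayOfWeekFormatter
                = DateTimeFormatter.ofPattern("EEEE", Locale.forLanguageTag("cs-CZ"));
        return date.format(dayOfWeekFormatter);
    }

    public static void main(String[] args) {
        // Assumption: must run before the first locale-sensitive call in this JVM.
        System.setProperty("java.locale.providers", "CLDR,COMPAT");

        // Print the runtime version so results from different JDKs can be compared.
        System.out.println("java.version = " + System.getProperty("java.version"));
        System.out.println(czechDayName(LocalDate.of(2018, 5, 23)));
    }
}
```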

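My shell on macOS Sierra 10.12.6 only gives Wednesday, so apparently my OS doesn't have Czech locale data (which sounds odd; the problem may lie elsewhere), so this is not an option for me. You could try putting HOST at the front of the locale providers string above and see whether what you get agrees with your bash.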

In Java 9 and later, CLDR is the default. So running the same snippet on jdk9.0.4 without setting any system property also gives the lowercase day of week:

středa