Test failing for unexpected Collections.sort behavior

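Please note: I mention JUnit here and provide an SSCCE code example that uses it, but this is really a Java Collections question that anyone with Java experience can answer, whether or not they have used JUnit.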


Java 8 here. I'm trying to sort a list of strings, but Collections.sort(myList) is behaving unexpectedly and I'd like to understand what's going on.

Here is my complete unit test:

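import static org.junit.Assert.assertTrue;

import java.util.Arrays;
import java.util.Collections;
import java.util.List;

import org.junit.Test;
import org.junit.runner.RunWith;
// Mockito 2+; Mockito 1.x used org.mockito.runners.MockitoJUnitRunner instead.
import org.mockito.junit.MockitoJUnitRunner;

@RunWith(MockitoJUnitRunner.class)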
public class SorterTest {

    @Test
    public void should_sort_correctly_including_capitalization_rules() {

        // given
        String[] actualNames = new String[] {
            "DCME",
            "CCME",
            "ACME",
            "BCME",
            "AGME",
            "AACME",
            "aCME",
            "Acme",
            "AaCME",
            "aACME",
        };
        List<String> actual = Arrays.asList(actualNames);

        // the order I would *expect* them to sort into...
        String[] expectedNames = new String[] {
                "aACME",
                "aCME",
                "AaCME",
                "AACME",
                "Acme",
                "ACME",
                "AGME",
                "BCME",
                "CCME",
                "DCME"
        };
        List<String> expected = Arrays.asList(expectedNames);

        // when
        Collections.sort(actual);

        // then
        assertTrue(actual.equals(expected));

    }

}

The JUnit assertTrue here fails at runtime, because the actual list gets sorted as:

0 = "AACME"
1 = "ACME"
2 = "AGME"
3 = "AaCME"
4 = "Acme"
5 = "BCME"
6 = "CCME"
7 = "DCME"
8 = "aACME"
9 = "aCME"

That ^^^ is the debugger output; the numbers are each element's index in the list.

So for some reason Collections.sort is saying that the string "BCME" is lexicographically "lower" (appears earlier in the sorted list) than "aCME", which seems crazy to me. :-)
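The ordering comes from String's natural ordering, which is what Collections.sort(list) uses when no comparator is given: String.compareTo compares the strings char by char by their UTF-16 values, and in ASCII every uppercase letter (65 to 90) has a smaller value than every lowercase letter (97 to 122). A small demonstration (the class and method names are just illustrative):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class NaturalOrderDemo {
    // Sorts a copy using String's natural ordering, the same ordering Collections.sort(list) uses.
    static List<String> naturalSort(List<String> names) {
        List<String> copy = new ArrayList<>(names);
        Collections.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        // 'A'..'Z' are 65..90 and 'a'..'z' are 97..122, so uppercase sorts before lowercase.
        System.out.println((int) 'B' + " vs " + (int) 'a');  // 66 vs 97
        // At the first differing index, compareTo returns the difference of the two chars: 66 - 97.
        System.out.println("BCME".compareTo("aCME"));        // -31
        System.out.println(naturalSort(Arrays.asList(
            "DCME", "CCME", "ACME", "BCME", "AGME",
            "AACME", "aCME", "Acme", "AaCME", "aACME")));
        // [AACME, ACME, AGME, AaCME, Acme, BCME, CCME, DCME, aACME, aCME]
    }
}
```

The last line matches the debugger output above, so Collections.sort is behaving as documented; getting a different order requires passing a custom Comparator.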

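I should mention that I'm only dealing with ASCII characters in UTF-8 here, and my app will perform pre-validation to make sure that every character in every string/name is in [a-zA-Z].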
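A minimal sketch of such a pre-validation step, assuming a valid name is a non-empty run of ASCII letters (NameValidator and isValidName are hypothetical names, not part of the asker's app):

```java
import java.util.regex.Pattern;

class NameValidator {
    // Assumption: a valid name is one or more ASCII letters, i.e. only [A-Za-z].
    private static final Pattern ASCII_LETTERS = Pattern.compile("[A-Za-z]+");

    static boolean isValidName(String name) {
        // matches() requires the whole string to match, not just a prefix.
        return name != null && ASCII_LETTERS.matcher(name).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValidName("AaCME"));  // true
        System.out.println(isValidName("A1CME"));  // false
        System.out.println(isValidName(""));       // false
    }
}
```

Rejecting anything outside [A-Za-z] up front is what lets the letter-only comparators below stay simple.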

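Either way, the sorting rule I'm looking for the Java code to use is: compare the strings character by character, treating the letters as ordered a < A < b < B < ... < z < Z. That is, different letters sort alphabetically regardless of case, and the same letter in different case puts the lowercase one first.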

Under these rules, the list in my unit test should sort as:

Sort Order   Reason why it comes after the previous one in the list
================================================================
aACME        
aCME         1st letter is 'a' but 2nd letter is 'C' and A < C
AaCME        1st letter is 'A' and a < A
AACME        1st letter is 'A' and 2nd letter is 'A' and a < A
Acme         1st letter is 'A' but 2nd letter is 'c' and A < c
ACME         1st letter is 'A' but 2nd letter is 'C' and c < C
AGME         1st letter is 'A' but 2nd letter is 'G' and C < G
BCME         1st letter is 'B' and aA < bB
CCME         1st letter is 'C' and bB < cC
DCME         1st letter is 'D' and cC < dD

How can I change the code above so that the unit test passes and the list is sorted the way I need?

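Write a comparator that sorts things the way you want. We won't write it for you, but it should be straightforward to write a comparator that maps/translates the strings into corresponding sort keys and compares those...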

For example, assuming only [A-Za-z], map each character like this:

a->0x00
A->0x01 
b->0x02
B->0x03

and so on.
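A sketch of that idea, assuming the input contains only [A-Za-z]: each character is replaced by its position in a < A < b < B < ... < z < Z, and the resulting key strings are compared with String's own natural order (SortKeyDemo, sortKey and LOWER_FIRST are hypothetical names):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class SortKeyDemo {
    // Hypothetical key function: 'a' -> 0, 'A' -> 1, 'b' -> 2, 'B' -> 3, ..., 'z' -> 50, 'Z' -> 51.
    // Assumes every character is in [A-Za-z]; other characters would get meaningless keys.
    static String sortKey(String s) {
        char[] key = new char[s.length()];
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            int letterIndex = Character.toLowerCase(c) - 'a';
            key[i] = (char) (2 * letterIndex + (Character.isUpperCase(c) ? 1 : 0));
        }
        return new String(key);
    }

    // String's natural order on the keys is lexicographic, with a shorter prefix sorting first.
    static final Comparator<String> LOWER_FIRST = Comparator.comparing(SortKeyDemo::sortKey);

    public static void main(String[] args) {
        List<String> names = new ArrayList<>(Arrays.asList(
            "DCME", "CCME", "ACME", "BCME", "AGME",
            "AACME", "aCME", "Acme", "AaCME", "aACME"));
        names.sort(LOWER_FIRST);
        System.out.println(names);
        // [aACME, aCME, AaCME, AACME, Acme, ACME, AGME, BCME, CCME, DCME]
    }
}
```

Turning the rule into a key reuses String.compareTo for the actual comparison, so the comparator itself needs no loop.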

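Keep in mind that the comparator will be called on the same elements many times, so if the data set is large enough (e.g., more than 10^6 strings) and performance matters, you may have to cache the sort keys.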
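One way to cache the keys is the decorate-sort-undecorate pattern: compute each string's key once, sort the (key, string) pairs by key, then strip the keys off again. A sketch, assuming the same a->0, A->1, b->2, ... mapping and [A-Za-z]-only input (CachedKeySort, Keyed and sortWithCachedKeys are illustrative names):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class CachedKeySort {
    // Hypothetical key function: maps each char to its rank in a < A < b < B < ... < z < Z.
    static String sortKey(String s) {
        char[] key = new char[s.length()];
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            key[i] = (char) (2 * (Character.toLowerCase(c) - 'a') + (Character.isUpperCase(c) ? 1 : 0));
        }
        return new String(key);
    }

    // Pairs an original string with its precomputed key.
    static final class Keyed {
        final String key;
        final String value;

        Keyed(String value) {
            this.key = sortKey(value);  // computed exactly once per element
            this.value = value;
        }
    }

    // n calls to sortKey in total, instead of two per comparison.
    static List<String> sortWithCachedKeys(List<String> names) {
        List<Keyed> decorated = new ArrayList<>(names.size());
        for (String name : names) {
            decorated.add(new Keyed(name));
        }
        decorated.sort(Comparator.comparing((Keyed k) -> k.key));
        List<String> result = new ArrayList<>(decorated.size());
        for (Keyed k : decorated) {
            result.add(k.value);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList(
            "DCME", "CCME", "ACME", "BCME", "AGME",
            "AACME", "aCME", "Acme", "AaCME", "aACME");
        System.out.println(sortWithCachedKeys(names));
    }
}
```

The trade-off is memory: every key is held alongside its string for the duration of the sort.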

Assuming you only have letters, you could define a comparator along these lines:

Comparator<String> comparator = (a, b) -> {
    // Assumes every character is in [A-Za-z]. Compare the characters pairwise.
    for (int i = 0, m = Math.min(a.length(), b.length()); i < m; ++i) {
      char aa = a.charAt(i);
      char bb = b.charAt(i);
      // If they are different letters, order them alphabetically, ignoring case (so A < c).
      char la = Character.toLowerCase(aa);
      char lb = Character.toLowerCase(bb);
      if (la != lb) {
        return la < lb ? -1 : 1;
      }

      // Same letter in different case: say that the lowercase one comes first (a < A).
      if (aa != bb) {
        return Character.isLowerCase(aa) ? -1 : 1;
      }
    }
    // If the pair-wise comparison doesn't find a difference, say the shortest one is first; or they are equal if the same length.
    return Integer.compare(a.length(), b.length());
};
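To make the original unit test pass with a comparator along these lines, only the "when" step needs to change: Collections.sort(actual) becomes Collections.sort(actual, comparator). The self-contained sketch below assumes [A-Za-z] input; its hypothetical LOWER_FIRST comparator compares a per-character rank (a=0, A=1, b=2, B=3, ...), which orders different letters alphabetically and puts the lowercase form of the same letter first:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

class LowerFirstSortDemo {
    // Hypothetical rank: 'a' -> 0, 'A' -> 1, 'b' -> 2, 'B' -> 3, ... Assumes [A-Za-z] only.
    static int rank(char c) {
        return 2 * (Character.toLowerCase(c) - 'a') + (Character.isUpperCase(c) ? 1 : 0);
    }

    // Compares ranks character by character; a proper prefix sorts first.
    static final Comparator<String> LOWER_FIRST = (a, b) -> {
        for (int i = 0, m = Math.min(a.length(), b.length()); i < m; ++i) {
            int cmp = Integer.compare(rank(a.charAt(i)), rank(b.charAt(i)));
            if (cmp != 0) {
                return cmp;
            }
        }
        return Integer.compare(a.length(), b.length());
    };

    static List<String> sortedCopy(List<String> names) {
        List<String> copy = new ArrayList<>(names);
        Collections.sort(copy, LOWER_FIRST);
        return copy;
    }

    public static void main(String[] args) {
        // Same data as the question's test; Arrays.asList supports the in-place sort.
        List<String> actual = Arrays.asList(
            "DCME", "CCME", "ACME", "BCME", "AGME",
            "AACME", "aCME", "Acme", "AaCME", "aACME");
        Collections.sort(actual, LOWER_FIRST);
        System.out.println(actual);
        // [aACME, aCME, AaCME, AACME, Acme, ACME, AGME, BCME, CCME, DCME]
    }
}
```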

Java has the class java.text.RuleBasedCollator, which lets you customize the sorting order of characters.

In this case, each lowercase letter should come before its uppercase counterpart, so the rules could look like this:

import java.text.ParseException;
import java.text.RuleBasedCollator;

static RuleBasedCollator lowerFirst() {
    try {
        // Each "<" is a primary difference, so the order is a < A < b < B < ... < z < Z.
        return new RuleBasedCollator(
            "< a < A < b < B < c < C < d < D < e < E < f < F < g < G < h < H < i < I < j < J < "
            + "k < K < l < L < m < M < n < N < o < O < p < P < q < Q < r < R < s < S < t < T < "
            + "u < U < v < V < w < W < x < X < y < Y < z < Z"
        );
    } catch (ParseException parsex) {
        throw new IllegalArgumentException("Failed to create lowerFirst collator", parsex);
    }
}

Test:

String[] names = new String[] {
    "DCME",  "CCME", "ACME", "BCME",  "AGME",
    "AACME", "aCME", "Acme", "AaCME", "aACME",
};

String[] expected = new String[] {
    "aACME", "aCME", "AaCME", "AACME", "Acme",
    "ACME", "AGME", "BCME", "CCME", "DCME"
};
        
Arrays.sort(names, lowerFirst());

System.out.println("sorted:   " + Arrays.toString(names));
System.out.println("expected: " + Arrays.toString(expected));

Output:

sorted:   [aACME, aCME, AaCME, AACME, Acme, ACME, AGME, BCME, CCME, DCME]
expected: [aACME, aCME, AaCME, AACME, Acme, ACME, AGME, BCME, CCME, DCME]
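Typing the rule string by hand makes it easy to skip a letter. As a variation on lowerFirst(), the same "< a < A < b < B < ... < z < Z" rules can be generated in a loop. This is a sketch; GeneratedRulesCollator, lowerFirstRules and lowerFirstGenerated are hypothetical names:

```java
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.Arrays;

class GeneratedRulesCollator {
    // Builds "< a < A < b < B < ... < z < Z" so that no letter can be left out by accident.
    static String lowerFirstRules() {
        StringBuilder rules = new StringBuilder();
        for (char c = 'a'; c <= 'z'; c++) {
            rules.append("< ").append(c).append(" < ").append(Character.toUpperCase(c)).append(' ');
        }
        return rules.toString().trim();
    }

    static RuleBasedCollator lowerFirstGenerated() {
        try {
            return new RuleBasedCollator(lowerFirstRules());
        } catch (ParseException e) {
            throw new IllegalArgumentException("Failed to create lowerFirst collator", e);
        }
    }

    public static void main(String[] args) {
        String[] names = {
            "DCME", "CCME", "ACME", "BCME", "AGME",
            "AACME", "aCME", "Acme", "AaCME", "aACME"
        };
        // A Collator is a Comparator<Object>, so it can be passed straight to Arrays.sort.
        Arrays.sort(names, lowerFirstGenerated());
        System.out.println(Arrays.toString(names));
    }
}
```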