Test failing due to unexpected Collections.sort behavior
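Please note: I mention JUnit here and provide an SSCCE code example that uses it, but this is at heart a Java collections question and can be answered by anyone with Java experience, whether or not they have ever used JUnit.
Java 8 here. I'm trying to sort a list of strings, but Collections.sort(myList) is behaving unexpectedly and I'd like to understand what is going on.
Here is my complete unit test: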
@RunWith(MockitoJUnitRunner.class)
public class SorterTest {
    @Test
    public void should_sort_correctly_including_capitalization_rules() {
        // given
        String[] actualNames = new String[] {
            "DCME",
            "CCME",
            "ACME",
            "BCME",
            "AGME",
            "AACME",
            "aCME",
            "Acme",
            "AaCME",
            "aACME",
        };
        List<String> actual = Arrays.asList(actualNames);
        // the order I would *expect* them to sort into...
        String[] expectedNames = new String[] {
            "aACME",
            "aCME",
            "AaCME",
            "AACME",
            "Acme",
            "ACME",
            "AGME",
            "BCME",
            "CCME",
            "DCME"
        };
        List<String> expected = Arrays.asList(expectedNames);
        // when
        Collections.sort(actual);
        // then
        assertTrue(actual.equals(expected));
    }
}
The JUnit assertTrue here fails at runtime because the actual list is sorted as:
0 = "AACME"
1 = "ACME"
2 = "AGME"
3 = "AaCME"
4 = "Acme"
5 = "BCME"
6 = "CCME"
7 = "DCME"
8 = "aACME"
9 = "aCME"
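That ^^^ is debugger output; the numbers are the list index of each element.
So for some reason Collections.sort says the string "BCME" is lexicographically "lower" (appears earlier in the sorted list) than "aCME", which seems just crazy to me. :-)
I should mention that I'm only dealing with ASCII characters in UTF-8 here, and my application will pre-validate that every character in each string/name is in [a-z][A-Z].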
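Either way, the sorting rules I want the Java code to use are:
- When I say "lower" I mean "will appear earlier in the sorted list", and when I say "higher" I mean "will appear later in the sorted list"
  - Hence I'd say "3 is lower than 43", since in a sorted list of integers 3 would appear earlier than 43, and so on.
- Lowercase letters are lower than uppercase letters, so "a" should come before "A"
  - Hence the order of all letters is aAbBcCdDeEfFgGhHiIjJkKlLmMnNoOpPqQrRsStTuUvVwWxXyYzZ
- Shorter words come before longer words, provided they are an identical (including case) subset of the longer word
  - "but" is lower than (comes before) "butterfly"
  - "butterfly" is lower than "But" (b < B)
  - "butterfly" is lower than "bUt" (b and b are the same, but u < U)
Under these sorting rules, my unit test list should sort as: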
Sort Order    Reason why it comes after the last one in the list
================================================================
aACME
aCME          1st letter is 'a' but 2nd letter is 'C' and A < C
AaCME         1st letter is 'A' and a < A
AACME         1st letter is 'A' and 2nd letter is 'A' and a < A
Acme          1st letter is 'A' but 2nd letter is 'c' and A < c
ACME          1st letter is 'A' but 2nd letter is 'C' and c < C
AGME          1st letter is 'A' but 2nd letter is 'G' and C < G
BCME          1st letter is 'B' and aA < bB
CCME          1st letter is 'C' and bB < cC
DCME          1st letter is 'D' and cC < dD
How can I change the code above so that the unit test passes and the list is sorted the way I need?
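Write a comparator that sorts things the way you want. We won't write it for you, but a comparator that maps/translates each string into a corresponding sort key should be straightforward...
For example, assuming only [A-Za-z], map:
    a -> 0x00
    A -> 0x01
    b -> 0x02
    B -> 0x03
and so on.
Keep in mind that the comparator will access the elements many times, so if the data set is large enough (e.g. more than 10^6 strings) and performance is a concern, you may have to cache the sort keys.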
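As a rough illustration of the sort-key idea above, the sketch below maps each letter to the key value suggested in the answer (a -> 0, A -> 1, b -> 2, ...) and computes every key once before sorting. The class name SortKeyDemo and the helpers keyOf, sortKey and sortLowerFirst are made up for this example, and the code assumes every character is in [A-Za-z], as the question says is pre-validated.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SortKeyDemo {
    // Maps a->0, A->1, b->2, B->3, ... Assumes c is in [A-Za-z].
    static char keyOf(char c) {
        int letter = Character.toLowerCase(c) - 'a';
        return (char) (2 * letter + (Character.isUpperCase(c) ? 1 : 0));
    }

    // Translates a whole string into its sort key, character by character.
    static String sortKey(String s) {
        StringBuilder key = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            key.append(keyOf(s.charAt(i)));
        }
        return key.toString();
    }

    // Returns a sorted copy; each key is computed once and cached in a map.
    static List<String> sortLowerFirst(List<String> names) {
        Map<String, String> keys = new HashMap<>();
        for (String name : names) {
            keys.computeIfAbsent(name, SortKeyDemo::sortKey);
        }
        List<String> sorted = new ArrayList<>(names);
        // The keys compare with the natural String order, which also puts
        // a shorter string before any longer string it is a prefix of.
        sorted.sort(Comparator.comparing((String name) -> keys.get(name)));
        return sorted;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList(
            "DCME", "CCME", "ACME", "BCME", "AGME",
            "AACME", "aCME", "Acme", "AaCME", "aACME");
        System.out.println(sortLowerFirst(names));
    }
}
```

Whether the cache pays off depends on the data size; for a ten-element list like the one in the question, a plain comparator is likely simpler.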
Assuming you only have letters, you could define a comparator like this:
Comparator<String> comparator = (a, b) -> {
    // Compare the characters pairwise.
    for (int i = 0, m = Math.min(a.length(), b.length()); i < m; ++i) {
        char aa = a.charAt(i);
        char bb = b.charAt(i);
        // Different letters (ignoring case): alphabetical order decides, so 'A' sorts before 'b'.
        char la = Character.toLowerCase(aa);
        char lb = Character.toLowerCase(bb);
        if (la != lb) {
            return la < lb ? -1 : 1;
        }
        // Same letter in different case: the lowercase one comes first.
        if (aa != bb) {
            return Character.isLowerCase(aa) ? -1 : 1;
        }
    }
    // If the pairwise comparison finds no difference, the shorter string comes first; equal lengths mean equal strings.
    return Integer.compare(a.length(), b.length());
};
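To see how a comparator of this kind behaves on the "but"/"butterfly" examples from the question, here is a small self-contained sketch. It restates the comparison logic so it can run on its own, following the letter order aAbBcC... that the question asks for; the class name LowerFirstComparatorDemo and the helper sortedCopy are hypothetical, and the letters-only assumption still applies.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class LowerFirstComparatorDemo {
    // Letters-only assumption: every character is in [A-Za-z].
    static final Comparator<String> LOWER_FIRST = (a, b) -> {
        for (int i = 0, m = Math.min(a.length(), b.length()); i < m; ++i) {
            char aa = a.charAt(i);
            char bb = b.charAt(i);
            char la = Character.toLowerCase(aa);
            char lb = Character.toLowerCase(bb);
            if (la != lb) {
                return la < lb ? -1 : 1;                    // different letters: alphabetical order
            }
            if (aa != bb) {
                return Character.isLowerCase(aa) ? -1 : 1;  // same letter: lowercase first
            }
        }
        return Integer.compare(a.length(), b.length());     // common prefix: shorter first
    };

    // Returns a sorted copy so the caller's list is left untouched.
    static List<String> sortedCopy(List<String> names) {
        List<String> copy = new ArrayList<>(names);
        copy.sort(LOWER_FIRST);
        return copy;
    }

    public static void main(String[] args) {
        System.out.println(sortedCopy(Arrays.asList("But", "bUt", "butterfly", "but")));
        // prints [but, butterfly, bUt, But]
    }
}
```

In the unit test from the question, the corresponding change would presumably be to pass such a comparator as the second argument, as in Collections.sort(actual, comparator).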
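Java has the class RuleBasedCollator, which allows customizing the sorting/ordering of characters.
In this case lowercase letters should come before uppercase letters, so the rules could look like this: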
import java.text.ParseException;
import java.text.RuleBasedCollator;

static RuleBasedCollator lowerFirst() {
    try {
        // Each "<" declares that the letter on its left sorts before the letter on its right.
        return new RuleBasedCollator(
            "< a < A < b < B < c < C < d < D < e < E < f < F < g < G < h < H < i < I < j < J < "
            + "k < K < l < L < m < M < n < N < o < O < p < P < q < Q < r < R < s < S < t < T < "
            + "u < U < v < V < w < W < x < X < y < Y < z < Z"
        );
    } catch (ParseException parsex) {
        throw new IllegalArgumentException("Failed to create lowerFirst collator", parsex);
    }
}
Test:
String[] names = new String[] {
    "DCME", "CCME", "ACME", "BCME", "AGME",
    "AACME", "aCME", "Acme", "AaCME", "aACME",
};
String[] expected = new String[] {
    "aACME", "aCME", "AaCME", "AACME", "Acme",
    "ACME", "AGME", "BCME", "CCME", "DCME"
};
Arrays.sort(names, lowerFirst());
System.out.println("sorted: " + Arrays.toString(names));
System.out.println("expected: " + Arrays.toString(expected));
Output:
sorted: [aACME, aCME, AaCME, AACME, Acme, ACME, AGME, BCME, CCME, DCME]
expected: [aACME, aCME, AaCME, AACME, Acme, ACME, AGME, BCME, CCME, DCME]
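If the list is large, one possible refinement is to have the collator produce a CollationKey for each string once and then sort the keys, since CollationKey is Comparable. The sketch below is only an illustration under the same letters-only assumption: the class name CollationKeyDemo and the helper sortWithKeys are invented here, and the rule string is generated in a loop instead of being written out by hand.

```java
import java.text.CollationKey;
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class CollationKeyDemo {
    // Builds the rules "< a < A < b < B ... < z < Z" programmatically.
    static RuleBasedCollator lowerFirst() {
        StringBuilder rules = new StringBuilder();
        for (char c = 'a'; c <= 'z'; c++) {
            rules.append(" < ").append(c).append(" < ").append(Character.toUpperCase(c));
        }
        try {
            return new RuleBasedCollator(rules.toString().trim());
        } catch (ParseException parsex) {
            throw new IllegalArgumentException("Failed to create lowerFirst collator", parsex);
        }
    }

    // Computes one CollationKey per string, sorts the keys, then unwraps them.
    static List<String> sortWithKeys(List<String> names) {
        RuleBasedCollator collator = lowerFirst();
        return names.stream()
            .map(collator::getCollationKey)      // key is computed once per string
            .sorted()                            // natural order of CollationKey
            .map(CollationKey::getSourceString)  // back to the original strings
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList(
            "DCME", "CCME", "ACME", "BCME", "AGME",
            "AACME", "aCME", "Acme", "AaCME", "aACME");
        System.out.println("sorted: " + sortWithKeys(names));
    }
}
```

Because a Collator is itself a Comparator, Collections.sort(actual, lowerFirst()) should also work directly on the List from the question's unit test; the key-based variant is only worth considering when the same strings are compared many times.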