In binary search, if the element isn't found, why is the convention to subtract one from where it should go?

Question

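I know this is getting into the nitty-gritty, but when doing a binary search and the element isn't found, what is the rationale for returning (-(insertion point) - 1)? In particular, the -1 part. This is how Java does it, and I don't understand why they chose the convention with -1 rather than just -(insertion point). Obviously, the negation signals that the value was not actually found in the array/list. My guess is that it comes from C, where the negate-and-subtract-one is easy to do with a bitwise operation.

Note: I've seen code written in C, C++ and Java that uses this convention, and I'd like to know where the convention comes from.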

Answer 1

It's because the insertion point might be zero, and int has no -0 (unlike floating point), so you need some other way to represent it unambiguously.
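To make the ambiguity concrete, here is a minimal sketch. The class name and the two `encode...` helpers are hypothetical, introduced only for illustration; `Arrays.binarySearch` is the real JDK method.

```java
import java.util.Arrays;

class NegativeZeroDemo {
    // Hypothetical alternative: report a miss as -(insertion point).
    static int encodeNaive(int insertionPoint) {
        return -insertionPoint;
    }

    // The convention discussed here: -(insertion point) - 1.
    static int encodeJava(int insertionPoint) {
        return -insertionPoint - 1;
    }

    public static void main(String[] args) {
        int[] sorted = {10, 20, 30};

        int hit = Arrays.binarySearch(sorted, 10);  // found at index 0
        int miss = Arrays.binarySearch(sorted, 5);  // absent, insertion point 0

        System.out.println(hit);   // 0
        System.out.println(miss);  // -1

        // With plain negation, "absent, insert at 0" would collide with "found at 0".
        System.out.println(encodeNaive(0) == hit);  // true
        System.out.println(encodeJava(0) == hit);   // false
    }
}
```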

As the Javadoc says:

Note that this guarantees that the return value will be >= 0 if and only if the key is found.

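Of course, there are other ways to represent the insertion position; this one happens to be quite elegant, because it needs no extra information, such as the size of the container, in order to insert the element in the right place.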
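As an illustration of using the return value directly for insertion, here is a small sketch. `SortedInsertDemo` and `insertSorted` are hypothetical names; `Collections.binarySearch` is the JDK method whose result is being decoded.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class SortedInsertDemo {
    // Inserts key into an already sorted list unless it is present.
    // Returns the index where key was found or inserted.
    static int insertSorted(List<Integer> sorted, int key) {
        int result = Collections.binarySearch(sorted, key);
        if (result >= 0) {
            return result;                 // key is already in the list
        }
        int insertionPoint = -result - 1;  // decode; no list size needed
        sorted.add(insertionPoint, key);
        return insertionPoint;
    }

    public static void main(String[] args) {
        List<Integer> xs = new ArrayList<>(List.of(10, 20, 30));
        insertSorted(xs, 25);
        insertSorted(xs, 5);
        System.out.println(xs);  // [5, 10, 20, 25, 30]
    }
}
```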

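As for where the convention originated: I'd wager it is lost in the mists of time!

As pure conjecture, I can imagine it arising in C (or perhaps an even earlier language), where it would have been a more concise way of encoding the value than the alternatives.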

One "obvious" alternative encoding would be to use the sign bit to indicate presence/absence, and the remaining bits to hold the insertion position:

S PPPP....P
^            0 means "present", 1 means "absent"
  ^---....^  These bits denote the position in the container.

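Extracting the position when the sign bit is set requires masking that bit off. This is easy in Java, where int is defined to be 32 bits wide (just use value & 0x7FFFFFFF); but to write it in portable C, you would need to do something like this: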

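value = binarySearch(...);
if (value < 0) {
  /* Clear the sign bit. 1u avoids shifting into the sign bit of a signed int;
     CHAR_BIT comes from <limits.h>. */
  insertionPosition = value & ~(1u << (sizeof(int) * CHAR_BIT - 1));
  ...
}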

(Apologies if that isn't quite right; this is why Java programmers shouldn't try to write C...)

Even with a fixed width, it's a bit cryptic:

value = binarySearch(...);
if (value < 0) {
  insertionPosition = value & 0x7FFFFFFF;  // What's this magic number?!
  ...
}

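This is ugly and easy to get wrong if you have to write it in lots of places. Of course, you could write a little method to do the math for you, but method calls are expensive (or at least they may have been, back in the day).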
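For comparison, a helper-based version of the sign-bit encoding might look like the sketch below. All names here are hypothetical, and this is the alternative scheme described above, not what the JDK does. `Integer.MIN_VALUE` has only the sign bit set, and `Integer.MAX_VALUE` has every bit set except the sign bit.

```java
class SignBitEncodingSketch {
    // Mark "absent" by setting the sign bit on the insertion position.
    static int encodeAbsent(int insertionPoint) {
        return insertionPoint | Integer.MIN_VALUE;
    }

    // A negative value means the key was not found.
    static boolean isAbsent(int value) {
        return value < 0;
    }

    // Mask off the sign bit to recover the position (same as value & 0x7FFFFFFF).
    static int position(int value) {
        return value & Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        int encoded = encodeAbsent(0);
        System.out.println(isAbsent(encoded));  // true
        System.out.println(position(encoded));  // 0
    }
}
```

Insertion point 0 is representable here too, but every caller has to go through the mask (or the helper) to decode it.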

With the (-(insertion point) - 1) convention, you can write simple, readable, fast code:

value = binarySearch(...);
if (value < 0) {
  insertionPosition = -value - 1;
  ...
}
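On the question's guess about bitwise operations: in two's complement, -value - 1 is the same as ~value, so the decoding can also be written as a single bitwise complement. The sketch below is a minimal binary search that follows the convention; it is an illustration with hypothetical names, not a copy of the JDK source.

```java
class BinarySearchSketch {
    // Returns the index of key in the sorted array a, or
    // -(insertion point) - 1 if key is not present.
    static int binarySearch(int[] a, int key) {
        int low = 0;
        int high = a.length - 1;
        while (low <= high) {
            int mid = (low + high) >>> 1;  // unsigned shift avoids overflow
            if (a[mid] < key) {
                low = mid + 1;
            } else if (a[mid] > key) {
                high = mid - 1;
            } else {
                return mid;                // found
            }
        }
        return -(low + 1);                 // low is the insertion point
    }

    // ~result equals -result - 1 for every int in two's complement.
    static int insertionPoint(int result) {
        return ~result;
    }

    public static void main(String[] args) {
        int[] sorted = {10, 20, 30};
        int result = binarySearch(sorted, 25);
        System.out.println(result);                  // -3
        System.out.println(insertionPoint(result));  // 2
    }
}
```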
