对字符串进行二分查找的复杂度

Question

我有一个排序的字符串数组：例如：["bar"、"foo"、"top"、"zebra"] 我想搜索输入的单词是否是是否存在于数组中。

例如：

search (String[] str, String word) {
     // binary search implemented + string comaparison.
}

现在二分查找将解决复杂度为 O(logn) 的问题，其中 n 是数组的长度。太好了。

但是，在某些时候我们需要进行字符串比较，这可以在线性时间内完成。

Now the input array can contain of words of different sizes. So when I am calculating final complexity will the final answer be O(m*logn) where m is the size of word we want to search in the array, which in our case is "zebra" the word we want to search?

Answer 1

是的，你的想法和你提出的解决方案都是正确的。在字符串搜索的整体复杂度中，您还需要考虑最长字符串的长度。

简单的字符串比较是一个 O(m) 操作，其中 m 是两个字符串中较大字符串的长度。

但是，鉴于数组已排序，我们可以改进很多。正如用户 "doynax" 建议的那样，

Complexity can be improved by keeping track of how many characters got matched during the string comparisons, and store the present count for the lower and upper bounds during the search. Since the array is sorted we know that the prefix of the middle entry to be tested next must match up to at least the minimum of the two depths, and therefore we can skip comparing that prefix. In effect we're always either making progress or stopping the incremental comparisons immediately on a mismatch, and thereby never needing to keep going over old ground.

因此，总的来说 m 字符比较必须进行到字符串末尾，如果找到的话，否则甚至不会那么多（如果在早期失败）。

因此，整体复杂度为 O(m + log n)。

Answer 2

我以为楼主说的时间复杂度是O(m*logn)是对的。

如果您使用建议的增强功能通过跟踪以前匹配的字母来提高时间复杂度（得到 O(m + logn)），我相信下面的输入会破坏它。

arr = [“abc”、“def”、“ghi”、“nlj”、“pfypfy”、“xyz”]

目标=“nljpfy”

我预计这会错误地匹配“pfypfy”。也许其中一位原始发帖人可以对此发表意见。非常想更好地理解提议的内容。听起来在下一次比较中跳过了匹配数量的字母。

对字符串进行二分查找的复杂度

Complexity of binary search on a string

algorithm

complexity-theory

big-o

binary-search

time-complexity