为什么 KMP 的正常情况复杂度为 O(N)？

Question

在终于理解了 KMP 算法之后，它的复杂性仍然不清楚。对于模式 M 的长度和文本 N 的长度，我知道，在最坏的情况下它是 O(N+M) （仍然无法解释，为什么），但为什么它是 O(N) 在平均案件？预先感谢您的帮助。

Answer 1

在这个回答中，我将解决你问题的这一部分。

For the length of the pattern M and for the length of the text N, I know, in worst case it's O(N+M) (still can't explain, why)

我将尝试解释为什么计算失败函数（KMP算法中的预处理步骤）需要线性时间，您可以对字符串匹配部分的运行时间进行类似的分析算法。

代码（取自here）：

KNUTH-MORRIS-PRATT FAILURE (P)

Input : Pattern with N characters
Output: Failure function f for pattern P

i ← 1
matched ← 0
f(0) ← 0
while i < N do
 if P[matched] = P[i]
     f(i) ← j +1
     i ← i +1
     matched← matched + 1
 else if matched>0
     matched ← f(matched - 1)
 else
     f(i) ← 0
     i ← i +1

运行计算失败函数的时间=while循环的最大迭代次数

Number of iterations of while loop = Number of successful matches + Number of unsuccessful matches

Let's call this statement (1)

令 N = 模式的长度。

Number of successful matches<=N
(Check for instance, this pattern "aaaa")

Let's call this statement (2)

不成功的匹配次数有点棘手。

让我们举个例子模式"aaabcaaabce"

对于此模式，数组为：

| 0 | 1 | 2 | 0 | 0 | 1 | 2 | 3 | 4 | 5 | 0 |

注意在索引 3 中，当不匹配发生时，当我们下降到值 0 时，我们最多可以做 2 个不匹配，或者换句话说，我们只能做与匹配的字符数一样多的不匹配所以远.

因此每次索引 i 发生不匹配时，该索引 i 的最大不匹配数最多可以是目前匹配的字符数（索引 i-1）。

直观地说，整个数组中匹配的字符数就是数组中非零的条目。因为非零条目意味着匹配。因为，

最大匹配失败数 = 匹配字符数

=> 整个数组中不成功的最大匹配数

= 数组中最大字符匹配数

= 数组中非零的条目数

<=N

Thus, Maximum number of unsuccessful matches over the array <=N

Let's call this statement (3)

将 (1)、(2) 和 (3) 放在一起：

总运行时间 = 运行成功匹配的时间 + 运行不成功匹配的时间

运行时间<=N+N

运行时间 <= 2*N

Running time for computing failure function = O(N)

为什么 KMP 的正常情况复杂度为 O(N)？

Why is the normal case complexity of KMP O(N)?

case

time-complexity

knuth-morris-pratt