Is there a trick/algorithm by which we can find all possible substrings in O(n) time?

Question

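I have a brute-force solution that enumerates all substrings of an input string in O(n^2) time. It takes a long time when the input string is very long.

How can we find all possible substrings in O(n) time?

I only need the count of substrings whose first and last characters are the same. As you can see in the code below, the function returns only that count. I would like to do this in O(n) time.

My brute-force solution: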

// Counts all substrings whose first and last characters are equal.
public class Solution {

    public static void main(String[] args) {
        String inputString = "ababaca";
        System.out.println(findSubstringByBruteForcce(inputString, inputString.length()));
    }

    private static long findSubstringByBruteForcce(String inputString, int length) {
        long count = 0;
        for (int i = 0; i < length; i++) {
            for (int j = 1; j <= length - i; j++) {
                String str = inputString.substring(i, i + j);
                if (str.length() == 1) {
                    count = count + 1;
                } else {
                    if (str.substring(0, 1).equals(str.substring(str.length() - 1, str.length()))) {
                        count = count + 1;
                    }
                }
            }
        }
        return count;
    }
}

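How can I optimize the solution above and get the answer in O(n) time? The input string can be very large (about 10^6 characters), and the brute-force approach takes about 20 seconds. I want the maximum running time to stay under 2 seconds.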

Answer 1

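Since a substring is identified by its boundary indices rather than by its content, it is enough to count the frequency of each letter and then sum (frequency + 1) * frequency div 2 over all letters. Every pair of positions holding the same letter, with repetition allowed and order ignored, yields exactly one counted substring. For "ababaca", a occurs 4 times, b twice and c once, so the total is 10 + 3 + 1 = 14.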
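A minimal sketch of this frequency-based idea, in case it helps. The class name `FrequencyCount` and the method `countSubstrings` are made up for illustration, and the table is sized to cover every Java `char` value, which is an assumption about the input rather than something the answer specifies.

```java
class FrequencyCount {
    // Counts substrings (identified by position) whose first and last characters match.
    static long countSubstrings(String s) {
        // One counter per possible char value (illustrative sizing choice).
        long[] freq = new long[Character.MAX_VALUE + 1];
        for (int i = 0; i < s.length(); i++) {
            freq[s.charAt(i)]++;
        }
        long total = 0;
        for (long f : freq) {
            // f positions of the same letter give f * (f + 1) / 2 unordered pairs with repetition.
            total += f * (f + 1) / 2;
        }
        return total;
    }

    public static void main(String[] args) {
        // a: 4 -> 10, b: 2 -> 3, c: 1 -> 1
        System.out.println(countSubstrings("ababaca")); // prints 14
    }
}
```

The result is accumulated in a `long` because a string of about 10^6 identical characters would produce roughly 5 * 10^11 substrings, which does not fit in an `int`.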

Answer 2

This is fast, O(n), but it uses a lot of memory:

public static long findSubstringByCharacterMap(String s, int length) {
    long count = 0;
    long[] map = new long[Character.MAX_VALUE + 1];
    for (int i = 0; i < length; ++i)
        count += ++map[s.charAt(i)];
    return count;
}

If the string contains only single-byte characters, the size of long[] map can be 256.

You could rewrite long[] map as Map<Character, Long> map, but that is slow.
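For comparison, the `Map<Character, Long>` variant might look like the sketch below. The class name `MapCount` is hypothetical. The code relies on `Map.merge`, which returns the updated value, so it plays the role of `++map[...]` in the array version. It allocates entries only for characters that actually occur, at the cost of boxing on every step.

```java
import java.util.HashMap;
import java.util.Map;

class MapCount {
    // Same running-count idea as the array version, with a map instead of a large table.
    static long countSubstrings(String s) {
        Map<Character, Long> seen = new HashMap<>();
        long count = 0;
        for (int i = 0; i < s.length(); i++) {
            // merge stores and returns the new occurrence count for this character.
            count += seen.merge(s.charAt(i), 1L, Long::sum);
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countSubstrings("ababaca")); // prints 14
    }
}
```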

Answer 3

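I have a solution that uses constant extra space, an array of size 256 (enough for 8-bit character codes 0 to 255), and O(n) time.

Algorithm steps

Create an array of size 256.
Traverse the whole string.
For each character, add its current frequency to ans, then increment that character's frequency.
After the loop, add the length of the string to ans.

Here is my Java implementation. If I am wrong or have misunderstood the problem, please let me know.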

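class Solution
{
 public static void main (String[] args)
 {
  String str = "aabbab#cd#e";
  // One counter per 8-bit character code; assumes every char in str is below 256.
  int[] array = new int[256];
  // long, because the count grows quadratically and would overflow int on long inputs.
  long ans = 0;
  for (int i = 0; i < str.length(); i++) {
      // Each earlier occurrence of this character starts one substring that ends here.
      ans += array[str.charAt(i)];
      array[str.charAt(i)]++;
  }
  // Every single-character substring counts as well.
  ans = ans + str.length();
  System.out.print(ans); // prints 18
 }
}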

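Note that this algorithm counts substrings by position, so substrings with identical content at different positions are each counted.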
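To make that distinction concrete, here is a small sketch that contrasts the position-based count with a count of distinct contents. The class `DuplicateDemo` and both method names are hypothetical, and `countDistinct` is a brute-force helper meant only for short strings.

```java
import java.util.HashSet;
import java.util.Set;

class DuplicateDemo {
    // Position-based count: equal substrings at different offsets are counted separately.
    static long countByPosition(String s) {
        long[] seen = new long[Character.MAX_VALUE + 1];
        long count = 0;
        for (int i = 0; i < s.length(); i++) {
            count += ++seen[s.charAt(i)];
        }
        return count;
    }

    // Content-based count for comparison; brute force, not suitable for large inputs.
    static int countDistinct(String s) {
        Set<String> distinct = new HashSet<>();
        for (int i = 0; i < s.length(); i++) {
            for (int j = i; j < s.length(); j++) {
                if (s.charAt(i) == s.charAt(j)) {
                    distinct.add(s.substring(i, j + 1));
                }
            }
        }
        return distinct.size();
    }

    public static void main(String[] args) {
        System.out.println(countByPosition("aaa")); // prints 6
        System.out.println(countDistinct("aaa"));   // prints 3 ("a", "aa", "aaa")
    }
}
```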
