每个字符出现偶数次（可能为零）的最长子串

Question

Suppose we have a string s. We want to find the length of the longest substring of s such that every character in the substring appears even number of times (possible zero).
WC Time: O(nlgn). WC space: O(n)

首先，很明显子串必须是偶数长度。其次，我熟悉滑动 window 方法，在该方法中我们锚定一些 right 索引并寻找最左边的索引来匹配您的标准。我试图在这里应用这个想法，但无法真正表达它。

此外，在我看来优先级队列可以派上用场（因为 O(nlgn) 要求有点暗示）

我很乐意提供帮助！

Answer 1

让我们定义以下位集：

B[c,i] = 1 if character c appeared in s[0,...,i] even number of times.

计算 B[c,i] 需要线性时间（对于所有值）：

for all c:
  B[c,-1] = 0
for i from 0 to len(arr):
  B[c, i] = B[s[i], i-1] XOR 1

由于字母表的大小不变，位集也是如此（对于每个 i）。

注意条件：

every character in the substring appears even number of times

对于子字符串 s[i,j] 为真当且仅当索引 i 的位集与索引 j 的位集相同（否则，有一个重复奇数的位该子串中的次数 ; 反方向：如果有一个位重复次数，那么它的位不能相同）。

因此，如果我们将所有位集存储在某个集合中（哈希 set/tree 集合），并且仅保留最新条目，则此预处理需要 O(n) 或 O(nlogn) 时间（取决于 hash/tree设置).

在第二次迭代中，对于每个索引，找到更远的具有相同位集的索引（O(1)/O(logn)，取决于是否设置了hash/tree），找到子串长度，并将其标记为候选。最后取最长的候选

此解决方案是 O(n) space 位集，O(n)/O(nlogn) 时间，取决于是否使用 hash/tree 解决方案。

伪代码：

def NextBitset(B, c): # O(1) time
  for each x in alphabet \ {c}:
    B[x, i] = B[x, i-1]
   B[c, i] = B[c, i-1] XOR 1

for each c in alphabet: # O(1) time
  B[c,-1] = 0
map = new hash/tree map (bitset->int)

# first pass: # O(n)/O(nlogn) time
for i from 0 to len(s):
   # Note, we override with the latest element.
   B = NextBitset(B, s[i])
   map[B] = i

for each c in alphabet: # O(1) time
  B[c,-1] = 0
max_distance = 0
# second pass: O(n)/ O(nlogn) time.
for i from 0 to len(s):
  B = NextBitset(B, s[i])
  j = map.find(B)  # O(1) / O(logn)
  max_distance = max(max_distance, j-i)

Answer 2

我不确定 amit 到底提出了什么建议，所以如果是这样，请考虑另一种解释。这可以在一次遍历中完成。

为字符串的每个索引生成长度等于字母的位集。存储遍历字符串时遇到的每个唯一位集的第一个索引。更新当前和之前看到的位集之间的最大间隔。

例如字符串，"aabccab":

  a a b c c a b
  0 1 2 3 4 5 6 (index)

                _
0 1 0 0 0 0 1 1  |  (vertical)
0 0 0 1 1 1 1 0  |  bitset for
0 0 0 0 1 0 0 0 _|  each index
  ^           ^
  |___________|

   largest interval
   between current and
   previously seen bitset

每次迭代的更新可以在 O(1) 中完成，方法是将每个字符的位掩码预处理为 XOR 和前一个位集：

bitset     mask
  0         1     1
  1   XOR   0  =  1
  0         0     0

表示更新与 alphabet-bitset 中的第一位关联的字符。

每个字符出现偶数次（可能为零）的最长子串

Longest substring where every character appear even number of times (possibly zero)

algorithm

computer-science