How do I correct the algorithm for comparing two strings containing keypresses?

Question

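This is an algorithm that returns True if two strings are equivalent. The strings may contain keypresses such as backspaces. The code walks through each character of the strings with a cursor and pointers, and when it finds a keypress (i.e. \b) it skips 2 positions.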

#!/usr/bin/env python
import argparse
import __builtin__

# Given two different strings, one with backspaces (keypresses), find if they are equivalent or not

def main():
    parser = argparse.ArgumentParser(description="Enter two strings with or without backspaces")
    parser.add_argument("s1", type=str, help="The first string.")
    parser.add_argument("s2", type=str, help="The second string.")
    args = parser.parse_args()
    print(compare(args.s1, args.s2))

def compare(s1, s2):
    BACKSPACE = '\b'
    cursor = 0;
    pointer1 = 0; pointer2 = 0; # current position in backspaced string. 

    canon_len1 = len(s1); canon_len2 = len(s2); # length of the canonical string

    num_diff = 0
    while True:
        if s1[pointer1] == BACKSPACE or s2[pointer2] == BACKSPACE:
            # decrement the cursor and undo the previous compare
            cursor -= 1; 
            if s1[cursor] != s2[cursor]:
                num_diff -= 1
            # decrement the canonical lengths appropriately
            canon_len1 -= 2 if s1[pointer1] == BACKSPACE else 0
            canon_len2 -= 2 if s2[pointer2] == BACKSPACE else 0
        else:

            if s1[pointer1] != s2[pointer2]:
                num_diff += 1
            cursor += 1

        # increment the pointers, making sure we don't run off the end
        pointer1 += 1; pointer2 += 1;
        if pointer1 == len(s1) and pointer2 == len(s2):
            break
        if pointer1 == len(s1): pointer1 -= 1
        if pointer2 == len(s2): pointer2 -= 1

    return num_diff == 0 and canon_len1 == canon_len2

if __name__ == "__main__":
    main()

#!/usr/bin/env python

import compare_strings
import unittest

class compare_strings_test(unittest.TestCase):

    def test_01(self):
        raised = False
        try:
            compare_strings.compare('Toronto', 'Cleveland')
        except:
            raised = True
        self.assertFalse(raised, 'Exception raised')

    def test_02(self):
        equivalent = compare_strings.compare('Toronto', 'Cleveland')
        self.assertEquals(equivalent, False)

    def test_03(self):
        equivalent = compare_strings.compare('Toronto', 'Toroo\b\bnto')
        self.assertEquals(equivalent, False)

    def test_04(self):
        equivalent = compare_strings.compare('Toronto', 'Torooo\b\bntt\bo')
        self.assertEquals(equivalent, True)

if __name__ == "__main__":
    unittest.main()

...F
======================================================================
FAIL: test_04 (__main__.compare_strings_test)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "compare_strings_test.py", line 26, in test_04
    self.assertEquals(equivalent, True)
AssertionError: False != True

----------------------------------------------------------------------
Ran 4 tests in 0.001s

Test 4 fails, but 'Toronto' and 'Torooo\b\bntt\bo' should be equivalent once the backspaces are applied.

Answer 1

It is best to remove the backspaces from the strings beforehand, using a function like this:

def normalize(s):
    result = []
    for c in s:
        if c == '\b':
            result.pop()  # raises IndexError if nothing is left to erase; a try/except block could be added here
        else:
            result.append(c)

    return "".join(result)

Then compare the results.
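A minimal sketch of the whole comparison, assuming that a backspace with nothing left to erase should simply be ignored. The `if result` guard and the `compare` wrapper are illustrative additions, not part of the function above:

```python
def normalize(s):
    """Apply every backspace in s and return the resulting text."""
    result = []
    for c in s:
        if c == '\b':
            if result:  # assumption: ignore a backspace when nothing is left to erase
                result.pop()
        else:
            result.append(c)
    return "".join(result)


def compare(s1, s2):
    # Two strings are equivalent when their normalized forms are identical.
    return normalize(s1) == normalize(s2)


print(compare('Toronto', 'Torooo\b\bntt\bo'))  # True
print(compare('Toronto', 'Toroo\b\bnto'))      # False
```

With this wrapper, test_04 from the question should pass, because both arguments normalize to 'Toronto'.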

Answer 2

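I believe the problem in your current code stems from the fact that you can have several backspaces in a row, but you only look back "one" character. (I may be wrong about this; I haven't stepped through the code with pdb.)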

As suggested in the comments, a decent way to solve this is to break it into the following two parts.

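1. Canonicalize/normalize both input strings. This means processing them one at a time, removing each backspace together with the character it erases.
2. Compare the two normalized strings.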

Step 2 is easy: just use the built-in string comparison (== in Python).

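Step 1 is a bit harder, because the input string can contain several backspaces in a row. One way to handle this is to build a new string one character at a time and, on each backspace, delete the last character added. Here is some sample code.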

BACKSPACE = '\b'

def canonicalize(s):
    normalized_s = ""
    for c in s:
        # On a backspace, drop the last character kept so far.
        # Slicing an empty string is safe, so a leading backspace is ignored.
        if c == BACKSPACE:
            normalized_s = normalized_s[:-1]
        else:
            normalized_s += c

    return normalized_s

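A nice side effect of this approach is that leading backspaces do not cause any errors; they are simply ignored. I will try to keep this property in the other implementations below. Code written in a language such as C++, where strings can be modified in place, can be made efficient fairly easily, since this amounts to moving a pointer and changing entries in a char array.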

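In Python, each edit creates a new string (or at least there is no guarantee that a new string is not allocated). I think managing your own stack (that is, an array of characters with a pointer to the end) makes for better code. There are several ways to manage a stack in Python; the most common is a list, and another good option is collections.deque. Unless a profiler says otherwise, I would go with the more familiar list.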

def canonicalize(s):
    normalized_s = list()
    for c in s:
        # On a backspace, pop the last kept character, if there is one.
        if c == BACKSPACE:
            if normalized_s:
                normalized_s.pop()
        else:
            normalized_s.append(c)

    return "".join(normalized_s)

The final compare method might look something like this:

def compare(s1, s2):
    return canonicalize(s1) == canonicalize(s2)

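There are two problems with the code above. The first is that it is almost guaranteed to create two new strings. The second is that it makes four passes over strings in total: one over each input string and one over each cleaned-up string.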

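This can be improved by going backwards instead of forwards. Iterating backwards, you see the backspaces first and know in advance which characters will be deleted (read: ignored or skipped). We keep going until there is a mismatch or at least one string runs out of characters. This approach needs a bit more bookkeeping but no extra space. It uses only two pointers to track the current position in each string and a counter to track how many characters to ignore. The code shown below is not particularly Pythonic and could be made much nicer. If you were to use (two) generators and an izip_longest, you could get rid of all the boilerplate.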

def compare(s1, s2):
    i, j = len(s1) - 1, len(s2) - 1

    while i >= 0 or j >= 0:
        ignore = 0
        while i >= 0:
            if s1[i] == BACKSPACE:
                ignore += 1
            elif ignore > 0:
                ignore -= 1
            else:
                break
            i -= 1

        ignore = 0
        while j >= 0:
            if s2[j] == BACKSPACE:
                ignore += 1
            elif ignore > 0:
                ignore -= 1
            else:
                break
            j -= 1

        if i < 0 and j < 0:
            # No more characters to try and match
            return True

        if (i < 0 and j >= 0) or (i >= 0 and j < 0):
            # One string exhausted before the other
            return False

        if s1[i] != s2[j]:
            return False

        i -= 1
        j -= 1

    return True
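The generator-based variant mentioned above could be sketched roughly as follows. This is only one possible reading of that idea, written for Python 3, where izip_longest is available as itertools.zip_longest. The helper name surviving_chars_reversed is made up for this example, and the code assumes the inputs are ordinary strings, so None can serve as the padding value.

```python
from itertools import zip_longest

BACKSPACE = '\b'


def surviving_chars_reversed(s):
    """Yield the characters of s that survive its backspaces, last to first."""
    ignore = 0  # characters still to be skipped because of backspaces seen so far
    for c in reversed(s):
        if c == BACKSPACE:
            ignore += 1
        elif ignore > 0:
            ignore -= 1
        else:
            yield c


def compare(s1, s2):
    # zip_longest pads the shorter stream with None, so a length mismatch
    # shows up as a character compared against None. all() stops at the
    # first mismatch, so the rest of the strings is never scanned.
    pairs = zip_longest(surviving_chars_reversed(s1), surviving_chars_reversed(s2))
    return all(a == b for a, b in pairs)


print(compare("Toronto", "Torooo\b\bntt\bo"))  # True
print(compare("a", "a\b"))                     # False
```

Like the loop version, this keeps no copy of either string, and leading backspaces are ignored because the leftover ignore count is simply dropped when the generator finishes.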

Edit

Here are some test cases I tried against the last implementation of compare.

true_testcases = (
    ("abc", "abc"),
    ("abc", "abcde\b\b"),
    ("abcdef", "\b\babcdef\bf"),
    ("", "\b\b\b"),
    ("Toronto", "Torooo\b\bntt\bo"))

false_testcases = (
    ("a", "a\b"),
    ("a", "a\b\b"),
    ("abc", "abc\bd\be"),
)

print([compare(s1, s2) for s1, s2 in true_testcases])
print([compare(s1, s2) for s1, s2 in false_testcases])
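If reading lists of True and False by eye gets tedious, the same cases can be run through a small checker that reports only the failures. This is a sketch, not part of the original answer: failing_cases and reference_compare are hypothetical names, and reference_compare is a stack-based stand-in so that the example runs on its own. Any of the compare implementations above could be passed in its place.

```python
BACKSPACE = '\b'


def reference_compare(s1, s2):
    """Hypothetical stand-in: apply the backspaces with a list as a stack, then compare."""
    def canonicalize(s):
        stack = []
        for c in s:
            if c != BACKSPACE:
                stack.append(c)
            elif stack:  # a backspace with nothing to erase is ignored
                stack.pop()
        return "".join(stack)
    return canonicalize(s1) == canonicalize(s2)


def failing_cases(compare_fn, cases, expected):
    """Return the (s1, s2) pairs for which compare_fn does not return expected."""
    return [(s1, s2) for s1, s2 in cases if compare_fn(s1, s2) != expected]


true_testcases = (
    ("abc", "abc"),
    ("abc", "abcde\b\b"),
    ("abcdef", "\b\babcdef\bf"),
    ("", "\b\b\b"),
    ("Toronto", "Torooo\b\bntt\bo"),
)

false_testcases = (
    ("a", "a\b"),
    ("a", "a\b\b"),
    ("abc", "abc\bd\be"),
)

# An empty list means every case produced the expected result.
print(failing_cases(reference_compare, true_testcases, True))
print(failing_cases(reference_compare, false_testcases, False))
```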

python

string

algorithm

python-unittest