Python difflib's ratio, quick_ratio and real_quick_ratio

Question

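I've been using difflib's SequenceMatcher, and I've found the ratio function to be too slow. Reading through the documentation, I found that quick_ratio and real_quick_ratio are supposed to be faster (as the names suggest) and serve as upper bounds.

However, the documentation doesn't describe the assumptions they make or the speedup they provide.

When should I use either version, and what am I sacrificing?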

Answer 1

Looking at the source

Starting with the helper method _calculate_ratio:

    def _calculate_ratio(matches, length):
        if length:
            return 2.0 * matches / length
        return 1.0

ratio

ratio finds the matches, doubles the count, and divides by the combined length of the two strings:

    return _calculate_ratio(matches, len(self.a) + len(self.b))
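As a rough illustration, the match count that ratio passes to _calculate_ratio can be reproduced from get_matching_blocks(). The sample strings below are arbitrary, and this is a sketch of the idea rather than the library's exact code:

```python
from difflib import SequenceMatcher

a, b = "abcd", "bcde"
sm = SequenceMatcher(None, a, b)

# Each matching block is an (a, b, size) triple; the sizes add up to
# the number of matched elements ("bcd" here, so 3).
matches = sum(block.size for block in sm.get_matching_blocks())

# Same formula as _calculate_ratio: 2.0 * matches / (len(a) + len(b))
manual = 2.0 * matches / (len(a) + len(b))

print(matches, manual, sm.ratio())  # 3 0.75 0.75
```

Finding the matching blocks is the expensive part; the division at the end is trivial.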

quick_ratio

The comment in the source says it all:

    # viewing a and b as multisets, set matches to the cardinality
    # of their intersection; this counts the number of matches
    # without regard to order, so is clearly an upper bound

followed by

    return _calculate_ratio(matches, len(self.a) + len(self.b))
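The multiset idea from that comment can be sketched with collections.Counter. The helper name multiset_ratio is hypothetical and this is not how difflib implements it internally, but it should give the same number for plain strings:

```python
from collections import Counter
from difflib import SequenceMatcher

def multiset_ratio(a, b):
    """Hypothetical helper: order-insensitive upper bound on ratio()."""
    # Counter & Counter keeps the minimum count of each element,
    # which is the multiset intersection.
    matches = sum((Counter(a) & Counter(b)).values())
    length = len(a) + len(b)
    return 2.0 * matches / length if length else 1.0

a, b = "abcd", "dcba"  # same characters, reversed order
sm = SequenceMatcher(None, a, b)

print(multiset_ratio(a, b))  # 1.0
print(sm.quick_ratio())      # 1.0
print(sm.ratio())            # 0.25
```

This shows what is given up: because order is ignored, a string and its reversal look identical to quick_ratio, while ratio sees only a single matching character.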

real_quick_ratio

real_quick_ratio takes the length of the shorter string, doubles it, and divides by the combined length of the two strings:

    la, lb = len(self.a), len(self.b)
    # can't have more matches than the number of elements in the
    # shorter sequence
    return _calculate_ratio(min(la, lb), la + lb)

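This is a true upper bound, but also the loosest of the three, since it ignores the contents of the strings entirely.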
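A small example with arbitrary strings that share no characters shows how far a length-only bound can sit from the actual ratio:

```python
from difflib import SequenceMatcher

a, b = "aaaa", "zzzzzz"  # nothing in common
sm = SequenceMatcher(None, a, b)

# Length-only bound: 2.0 * min(la, lb) / (la + lb) = 8 / 10
length_bound = 2.0 * min(len(a), len(b)) / (len(a) + len(b))

print(length_bound)           # 0.8
print(sm.real_quick_ratio())  # 0.8
print(sm.quick_ratio())       # 0.0
print(sm.ratio())             # 0.0
```

So real_quick_ratio is only useful for ruling pairs out: a low value proves the strings cannot be similar, but a high value proves nothing.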

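Conclusion

real_quick_ratio does not look at the strings to see whether there are any matches; it only computes an upper bound from the string lengths.

Now, I'm not an algorithms expert, but if you find ratio too slow for the job, I would suggest quick_ratio, since it handles the problem adequately. Keep in mind that it returns an upper bound rather than the exact ratio.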

A note on efficiency

From the docstring:

    .ratio() is expensive to compute if you haven't already computed
    .get_matching_blocks() or .get_opcodes(), in which case you may
    want to try .quick_ratio() or .real_quick_ratio() first to get an
    upper bound.
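One way to act on that advice is to chain the three checks from cheapest to most expensive, so ratio only runs for pairs that survive both bounds. The helper name is_similar and the 0.6 cutoff are assumptions for illustration; as far as I can tell, difflib.get_close_matches uses a similar chain internally:

```python
from difflib import SequenceMatcher

def is_similar(a, b, cutoff=0.6):
    """Hypothetical helper: True if ratio() >= cutoff, cheap bounds first."""
    sm = SequenceMatcher(None, a, b)
    # Each value is an upper bound on the next one, so failing any
    # check proves ratio() is below the cutoff without computing it.
    return (sm.real_quick_ratio() >= cutoff
            and sm.quick_ratio() >= cutoff
            and sm.ratio() >= cutoff)

print(is_similar("apple", "appel"))       # True  (ratio is 0.8)
print(is_similar("apple", "xyz"))         # False (rejected by quick_ratio)
print(is_similar("a", "aaaaaaaaaa"))      # False (rejected by real_quick_ratio)
```

The result is always the same as comparing ratio against the cutoff directly; only the amount of work for dissimilar pairs changes.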
