Identity of 2 lines from 2 files with same content in Python

Question

I am reading lines simultaneously from two text files (word lists) with identical content.

peach
carrot
apple
lemon

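I want to check whether each pair of corresponding lines is identical. If a pair is not, the overall similarity decreases. Since the two files are the same, the identity check should yield 100% similarity. Instead, I get 0%.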

from itertools import izip, izip_longest

with open(r'file1.txt', "rb") as f1, open(r'file2.txt', "rb") as f2:

    #initialize numerator & denominator values for calculating file similarity
    nTotal = 4 #total number of lines in each file
    nIdent = nTotal

    for line1, line2 in izip_longest(f1, f2):

        if((line1 is line2) is False):

            nIdent -=1

    similarity = nIdent/nTotal

Why are the lines not identical?

Answer 1

You have to change:

if((line1 is line2) is False):

to:

if line1 != line2:

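When you compare string objects in Python, you should not use the is operator, because in most interpreter implementations equal strings are, most of the time, represented by distinct objects.

The is operator returns True if the objects you compare are the same object, not if their values are equal, and the latter is what you need here.

In some interpreter implementations, string literals with the same value may end up sharing the same object, but that is not something your script should rely on: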

'abc' is 'abc' # True in CPython.

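The example above is entirely implementation-dependent and may behave differently in the future. You should compare immutable objects by value, not by object ID (which is what the is operator does).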
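
To make the difference between value and identity concrete, here is a minimal sketch. It builds two equal strings at runtime, roughly as happens when lines are read from two separate files. The names line1 and line2 mirror the question; same_value and same_object are illustrative only. The result of the is check depends on the interpreter, so no fixed outcome is assumed for it.

```python
# Build two equal strings at runtime so that the interpreter cannot
# simply reuse one literal for both names.
line1 = "".join(["pea", "ch\n"])
line2 = "".join(["pea", "ch\n"])

same_value = (line1 == line2)   # compares the contents of the strings
same_object = (line1 is line2)  # compares object identity, i.e. id(line1) == id(line2)

print(same_value)   # the contents match
print(same_object)  # implementation-dependent; usually False in CPython
```

Lines read from two different files are separate objects in the same way, which is why the identity check in the question fails for every pair.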

Answer 2

Your comparison line1 is line2 is not the same as line1 == line2. The objects are not identical, but the data they represent is.

equal_lines = 0

with open(r'file1.txt', "rb") as f, open(r'file2.txt', "rb") as f2:
    for f1_line, f2_line in zip(f.readlines(), f2.readlines()):
        if f1_line == f2_line:
            equal_lines += 1
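
Combining this with the similarity ratio from the question, a Python 3 sketch could look like the following. The function name line_similarity is hypothetical, zip_longest replaces the Python 2 izip_longest, and the line count is taken from the files themselves instead of being hard-coded. Treating two empty files as fully similar is an assumption made here, not something stated in the question.

```python
from itertools import zip_longest


def line_similarity(path1, path2):
    """Return the fraction of line positions where both files hold equal lines."""
    total = 0
    identical = 0
    with open(path1, "rb") as f1, open(path2, "rb") as f2:
        # zip_longest pads the shorter file with None, so surplus lines
        # in the longer file count as mismatches.
        for line1, line2 in zip_longest(f1, f2):
            total += 1
            if line1 == line2:  # compare by value, not by identity
                identical += 1
    # Assumption: two empty files are treated as 100% similar.
    return identical / total if total else 1.0
```

Calling line_similarity("file1.txt", "file2.txt") on the two identical word lists from the question should then give 1.0. Note that a missing newline at the end of only one of the files makes the last pair of lines compare as different.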


python

iteration

identity

lines