Using Python to Compare Two Text Files Line by Line
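I have two text files to compare. The first file contains unique items, and the second contains the same items repeated many times. I want to see how many times each line is repeated in the second file. This is what I wrote: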
import os
import sys

f1 = open('file1.txt')  # this has the 27 unique lines
f1data = f1.readlines()
f2 = open('file2.txt')  # this has lines repeated various times, with a total of 11162 lines
f2data = f2.readlines()

sys.stdout = open("linecount.txt", "w")

for line1 in f1data:
    linecount = 0
    for line2 in f2data:
        if line1 in line2:
            linecount+=1
    print line2, crime
The problem is that when I add up the line counts, the total is 11586 instead of 11162. Why is the count inflated?
Is there another way to get line-frequency output using Python?
https://docs.python.org/2.7/reference/expressions.html#in:
For the Unicode and string types, x in y
is true if and only if x is a substring of y.
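To see how the substring rule can inflate the totals, here is a minimal sketch with made-up data (the strings below are illustrative and not taken from the asker's files). Because `readlines()` keeps the trailing newline, a short line such as "theft\n" is also a substring of any longer line that ends the same way:

```python
# Made-up sample data (not from the original files); each entry keeps its
# trailing newline, as readlines() would return it.
unique_lines = ["theft\n", "auto theft\n"]
all_lines = ["theft\n", "auto theft\n", "auto theft\n"]

# Substring test: "theft\n" is also found inside "auto theft\n".
substring_total = sum(1 for u in unique_lines for line in all_lines if u in line)

# Equality test: each line matches only an identical line.
equality_total = sum(1 for u in unique_lines for line in all_lines if u == line)

print(substring_total)  # 5, more than the 3 lines in all_lines
print(equality_total)   # 3
```

Whether this is exactly what happens with the asker's 27 unique lines is an assumption, but any unique line that is a suffix of another would be counted twice in this way.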
Instead of
if line1 in line2:
I think you meant to write
if line1 == line2:
or replace the whole
for line2 in f2data:
    if line1 in line2:
        linecount+=1
block with
linecount = f2data.count(line1)  # number of lines in f2data equal to line1
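As for another way to get line frequencies, one option is `collections.Counter` from the standard library, which tallies every line in a single pass. The sketch below is only an illustration: the helper name `count_lines` and the sample data are hypothetical. It strips the trailing newline so that a last line without one still matches.

```python
from collections import Counter


def count_lines(lines):
    """Return a Counter mapping each line (trailing newline stripped) to its frequency."""
    return Counter(line.rstrip("\n") for line in lines)


# Made-up sample standing in for the contents of file2.txt;
# the last entry has no trailing newline, as a final line often does.
sample = ["theft\n", "auto theft\n", "auto theft\n", "theft"]
counts = count_lines(sample)

for text, n in sorted(counts.items()):
    print("%s\t%d" % (text, n))
```

With a real file, the open file object can be passed directly, for example `with open('file2.txt') as f2: counts = count_lines(f2)`. The counts always sum to the number of lines read, so the total cannot be inflated.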
Even with that small change to the code, it doesn't work. I got somewhat better results with this code.
import os
import sys

f1 = open('hmd4.csv')
f2 = open('svm_words.txt')
# Read f2 once, outside the loop: a second f2.read() returns an empty string.
words2 = f2.read().split("\n")
linecount = 0
for word1 in f1.read().split("."):
    for word2 in words2:
        if word1 in word2:
            linecount+=1
print (linecount)