Why does the diff utility show the same text in the result file?
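I am using diff to find the differences between two text files. It works well, but when I change the order of the lines in the text files, it reports text that appears in both files as a difference.
Here is file1.txt: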
>gi17
AAAAAA
>gi30
BBBBBB
>gi40
CCCCCC
>gi92
DDDDDD
>gi50
EEEEEE
>gi81
FFFFFF
File2.txt
>gi40
CCCCCC
>gi01
BBBBBB
>gi02
AAAAAA
>gi30
BBBBBB
Result.txt:
>gi17
AAAAAA
>gi30 ???
BBBBBB ???
>gi92
DDDDDD
>gi01
BBBBBB
>gi50
EEEEEE
>gi81
FFFFFF
>gi02
AAAAAA
>gi30 ???
BBBBBB ???
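The diff command:
$ diff C:/Users/User/Desktop/File1.txt C:/Users/User/Desktop/File2.txt > C:/Users/User/Desktop/Result.txt
Why does it show
>gi30
BBBBBB
as a difference?
Edit 1:
What I want is to search the whole of file 2 for each line of file 1, because the two files are not sorted and I cannot modify them (genetic data).
Edit 2:
I want to execute the join command from my PHP code. It runs successfully in the Cygwin command-line application, but it does not run from my PHP code: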
shell_exec("C:\cygwin64\bin\bash.exe --login -c 'join -v 1 <(sort $OldDatabaseFile.txt) <(sort $NewDatabaseFile.txt) > $text_files_path/DelSeqGi.txt 2>&1'");
Thanks.
As fedorqui said in the comments, diff compares the files line by line, in order.
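To see why reordered lines are reported, here is a minimal sketch. The scratch file names are made up for illustration; only the record contents come from the question. It writes the same four lines in two different orders, then compares them once as they are and once after sorting:

```shell
# Hypothetical demo files in a scratch directory (names are illustrative only).
tmp=$(mktemp -d)
printf '%s\n' '>gi30' BBBBBB '>gi40' CCCCCC > "$tmp/a.txt"
printf '%s\n' '>gi40' CCCCCC '>gi30' BBBBBB > "$tmp/b.txt"   # same lines, reordered

# diff exits with 0 when the inputs match and 1 when they differ.
ordered_status=0
diff "$tmp/a.txt" "$tmp/b.txt" || ordered_status=$?

# After sorting, both files hold the same lines in the same order.
sort "$tmp/a.txt" > "$tmp/a.sorted"
sort "$tmp/b.txt" > "$tmp/b.sorted"
sorted_status=0
diff "$tmp/a.sorted" "$tmp/b.sorted" || sorted_status=$?

echo "as-is: $ordered_status, sorted: $sorted_status"
rm -rf "$tmp"
```

The first diff reports one of the two records as removed in one place and added in another, because it can only match lines that keep their relative order. The second diff prints nothing.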
To achieve what you want, you can do this:
comm -3 <(sort f1.txt) <(sort f2.txt) > result.txt
Manual (relevant parts):
comm - compare two sorted files line by line
-1 suppress column 1 (lines unique to FILE1)
-2 suppress column 2 (lines unique to FILE2)
-3 suppress column 3 (lines that appear in both files)
EXAMPLES
comm -3 file1 file2
Print lines in file1 not in file2, and vice versa.
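As a rough sketch of how this plays out on the data from the question, the script below recreates both files in a scratch directory (the paths and variable names are assumptions for the demo). It sorts to temporary files rather than using `<(sort ...)`, so it also runs in a plain sh, and it uses `-23` and `-13` to pull out one column at a time instead of the tab-indented two-column output of `-3`:

```shell
# Hypothetical demo: recreate the question's two files in a scratch directory.
export LC_ALL=C   # byte-wise collation, so sort and comm agree on the order
tmp=$(mktemp -d)
printf '%s\n' '>gi17' AAAAAA '>gi30' BBBBBB '>gi40' CCCCCC \
              '>gi92' DDDDDD '>gi50' EEEEEE '>gi81' FFFFFF > "$tmp/f1.txt"
printf '%s\n' '>gi40' CCCCCC '>gi01' BBBBBB '>gi02' AAAAAA \
              '>gi30' BBBBBB > "$tmp/f2.txt"

# Sort to temporary files (plain-sh equivalent of bash's <(sort ...)).
# -u drops repeated lines: BBBBBB occurs twice in f2.txt, and comm would
# otherwise report the extra copy as unique to f2.txt.
sort -u "$tmp/f1.txt" > "$tmp/f1.sorted"
sort -u "$tmp/f2.txt" > "$tmp/f2.sorted"

only_in_f1=$(comm -23 "$tmp/f1.sorted" "$tmp/f2.sorted")   # column 1 only
only_in_f2=$(comm -13 "$tmp/f1.sorted" "$tmp/f2.sorted")   # column 2 only

echo "Only in f1.txt:"; echo "$only_in_f1"
echo "Only in f2.txt:"; echo "$only_in_f2"
rm -rf "$tmp"
```

The original files are never modified, only read, which fits the constraint that the data must not be touched.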
To get the differences between the files, use the join utility, as follows:
DESCRIPTION
The join utility performs an ``equality join'' on the specified files and
writes the result to the standard output. The ``join field'' is the
field in each file by which the files are compared. The first field in
each line is used by default. There is one line in the output for each
pair of lines in file1 and file2 which have identical join fields. Each
output line consists of the join field, the remaining fields from file1
and then the remaining fields from file2.
-v file_number
Do not display the default output, but display a line for each
unpairable line in file file_number. The options -v 1 and -v 2
may be specified at the same time.
-1 field
Join on the field'th field of file1.
-2 field
Join on the field'th field of file2.
join -v 1 <(sort file1.txt) <(sort file2.txt) # To get the lines in file file1.txt which file file2.txt does not have
join -v 2 <(sort file1.txt) <(sort file2.txt) # Vice Versa of above
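The sketch below tries the same idea on the question's data. The scratch paths and variable names are hypothetical. The first part compares individual lines, as the commands above do. The second part is a variant that assumes every record is a `>gi` header line followed by exactly one sequence line: `paste - -` glues each pair onto one line, so join's default first field becomes the header.

```shell
# Hypothetical demo reusing the question's data (scratch paths are illustrative).
export LC_ALL=C   # join expects input sorted in the collation it uses itself
tmp=$(mktemp -d)
printf '%s\n' '>gi17' AAAAAA '>gi30' BBBBBB '>gi40' CCCCCC \
              '>gi92' DDDDDD '>gi50' EEEEEE '>gi81' FFFFFF > "$tmp/f1.txt"
printf '%s\n' '>gi40' CCCCCC '>gi01' BBBBBB '>gi02' AAAAAA \
              '>gi30' BBBBBB > "$tmp/f2.txt"

# Line-level comparison: lines of f1.txt that never occur in f2.txt.
sort "$tmp/f1.txt" > "$tmp/f1.sorted"
sort "$tmp/f2.txt" > "$tmp/f2.sorted"
lines_only_in_f1=$(join -v 1 "$tmp/f1.sorted" "$tmp/f2.sorted")

# Record-level variant (assumes header/sequence pairs): one record per line,
# sorted on the header, which join then uses as the join field.
paste - - < "$tmp/f1.txt" | sort -k1,1 > "$tmp/f1.rec"
paste - - < "$tmp/f2.txt" | sort -k1,1 > "$tmp/f2.rec"
records_only_in_f1=$(join -v 1 "$tmp/f1.rec" "$tmp/f2.rec")

echo "Lines only in f1.txt:";   echo "$lines_only_in_f1"
echo "Records only in f1.txt:"; echo "$records_only_in_f1"
rm -rf "$tmp"
```

The record-level form keeps each header together with its sequence. At the line level, a sequence such as BBBBBB counts as present in both files even when it belongs to different headers.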
Original answer/credits:-