Why does the diff utility show similar text in the result file?

Question

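I am using diff to find the differences between two text files. It works well, but when I change the order of the lines in the text files, it shows similar text in the result file.

Here is File1.txt: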

>gi17
AAAAAA
>gi30
BBBBBB
>gi40
CCCCCC
>gi92
DDDDDD
>gi50
EEEEEE
>gi81
FFFFFF

File2.txt:

>gi40
CCCCCC
>gi01
BBBBBB
>gi02
AAAAAA
>gi30
BBBBBB

Result.txt:

>gi17
AAAAAA
>gi30        ???
BBBBBB       ???
>gi92
DDDDDD
>gi01
BBBBBB
>gi50
EEEEEE
>gi81
FFFFFF
>gi02
AAAAAA
>gi30        ???
BBBBBB       ???

The diff command:

$ diff C:/Users/User/Desktop/File1.txt C:/Users/User/Desktop/File2.txt > C:/Users/User/Desktop/Result.txt

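Why does it show

>gi30
BBBBBB

as a difference?

Edit 1: What I want is to search the whole of File2 for each line of File1, because the two files are not sorted and I cannot touch them (genetic data).

Edit 2: I want to run the join command from my PHP code. It runs successfully in the Cygwin terminal, but it does not run from my PHP code: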

shell_exec("C:\cygwin64\bin\bash.exe --login -c 'join -v 1 <(sort $OldDatabaseFile.txt) <(sort $NewDatabaseFile.txt) > $text_files_path/DelSeqGi.txt 2>&1'");

Thanks.

Answer 1

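As fedorqui said in the comments, diff compares the files line by line, in order. Because File2.txt lists its lines in a different order, a line such as >gi30 that exists in both files can still be reported as a difference.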
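To see the order sensitivity in isolation, here is a minimal sketch with two made-up files (a.txt and b.txt are hypothetical, not from the question) that hold the same lines in a different order. It relies on diff's exit status: 0 when no differences are found, 1 when some are.

```shell
#!/bin/sh
# Sketch with hypothetical file names: same lines, different order.
tmp=$(mktemp -d)
printf '%s\n' alpha beta > "$tmp/a.txt"
printf '%s\n' beta alpha > "$tmp/b.txt"

# Compare the files as written; capture diff's exit status.
unsorted_status=0
diff "$tmp/a.txt" "$tmp/b.txt" > /dev/null || unsorted_status=$?

# Compare sorted copies instead, so line order no longer matters.
sort "$tmp/a.txt" > "$tmp/a.sorted"
sort "$tmp/b.txt" > "$tmp/b.sorted"
sorted_status=0
diff "$tmp/a.sorted" "$tmp/b.sorted" > /dev/null || sorted_status=$?

echo "as written: $unsorted_status, sorted: $sorted_status"
rm -rf "$tmp"
```

This prints "as written: 1, sorted: 0": diff reports differences for the reordered files, but none once both sides are sorted.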

To get what you want, you can do this:

comm -3 <(sort f1.txt) <(sort f2.txt) > result.txt

Manual (relevant part):

comm - compare two sorted files line by line

       -1     suppress column 1 (lines unique to FILE1)

       -2     suppress column 2 (lines unique to FILE2)

       -3     suppress column 3 (lines that appear in both files)


EXAMPLES
  comm -3 file1 file2
    Print lines in file1 not in file2, and vice versa.
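As a rough illustration, this is how it plays out on the sample files from the question. It is only a sketch: the scratch directory and the f1.sorted/f2.sorted names are made up for the example, and it writes the sorted copies to temporary files instead of using <(sort ...) so that it also runs in a plain POSIX sh. It uses comm -23 and comm -13 to get each side separately; comm -3 prints both sides at once in two columns.

```shell
#!/bin/sh
# Sketch: rebuild the question's sample files in a scratch directory (hypothetical paths).
LC_ALL=C; export LC_ALL   # sort and comm must agree on the collation order
tmp=$(mktemp -d)

printf '%s\n' '>gi17' AAAAAA '>gi30' BBBBBB '>gi40' CCCCCC \
              '>gi92' DDDDDD '>gi50' EEEEEE '>gi81' FFFFFF > "$tmp/File1.txt"
printf '%s\n' '>gi40' CCCCCC '>gi01' BBBBBB '>gi02' AAAAAA '>gi30' BBBBBB > "$tmp/File2.txt"

# comm needs sorted input; sort into copies so the originals stay untouched.
sort "$tmp/File1.txt" > "$tmp/f1.sorted"
sort "$tmp/File2.txt" > "$tmp/f2.sorted"

only_in_1=$(comm -23 "$tmp/f1.sorted" "$tmp/f2.sorted")   # lines only in File1.txt
only_in_2=$(comm -13 "$tmp/f1.sorted" "$tmp/f2.sorted")   # lines only in File2.txt

printf 'Only in File1.txt:\n%s\n' "$only_in_1"
printf 'Only in File2.txt:\n%s\n' "$only_in_2"
rm -rf "$tmp"
```

This reports >gi17, >gi50, >gi81, >gi92, DDDDDD, EEEEEE and FFFFFF as unique to File1.txt, and >gi01, >gi02 and one BBBBBB as unique to File2.txt (File2.txt contains BBBBBB twice, File1.txt only once). >gi30 is no longer reported, because it appears in both files.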

Answer 2

To get the differences between the files, use the join utility from bash, as follows:

DESCRIPTION
     The join utility performs an ``equality join'' on the specified files and
     writes the result to the standard output.  The ``join field'' is the
     field in each file by which the files are compared.  The first field in
     each line is used by default.  There is one line in the output for each
     pair of lines in file1 and file2 which have identical join fields.  Each
     output line consists of the join field, the remaining fields from file1
     and then the remaining fields from file2.

 -v file_number
         Do not display the default output, but display a line for each
         unpairable line in file file_number.  The options -v 1 and -v 2
         may be specified at the same time.

 -1 field
         Join on the field'th field of file1.

 -2 field
         Join on the field'th field of file2.

join -v 1 <(sort file1.txt) <(sort file2.txt)     # lines in file1.txt that file2.txt does not have
join -v 2 <(sort file1.txt) <(sort file2.txt)     # the reverse: lines in file2.txt that file1.txt does not have
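
The manual excerpt above says join compares only the join field, which defaults to the first field of each line. Here is a sketch of how that could be used on the question's data, under the assumption (not stated in the question) that the files strictly alternate one >gi header line with one sequence line. paste - - glues each pair into a single "header sequence" line, so join then matches whole records by header. The scratch directory and the f1.rec/f2.rec names are hypothetical, and temporary files are used instead of <(...) so the sketch also runs in a plain POSIX sh.

```shell
#!/bin/sh
# Sketch: compare header+sequence records by header (assumes strictly alternating lines).
LC_ALL=C; export LC_ALL   # sort and join must agree on the collation order
tmp=$(mktemp -d)

printf '%s\n' '>gi17' AAAAAA '>gi30' BBBBBB '>gi40' CCCCCC \
              '>gi92' DDDDDD '>gi50' EEEEEE '>gi81' FFFFFF > "$tmp/File1.txt"
printf '%s\n' '>gi40' CCCCCC '>gi01' BBBBBB '>gi02' AAAAAA '>gi30' BBBBBB > "$tmp/File2.txt"

# Merge every two input lines into one "header sequence" line, then sort by the header.
paste -d ' ' - - < "$tmp/File1.txt" | sort -k 1,1 > "$tmp/f1.rec"
paste -d ' ' - - < "$tmp/File2.txt" | sort -k 1,1 > "$tmp/f2.rec"

missing_from_2=$(join -v 1 "$tmp/f1.rec" "$tmp/f2.rec")   # records whose header is only in File1.txt
missing_from_1=$(join -v 2 "$tmp/f1.rec" "$tmp/f2.rec")   # records whose header is only in File2.txt

printf 'Only in File1.txt:\n%s\n' "$missing_from_2"
printf 'Only in File2.txt:\n%s\n' "$missing_from_1"
rm -rf "$tmp"
```

With the sample data this lists the >gi17, >gi50, >gi81 and >gi92 records as present only in File1.txt, and the >gi01 and >gi02 records as present only in File2.txt. Because only the header is compared, a sequence such as BBBBBB that appears under several headers does not confuse the match.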
