Bash script trying to compare a historical and a current file and get a sum difference when lines don't match

Question

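I have a bash script that uses diff -y and awk to compare a historical file against a current file. The problem I'm running into is that the historical file almost always has more lines than the current one, and which lines are missing from the current file varies. Below is what I would like the output to look like.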

$ more Sum_difference.csv 

Historical=93   lag-123:1234  Current=92   lag-123:1234  |  Difference=  1

Historical=53   lag-133:2345  Current=52   lag-133:2345  |  Difference=  1

Historical=188  lag-144:3546  Current=189  lag-104:3654  |  Difference=  -1

Historical=106  lag-157:3457  Current=105  lag-157:3457  |  Difference=  1

Historical=133  lag-167:3458  Current=132  lag-167:3458  |  Difference=  1

Historical=8    lag-168:4657  Current=7    lag-168:4657  |  Difference=  1

Historical=168  lag-170:4566  Current=167  lag-170:4566  |  Difference=  1

Historical=96   lag-171:4568  Current=98   lag-171:4568  |  Difference=  -2

Historical=30   lag-172:4570  Current=31   lag-172:4570  |  Difference=  -1

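What I actually get is usually different, though. For example, cat historical.csv | wc -l will return 678, while the current file may have only 500 lines. That makes the output look like the example below, where the differences are wrong.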

$ more Sum_difference.csv 

Historical=93   lag-123:1234  Current=92   lag-123:1234  |  Difference=  1

Historical=53   lag-133:2345  Current=52   lag-133:2345  |  Difference=  1

Historical=188  lag-144:3546  Current=189  lag-104:3654  |  Difference=  -1

Historical=133  lag-167:3458  Current=105  lag-157:3457  |  Difference=  28

Historical=96   lag-171:4568  Current=132  lag-167:3458  |  Difference=  -36

Historical=30   lag-172:4570  Current=31   lag-172:4570  |  Difference=  -1

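So, as the example above shows, if an entry in the historical file is not listed in the current file, it throws off the line alignment, which throws off the counts, which in turn throws off my sum difference. I have been trying to find a way around this so that the output looks like the example below. I have tried comm, diff, and sdiff. Here is an example of what I want to accomplish.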

$ more Sum_difference.csv 

Historical=93   lag-123:1234  Current=92   lag-123:1234  |  Difference=  1

Historical=53   lag-133:2345  Current=52   lag-133:2345  |  Difference=  1

Historical=188  lag-144:3546  Current=189  lag-104:3654  |  Difference=  -1

Historical=106  lag-157:3457                          Not Present   |  Not present <<<< 

Historical=133  lag-167:3458  Current=132  lag-167:3458  |  Difference=  1

Historical=8    lag-168:4657               Not Present   |  Not present <<<< 

Historical=168  lag-170:4566  Current=167  lag-170:4566  |  Difference=  1

Historical=96   lag-171:4568  Current=98   lag-171:4568  |  Difference=  -2

Historical=30   lag-172:4570  Current=31   lag-172:4570  |  Difference=  -1

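What I'm basically doing is taking the historical file and the current file, sorting the output, and counting the duplicates in each. I then need to compare the two files and get the difference in the number of duplicates for each line. The historical file typically has more lines/rows than the current file, so the two don't line up. The commands I use to sort the two files are below.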

Current = grep lag | cut -d '"' -f2 | cut -d '.' -f1 | awk '{print $NF}' | sort | uniq -c

Historical = cut -c1-12 | sort | grep lag | uniq -c
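As a rough illustration of the counting step only, here is a minimal sketch. The sample keys and the helper name count_keys are made up for this example, since the real input files were not posted; the cut/awk field extraction from the commands above is left out.

```shell
# Hypothetical helper: reads lines on stdin and prints "count key" pairs.
count_keys() {
    # Keep only lines containing "lag", sort so duplicates are adjacent,
    # then let uniq -c prefix each distinct line with its occurrence count.
    grep lag | sort | uniq -c
}

# Made-up sample input: three copies of one key, one of another, one non-lag line.
printf '%s\n' lag-123:1234 lag-133:2345 lag-123:1234 other lag-123:1234 | count_keys
```

The amount of leading whitespace in front of each count depends on the uniq implementation, so anything that consumes this output should split on whitespace rather than rely on fixed columns.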

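The sorting and counting of duplicates works fine. The only issue is that when a line is in the historical file but not in the current one, I need to insert a blank space and something like "not present" where that line would normally be in the current file. I'm just not sure how to do that.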

Is there a way to "cat historical, grep current, and if the matching field is not present, add a space or a word to fill the gap"? Can this be done with sed? Thanks, everyone; I appreciate any help I can get. I apologize if this has been long-winded.

Answer 1

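awk to the rescue!

Since you didn't post the input files, I'll work through a sample and show what I think is a solution to your problem.

Each file has some duplicate keys: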

==> file1 <==
a
a
a
b
b
c
c
e

==> file2 <==
a
a
c
d
e
e

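We are trying to compare the counts, where some keys may be missing. The input files here are sorted, but they don't need to be. The output order is unspecified (it depends on awk's array traversal order), so you may want to sort it by key (or by difference).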

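$ awk 'NR==FNR {a[$0]++; next}   # first file: count each key
                {b[$0]++}         # second file: count each key
       END     {print "key","count1","count2","diff";
                # keys seen in file1 (count2 is 0 if missing from file2)
                for(k in a) {bk=(k in b)?b[k]:0;
                             print k,a[k],bk,a[k]-bk;
                             delete b[k]}
                # whatever is left in b appears only in file2
                for(k in b) print k,0,b[k],-b[k]}' file1 file2 |
  column -t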

key  count1  count2  diff
a    3       2       1
b    2       0       2
c    2       1       1
e    1       2       -1
d    0       1       -1
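To get closer to the output format in the question, the same idea can be applied directly to the two uniq -c outputs. This is only a sketch under a few assumptions: each input line looks like "count key" (as uniq -c produces), keys contain no whitespace, and the function name compare_counts and the file names in the usage note are made up for illustration. Keys that appear only in the current file are ignored here, since the question says the historical file is the larger one.

```shell
# Hypothetical helper: compare_counts HISTORICAL_COUNTS CURRENT_COUNTS
# Both arguments are files holding "count key" lines, e.g. from sort | uniq -c.
compare_counts() {
    awk '
        # First file on the command line (current counts): remember count by key.
        FILENAME == ARGV[1] { cur[$2] = $1; next }
        # Second file (historical counts): look each key up instead of
        # relying on line position, so missing keys cannot shift the rows.
        {
            if ($2 in cur)
                printf "Historical=%-4s %s  Current=%-4s %s  |  Difference=  %d\n", $1, $2, cur[$2], $2, $1 - cur[$2]
            else
                printf "Historical=%-4s %s  Not Present  |  Not present <<<<\n", $1, $2
        }
    ' "$2" "$1"
}
```

Assuming the two count files have been saved as historical_counts.txt and current_counts.txt, it could be called as compare_counts historical_counts.txt current_counts.txt > Sum_difference.csv. Because the historical file is read last and printed line by line, the output keeps the historical file's sort order. The test FILENAME == ARGV[1] is used instead of NR==FNR so that an empty current file does not cause the historical lines to be treated as current ones.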
