How to find unique elements in a column without sorting in bash?

Question

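I am trying to use bash to find the unique occurrences of elements in one column (specifically column 2) of a data file. I do not want the output to be sorted or randomized. After a lot of searching, I found an 'awk'-based solution that partially works: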

awk '{arr[$2] = 1} END {for (key in arr) {print key}}' input_file > output_file

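But the output appears to be in random order. I want each element to be placed according to its last occurrence; in other words, 'uniqueness' should be checked starting from the end of the file. For example, if the elements appear in the following order: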

5, 6, 7, 5, 6, 8, 5, 6, 9, 6, 9, 10, 10, 11, 10, 11, 12

then the output should be:

7, 8, 5, 6, 9, 10, 11, 12
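
awk does not define the order in which `for (key in arr)` visits the keys, which is why the output of the command above looks random. As a minimal sketch of the requested behavior, the snippet below builds a test file and prints each column-2 value at the position of its last occurrence. The file name sample.txt, the row labels in column 1, and the array names val and last are assumptions made for illustration.

```shell
# Hypothetical test file: column 1 is a row label, column 2 holds the sequence from the question.
printf '%s\n' 5 6 7 5 6 8 5 6 9 6 9 10 10 11 10 11 12 |
  awk '{ print "row" NR, $1 }' > sample.txt

# Single pass: store the value of every line and the line number where each value was seen last.
last_order=$(awk '
  { val[NR] = $2; last[$2] = NR }
  END {
    # Walk the lines in file order and print a value only at its final occurrence.
    for (n = 1; n <= NR; n++)
      if (last[val[n]] == n) print val[n]
  }
' sample.txt)

printf '%s\n' "$last_order"
```

This keeps one array entry per input line in memory, so it is best suited to files that fit comfortably in RAM.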

Answer 1

Could you please try the following? I am in a cab so I could not test it, but it should work.

awk '!a[$2]++{b[++count]=$2} END{for(j=1;j<=count;j++){print b[j]}}' Input_file

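This should print the values in the same order in which the second field appears in Input_file, and it also takes care of uniqueness for that field.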

Explanation: Adding an explanation of the above code here.

awk '                           ##Start the awk program.
!a[$2]++{                       ##Run this block only if $2 has not been seen before; a[$2] is incremented on every line.
  b[++count]=$2                 ##Increment count and store $2 in array b at that index.
}
END{                            ##Start the END block, which runs after all input has been read.
  for(j=1;j<=count;j++){        ##Loop j from 1 up to the value of count.
    print b[j]                  ##Print the value stored at index j of array b, i.e. the $2 values in first-seen order.
  }                             ##Close the for loop.
}
'  Input_file                   ##Name of the input file.
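
As a quick check of what this program prints, the sketch below runs it on the sequence from the question. It keeps values in order of first appearance. One possible way to get the last-occurrence order the question asks for is to feed the lines in reverse with tac and reverse the result again. The file name sample.txt and the row labels in column 1 are assumptions, and tac is assumed to be available (it ships with GNU coreutils and may be missing on some systems).

```shell
# Hypothetical test file: column 2 holds the sequence from the question.
printf '%s\n' 5 6 7 5 6 8 5 6 9 6 9 10 10 11 10 11 12 |
  awk '{ print "row" NR, $1 }' > sample.txt

# The program from this answer: values come out in order of first appearance.
first_order=$(awk '!a[$2]++{b[++count]=$2} END{for(j=1;j<=count;j++){print b[j]}}' sample.txt)

# Same de-duplication idea on reversed input, reversed back: values ordered by last occurrence.
last_order=$(tac sample.txt | awk '!a[$2]++{print $2}' | tac)

# Unquoted expansion joins the lines with single spaces for compact display.
echo "first-seen order: $(echo $first_order)"
echo "last-seen order:  $(echo $last_order)"
```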

Answer 2

An approach that reads the file twice:

awk 'NR==FNR{++A[$2];next}A[$2]==++T[$2]' input_file input_file
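
To see how the two passes interact: during the first read (NR==FNR) array A counts how often each column-2 value occurs. During the second read, T counts the occurrences seen so far, and a line is printed only when that running count reaches the total, that is, at the last occurrence. The sketch below uses a made-up three-line file (demo.txt is an assumed name) and also shows a variant with an explicit action that prints only column 2.

```shell
# Hypothetical input: column 2 contains 5 twice and 6 once.
printf '%s\n' 'a 5' 'b 6' 'c 5' > demo.txt

# As written in the answer: prints the whole line at the last occurrence of each value.
whole_lines=$(awk 'NR==FNR{++A[$2];next}A[$2]==++T[$2]' demo.txt demo.txt)

# Variant with an explicit action so that only column 2 is printed.
col2_only=$(awk 'NR==FNR{++A[$2];next}A[$2]==++T[$2]{print $2}' demo.txt demo.txt)

printf '%s\n' "$whole_lines" "$col2_only"
```

Because the file is opened twice, this approach needs a regular file rather than a pipe, but it only keeps one counter per distinct value in memory.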

bash

unique

columnsorting