How to remove lines containing any matching text in bash

I have a text file. It looks like this:

Business Card: 2x3.5: Recycled 100lb Dull Cover with Matte Finish(Rounded Corners): Color front No Back: 1000: .18 :
Business Card: 2x3.5: Recycled 100lb Dull Cover with Matte Finish(Rounded Corners): Color front No Back: 2500: 3.17 :
Business Card: 2x3.5: Recycled 100lb Dull Cover with Matte Finish(Rounded Corners): Color front No Back: 5000: 0.00 :
Business Card: 2x3.5: 14pt Premium Uncoated Cover: Color front B&W back: 250: .25 :
Business Card: 2x3.5: 14pt Premium Uncoated Cover: Color front B&W back: 500: .00 :
Business Card: 2x3.5: 14pt Premium Uncoated Cover: Color front B&W back: 1000: .08 :
Business Card: 2x3.5: 14pt Premium Uncoated Cover: Color front B&W back: 2500: .33 :
Business Card: 2x3.5: 14pt Premium Uncoated Cover: Color front B&W back: 5000: 4.33 :
Door Hanger: 3.5x8.5: 100lb Gloss Book with Aqueous Coating (C2S): Color Front No Back: 250: 6.23 :
Door Hanger: 3.5x8.5: 100lb Gloss Book with Aqueous Coating (C2S): Color Front No Back: 500: 9.53 :
Door Hanger: 3.5x8.5: 100lb Gloss Book with Aqueous Coating (C2S): Color Front No Back: 1000: 6.17 :
Door Hanger: 3.5x8.5: 100lb Gloss Book with Aqueous Coating (C2S): Color Front No Back: 2500: 7.58 :
Door Hanger: 3.5x8.5: 100lb Gloss Book with Aqueous Coating (C2S): Color Front No Back: 5000: 2.72 :
Door Hanger: 3.5x8.5: 100lb Gloss Book with Aqueous Coating (C2S): Color Both Sides: 250: 8.70 :
Door Hanger: 3.5x8.5: 100lb Gloss Book with Aqueous Coating (C2S): Color Both Sides: 500: 4.50 :
Door Hanger: 3.5x8.5: 100lb Gloss Book with Aqueous Coating (C2S): Color Both Sides: 1000: 1.13 :
Door Hanger: 3.5x8.5: 100lb Gloss Book with Aqueous Coating (C2S): Color Both Sides: 2500: 2.53 :
Door Hanger: 3.5x8.5: 100lb Gloss Book with Aqueous Coating (C2S): Color Both Sides: 5000: 5.63 :

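So I have Business Cards and I have Door Hangers. Each one is a single item, but in order to count them I need to remove all of their other occurrences.

So in the end, the file should look like this: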

Business Card: 2x3.5: Recycled 100lb Dull Cover with Matte Finish(Rounded Corners): Color front No Back: 1000: .18 :
Door Hanger: 3.5x8.5: 100lb Gloss Book with Aqueous Coating (C2S): Color Front No Back: 250: 6.23 :

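I have to do this without specifying the exact names, i.e. I can't run sed specifically against occurrences of Business Card or Door Hanger. I need to remove every line that shares the same leading text with an earlier line, not only lines that are exact duplicates.

Thanks

Based on your comment, a simple approach is: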

cat filename | awk -F ":" '{print $1}' | sort | uniq
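Note that this pipeline prints only the item names, not the full lines. Since the goal is to count the items, here is a minimal sketch of how the same pipeline could feed a count. The `sample` variable is a hypothetical, shortened stand-in for the question's file, and `names` and `count` are illustrative names.

```shell
# Hypothetical, shortened sample standing in for the question's file.
sample='Business Card: 2x3.5: 1000: .18 :
Business Card: 2x3.5: 2500: 3.17 :
Door Hanger: 3.5x8.5: 250: 6.23 :'

# Keep only the first colon-separated field, then drop repeated names.
names=$(printf '%s\n' "$sample" | awk -F: '{ print $1 }' | sort | uniq)

# Count the distinct names; tr strips the padding some wc versions emit.
count=$(printf '%s\n' "$names" | wc -l | tr -d ' ')

printf '%s\n' "$names"
printf 'distinct items: %s\n' "$count"
```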

You can do this with awk:

awk -F":" '$1!=k{print $0}{k=$1}' file.txt

Business Card: 2x3.5: Recycled 100lb Dull Cover with Matte Finish(Rounded Corners): Color front No Back: 1000: .18 :
Door Hanger: 3.5x8.5: 100lb Gloss Book with Aqueous Coating (C2S): Color Front No Back: 250: 6.23 :

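This tests whether the first field is equal to the first field of the previous line. If it is equal, nothing is printed and the field is just saved (k=$1); if it is not equal, the line is printed.

This can be shortened to: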
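As a small illustration of the previous-line comparison, the sketch below runs the same awk program on a hypothetical four-line sample (shortened from the question's data) instead of `file.txt`.

```shell
# Hypothetical sample data, shortened from the question's file.
sample='Business Card: 2x3.5: 1000: .18 :
Business Card: 2x3.5: 2500: 3.17 :
Door Hanger: 3.5x8.5: 250: 6.23 :
Door Hanger: 3.5x8.5: 500: 9.53 :'

# Print a line only when its first colon-separated field differs from the
# previous line's; k remembers the previous line's first field.
result=$(printf '%s\n' "$sample" | awk -F: '$1 != k { print $0 } { k = $1 }')

printf '%s\n' "$result"
```

On the first line `k` is still empty, so the comparison succeeds and the line is printed.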

awk -F: '!seen[$1]++' file.txt

(thanks to JID and glenn jackman)

Alternatively, if you have a fixed number of columns, you could do:
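The two awk versions behave the same on the question's file, where repeated items sit next to each other, but they differ once repeats are not adjacent. The sketch below shows this on a hypothetical interleaved sample; `by_seen` and `by_prev` are illustrative variable names.

```shell
# Hypothetical sample in which the repeated item is NOT adjacent.
interleaved='Business Card: 1000
Door Hanger: 250
Business Card: 2500'

# seen[$1]++ evaluates to 0 (false) the first time a key appears, so
# !seen[$1]++ is true only for the first line with that key, no matter
# where later repeats occur.
by_seen=$(printf '%s\n' "$interleaved" | awk -F: '!seen[$1]++')

# The previous-line comparison only collapses consecutive repeats.
by_prev=$(printf '%s\n' "$interleaved" | awk -F: '$1 != k { print } { k = $1 }')

printf 'seen:\n%s\nprevious-line:\n%s\n' "$by_seen" "$by_prev"
```

Here the `seen` version keeps two lines, while the previous-line version keeps all three.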

rev file.txt | uniq -f 17 | rev

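This reverses each line of the file and skips the first 17 fields, so that uniq is applied to the last column (which is really the first one), then reverses the lines back. Note that uniq -f counts blank-separated fields, not colon-separated ones. It is not very convenient here, though, because your lines do not all have the same number of columns.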
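To make the field skipping concrete, here is a rough sketch on a hypothetical sample where every line has exactly three blank-separated words, so skipping one field after reversing is enough. The sample and the field count of 1 are assumptions chosen for the illustration, not values for the question's file.

```shell
# Hypothetical sample: every line has exactly three blank-separated words.
sample='Business Card: 1000
Business Card: 2500
Door Hanger: 250'

# rev turns "Business Card: 1000" into "0001 :draC ssenisuB".
# uniq -f 1 then ignores the first field (the reversed quantity) and
# compares the rest, so adjacent lines with the same item name collapse.
# The final rev restores the original character order.
result=$(printf '%s\n' "$sample" | rev | uniq -f 1 | rev)

printf '%s\n' "$result"
```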

HTH