Remove duplicate data using bash, sed or awk
How can I find duplicate data using bash, sed, or awk?
The goal is to remove duplicate "Changelist: XXXXX" entries from the data.txt file.
I'm a bit stuck; can anyone help?
Please see output.txt for the desired output.
data.txt
====================================
Changelist: 808298
Date: 2015/03/19
Developer: A
ShortDescr: Checking in the following graphics:
CodeReview:
CodeReview: Result: @result___
====================================
Changelist: 808273
Date: 2015/03/19
Developer: B
ShortDescr: Hello
CodeReview: Result:
====================================
Changelist: 808271
Date: 2015/03/19
Developer: C
ShortDescr: HI
CodeReview:
====================================
Changelist: 808298
Date: 2015/03/19
Developer: A
ShortDescr: Checking in the following graphics:
CodeReview:
CodeReview: Result: @result___
====================================
Changelist: 808273
Date: 2015/03/19
Developer: B
ShortDescr: Hello
CodeReview: Result:
====================================
Changelist: 808277
Date: 2015/03/19
Developer: D
ShortDescr: HEY
CodeReview:
====================================
output.txt
====================================
Changelist: 808298
Date: 2015/03/19
Developer: A
ShortDescr: Checking in the following graphics:
CodeReview:
CodeReview: Result: @result___
====================================
Changelist: 808273
Date: 2015/03/19
Developer: B
ShortDescr: Hello
CodeReview: Result:
====================================
Changelist: 808271
Date: 2015/03/19
Developer: C
ShortDescr: HI
CodeReview:
====================================
Changelist: 808277
Date: 2015/03/19
Developer: D
ShortDescr: HEY
CodeReview:
====================================
glen's output.txt
====================================
Changelist: 808298
Date: 2015/03/19
Developer: A
ShortDescr: Checking in the following graphics:
CodeReview:
====================================
Changelist: 808273
Date: 2015/03/19
Developer: B
ShortDescr: Hello
CodeReview:
====================================
Changelist: 808271
Date: 2015/03/19
Developer: C
ShortDescr: HI
CodeReview:
====================================
Changelist: 808277
Date: 2015/03/19
Developer: D
ShortDescr: HEY
CodeReview:
====================================
Changelist: 808298
Date: 2015/03/19
Developer: A
ShortDescr: Checking in the following graphics:
CodeReview:
====================================
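This is actually a very common task for awk:
sep='====================================\n'
awk -F'\n' -v RS="$sep" -v ORS="$sep" '!seen[$1]++' data.txt > output.txt
Here we use $sep as the awk record separator, so each block is read as one record, with newline as the field separator.
!seen[$1]++ is an expression that is true only for the first record in which a particular value of field 1 (the "Changelist: XXXXX" line) is encountered. Since no action is given, the default action is to print the current record, followed by the output record separator.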
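One caveat on the one-liner above: a multi-character RS is handled as a regular expression by gawk, but POSIX leaves the behaviour unspecified, so other awk implementations may split the records differently. The following is a minimal line-by-line sketch of the same idea that does not rely on RS at all. It assumes every block starts with its "Changelist:" line and that separator lines consist only of "=" characters; the file names sample.txt and deduped.txt and the variables keep and seen are illustrative, not part of the original answer.

```shell
#!/bin/sh
# Build a small sample in the same layout as data.txt (contents are illustrative).
cat > sample.txt <<'EOF'
====================================
Changelist: 808298
Developer: A
====================================
Changelist: 808273
Developer: B
====================================
Changelist: 808298
Developer: A
====================================
EOF

awk '
  # Separator line: print the leading one, and the one that closes a kept block.
  /^=+$/         { if (NR == 1 || keep) print; keep = 0; next }
  # First line of a block: keep the block only if this Changelist is new.
  /^Changelist:/ { keep = !seen[$0]++ }
  # Print every line that belongs to a kept block.
  keep
' sample.txt > deduped.txt

cat deduped.txt
```

Because the decision is made on the "Changelist:" line alone, two blocks with the same changelist number are treated as duplicates even if their other lines differ, which matches the keying on field 1 in the one-liner.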