Shell | removing repetitive lines
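I need to write a Bash script that removes similar lines from an output file.
My output file always has the same structure.
Lines 1 and 2 should be kept, and any other lines similar to these two need to be removed.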
1: </UsageData><?xml version="1.0" encoding="UTF-8"?>
2: <UsageData broadcastday="2016-03-16">
The date varies.
The last line should be kept, for example:
</UsageData>
I am new to shell programming and don't know how to do this.
Here is my sample XML:
<?xml version="1.0" encoding="UTF-8"?>
<UsageData broadcastday="2016-03-16">
<Hh hhID="48800301">
<Inst instID="000002B9"/>
<Live>
<Station>516</Station>
<From>Wed Mar 16 2016 09:52:47 GMT+0000 (UTC)</From>
<DurSec>58077</DurSec>
<Viewer>
<HhMem>569de65c9c3ab0cf7bfa2df2</HhMem>
</Viewer>
</Live>
</Hh>
<Hh hhID="46920403">
<Inst instID="000002A8"/>
<Live>
<Station>5000</Station>
<From>Wed Mar 16 2016 12:42:17 GMT+0000 (UTC)</From>
<DurSec>47908</DurSec>
<Viewer>
<HhMem>56caee95f915e09335fd976f</HhMem>
</Viewer>
</Live>
</Hh>
</UsageData><?xml version="1.0" encoding="UTF-8"?>
<UsageData broadcastday="2016-03-16">
<Hh hhID="15260304">
<Inst instID="000000A5"/>
<Live>
<Station>5000</Station>
<From>Wed Mar 16 2016 12:57:48 GMT+0000 (UTC)</From>
<DurSec>28814</DurSec>
<Viewer>
<HhMem>565f181dd830d3cc7057c0b9</HhMem>
</Viewer>
</Live>
</Hh>
</UsageData><?xml version="1.0" encoding="UTF-8"?>
<UsageData broadcastday="2016-03-16">
<Hh hhID="50100501">
<Inst instID="0000022D"/>
<Live>
<Station>560</Station>
<From>Wed Mar 16 2016 14:21:19 GMT+0000 (UTC)</From>
<DurSec>41967</DurSec>
<Viewer>
<HhMem>56c4412de6a8ff4da18fd4ae</HhMem>
<HhMem>56c4412de6a8ff4da18fd4cb</HhMem>
</Viewer>
</Live>
</Hh>
</UsageData><?xml version="1.0" encoding="UTF-8"?>
<UsageData broadcastday="2016-03-16">
<Hh hhID="36110404">
<Inst instID="00000104"/>
<Live>
<Station>545</Station>
<From>Wed Mar 16 2016 15:01:04 GMT+0000 (UTC)</From>
<DurSec>671</DurSec>
<Viewer>
<HhMem>568ce8acbd0e486a951d41ce</HhMem>
<HhMem>568ce8acbd0e486a951d41dc</HhMem>
<HhMem>568ce8acbd0e486a951d41c5</HhMem>
</Viewer>
</Live>
</Hh>
</UsageData>
I solved my problem in a very simple way.
awk '/<\/UsageData><\?xml version="1.0" encoding="UTF-8"\?>/ {getline; next} 1' file
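For anyone who would rather not rely on `getline`, here is a minimal sketch of the same idea using a flag instead. It assumes every repeated header looks like the sample above: a `</UsageData>` fused with a new `<?xml ...?>` declaration on one line, followed by a `<UsageData ...>` line. The function name `merge_usage` is only an illustration and is not from the original post.

```shell
#!/usr/bin/env bash
# merge_usage FILE  (hypothetical helper name)
# Prints FILE with the repeated header pairs removed, keeping the first
# XML declaration, the first <UsageData ...> line and the final </UsageData>.
merge_usage() {
  awk '
    # A closing tag fused with a new XML declaration marks a repeated header:
    # drop it and remember that the next line may be the repeated root tag.
    /^<\/UsageData><\?xml / { skip = 1; next }

    # Drop the <UsageData ...> line that directly follows, whatever its date.
    skip && /^<UsageData / { skip = 0; next }

    # Every other line is printed unchanged.
    { skip = 0; print }
  ' "$1"
}
```

It could be called as `merge_usage file > merged.xml`. Because the second rule only matches on the tag name, it does not matter if the `broadcastday` value differs between blocks.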