Multiline processing in sed
Can someone tell me how to turn input like this:
GOR - USD:
Buy 24.2000 1 200 +380 (98) 578-2574 Busy
Sell 25.0000 20 000 +380 (99) 444-4426 Morn
Sell 25.1000 17 500 +380 (98) 200-3003 Alex
.
.
GOR - EUR:
Sell 25.1000 17 500 +380 (98) 200-3003 Moy
Buy 24.2000 1 200 +380 (98) 578-2874 Jet
Sell 25.0000 20 000 +380 (99) 444-4126 Wet
Sell 25.0000 20 000 +380 (99) 444-4226 Pet
Sell 26.0000 20 000 +380 (99) 444-1226 Peter
into output like this:
GOR - USD: Buy 24.2000 1 200 +380 (98) 578-2574 Busy
GOR - USD: Sell 25.0000 20 000 +380 (99) 444-4426 Morn
GOR - USD: Sell 25.1000 17 500 +380 (98) 200-3003 Alex
.
.
GOR - EUR: Sell 25.1000 17 500 +380 (98) 200-3003 Moy
GOR - EUR: Buy 24.2000 1 200 +380 (98) 578-2874 Jet
GOR - EUR: Sell 25.0000 20 000 +380 (99) 444-4126 Wet
GOR - EUR: Sell 25.0000 20 000 +380 (99) 444-4226 Pet
GOR - EUR: Sell 26.0000 20 000 +380 (99) 444-1226 Peter
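GOR - USD, GOR - EUR, Sell and Buy are variable.
I know this is not exactly what you asked for, but I thought I would offer a way to do this in Perl, which I rather like for parsing and processing text. (You can use it much like sed, but it can do more.)
We use a regular expression to detect the header line and capture it, then we print every other line with that header as a prefix.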
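#!/usr/bin/perl
use strict;
use warnings;

my $header = '';
while ( my $line = <DATA> ) {
    chomp $line;
    # A header looks like "GOR - USD:"; remember it for the lines that follow.
    if ( $line =~ m/\w{3} - \w{3}:/ ) {
        $header = $line;
    }
    else {
        # Prefix each data line with the current header, separated by a space.
        print "$header $line\n";
    }
}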
__DATA__
GOR - USD:
Buy 24.2000 1 200 +380 (98) 578-2574 Busy
Sell 25.0000 20 000 +380 (99) 444-4426 Morn
Sell 25.1000 17 500 +380 (98) 200-3003 Alex
GOR - EUR:
Sell 25.1000 17 500 +380 (98) 200-3003 Moy
Buy 24.2000 1 200 +380 (98) 578-2874 Jet
Sell 25.0000 20 000 +380 (99) 444-4126 Wet
Sell 25.0000 20 000 +380 (99) 444-4226 Pet
Sell 26.0000 20 000 +380 (99) 444-1226 Peter
Using sed
$ sed -r '/:/{h;d}; G; s/(.*)\n(.*)/\2 \1/' file
GOR - USD: Buy 24.2000 1 200 +380 (98) 578-2574 Busy
GOR - USD: Sell 25.0000 20 000 +380 (99) 444-4426 Morn
GOR - USD: Sell 25.1000 17 500 +380 (98) 200-3003 Alex
GOR - USD: .
GOR - USD: .
GOR - EUR: Sell 25.1000 17 500 +380 (98) 200-3003 Moy
GOR - EUR: Buy 24.2000 1 200 +380 (98) 578-2874 Jet
GOR - EUR: Sell 25.0000 20 000 +380 (99) 444-4126 Wet
GOR - EUR: Sell 25.0000 20 000 +380 (99) 444-4226 Pet
GOR - EUR: Sell 26.0000 20 000 +380 (99) 444-1226 Peter
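How it works:
/:/{h;d}
Any line containing a colon is saved to the hold space (h) and then deleted (d), so it is not printed on its own.
G; s/(.*)\n(.*)/\2 \1/
For all other lines, we append the hold space to the line (G), then swap the order so that the contents of the hold space are printed first.
For Mac OS X or other BSD systems, try:
sed -E -e '/:/{h;d}' -e G -e 's/(.*)\n(.*)/\2 \1/' file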
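If a sed supports neither -r nor -E, the same hold-space idea can be written with basic regular expressions. This is a minimal sketch, assuming a POSIX-conforming sed; the function name prefix_with_header is hypothetical and not part of the original answer.

```shell
# Hypothetical helper: prefix every non-header line with the most recent
# header line (any line containing a colon). Reads files given as
# arguments, or standard input when none are given.
prefix_with_header() {
    # /:/ { h; d }  : store the header in the hold space, print nothing
    # G             : append the stored header to the current line
    # s/.../\2 \1/  : swap the two parts so the header comes first
    sed -e '/:/{' -e 'h' -e 'd' -e '}' \
        -e 'G' \
        -e 's/\(.*\)\n\(.*\)/\2 \1/' "$@"
}
```

It can be called as prefix_with_header file, or at the end of a pipeline. Splitting the braces across separate -e expressions avoids relying on how a particular sed parses "}" after a command on the same line.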
Using awk
$ awk '/:/{hdr=$0;next} {print hdr,$0}' file
GOR - USD: Buy 24.2000 1 200 +380 (98) 578-2574 Busy
GOR - USD: Sell 25.0000 20 000 +380 (99) 444-4426 Morn
GOR - USD: Sell 25.1000 17 500 +380 (98) 200-3003 Alex
GOR - USD: .
GOR - USD: .
GOR - EUR: Sell 25.1000 17 500 +380 (98) 200-3003 Moy
GOR - EUR: Buy 24.2000 1 200 +380 (98) 578-2874 Jet
GOR - EUR: Sell 25.0000 20 000 +380 (99) 444-4126 Wet
GOR - EUR: Sell 25.0000 20 000 +380 (99) 444-4226 Pet
GOR - EUR: Sell 26.0000 20 000 +380 (99) 444-1226 Peter
How it works:
/:/{hdr=$0;next}
Any line containing a colon is saved in the variable hdr. Then we skip to the next line.
print hdr,$0
For all other lines, we print the header followed by the line.
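Matching any colon treats a data line that happens to contain one (a time such as 10:30, say) as a header. A slightly stricter sketch follows, assuming headers always end with a colon as they do in the sample; that assumption and the function name prefix_with_header_awk are mine, not the original answer's.

```shell
# Hypothetical helper: like the awk one-liner above, but a line counts as
# a header only if the colon is its last non-blank character.
prefix_with_header_awk() {
    awk '
        /:[ \t]*$/ { hdr = $0; next }   # header: remember it, print nothing
                   { print hdr, $0 }    # data: print header, a space, the line
    ' "$@"
}
```

Call it as prefix_with_header_awk file. If the headers in the real data can carry trailing text after the colon, the looser /:/ test from the original command is the better fit.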
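Assuming the lines in the sample input that contain only a period do not actually exist, but are meant to indicate further lines similar to those around them:
$ awk 'NF>3{print hdr, $0; next} {hdr=$0}' file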
GOR - USD: Buy 24.2000 1 200 +380 (98) 578-2574 Busy
GOR - USD: Sell 25.0000 20 000 +380 (99) 444-4426 Morn
GOR - USD: Sell 25.1000 17 500 +380 (98) 200-3003 Alex
GOR - EUR: Sell 25.1000 17 500 +380 (98) 200-3003 Moy
GOR - EUR: Buy 24.2000 1 200 +380 (98) 578-2874 Jet
GOR - EUR: Sell 25.0000 20 000 +380 (99) 444-4126 Wet
GOR - EUR: Sell 25.0000 20 000 +380 (99) 444-4226 Pet
GOR - EUR: Sell 26.0000 20 000 +380 (99) 444-1226 Peter
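You can use an associative array in awk (note that for-in visits the keys in an unspecified order, so the output is not in input order):
awk '!/:/{a[$0]=currency} /:/{currency=$0}END{for(i in a){ print a[i],i }}' file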
GOR - USD: Sell 25.0000 20 000 +380 (99) 444-4426 Morn
GOR - EUR: Buy 24.2000 1 200 +380 (98) 578-2874 Jet
GOR - EUR: Sell 26.0000 20 000 +380 (99) 444-1226 Peter
GOR - EUR: Sell 25.0000 20 000 +380 (99) 444-4126 Wet
GOR - EUR: Sell 25.1000 17 500 +380 (98) 200-3003 Moy
GOR - USD: Buy 24.2000 1 200 +380 (98) 578-2574 Busy
GOR - EUR: Sell 25.0000 20 000 +380 (99) 444-4226 Pet
GOR - USD: Sell 25.1000 17 500 +380 (98) 200-3003 Alex