Merge two rows in UNIX using Sed / Awk
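Consider a source file in UNIX that contains the following pipe-delimited rows. This example has five rows. Rows 1, 2 and 4 are fine, but rows 3 and 5 are each split across two lines because of a newline inside the text. I need to merge row 3 into a single line and row 5 into a single line by removing only that embedded newline, and then load the file into an Oracle table.
How can I achieve this using sed / awk or any other UNIX command?
Sample input: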
1. 9187-001|COS 60W 16G T1A CLV|||||10
2. 9184-002|COS 48W 28G NT SKO|FOOTAGE/SEQUENCE GRIDS||||10
3. 9679-229|COS 56G 40G NT SVO|"FOOTAGE/SEQUENCE GRIDS FOR STREETSCAPE STORES
(ALL COSMETICS ON 60"" HIGH GONDOLAS"||||10
4. 9184-230|COS48W 48G NT LIF SKO|LIFE STORE COSMETIC FOOTAGE/SEQUENCE GRID||||10
5. 9679-230|COS 56G 44G NT SVO|"FOOTAGE/SEQUENCE GRIDS FOR STREETSCAPE STORES
(ALL COSMETICS ON 60"" HIGH GONDOLAS"||||10
Expected output:
1. 9187-001|COS 60W 16G T1A CLV|||||10
2. 9184-002|COS 48W 28G NT SKO|FOOTAGE/SEQUENCE GRIDS||||10
3. 9679-229|COS 56G 40G NT SVO|"FOOTAGE/SEQUENCE GRIDS FOR STREETSCAPE STORES(ALL COSMETICS ON 60"" HIGH GONDOLAS"||||10
4. 9184-230|COS48W 48G NT LIF SKO|LIFE STORE COSMETIC FOOTAGE/SEQUENCE GRID||||10
5. 9679-230|COS 56G 44G NT SVO|"FOOTAGE/SEQUENCE GRIDS FOR STREETSCAPE STORES(ALL COSMETICS ON 60"" HIGH GONDOLAS"||||10
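The question asks for sed specifically, and none of the answers below uses it. Here is a minimal sketch in POSIX sed, assuming (as in the sample) that every continuation line begins with `(`. The function name `join_paren_lines` is hypothetical. It keeps at most two lines in the pattern space and deletes the newline only when the second one starts with `(`.

```shell
#!/bin/sh
# join_paren_lines (hypothetical name): join every line that starts with "("
# onto the line before it, deleting only the newline between them.
#   :a    label to loop back to
#   $!N   append the next input line (unless this is the last line)
#   s     if that line starts with "(", remove the newline in front of it
#   ta    if a join happened, loop so a row split several times is rebuilt
#   P     print up to the first newline (one finished row)
#   D     delete that row and restart the cycle with the remaining line
join_paren_lines() {
    sed -e ':a' -e '$!N' -e 's/\n(/(/' -e 'ta' -e 'P' -e 'D' "$@"
}

# Usage: rows 3 and 4 of the sample input, row 3 split across two lines.
printf '%s\n' \
    '3. 9679-229|COS 56G 40G NT SVO|"FOOTAGE/SEQUENCE GRIDS FOR STREETSCAPE STORES' \
    '(ALL COSMETICS ON 60"" HIGH GONDOLAS"||||10' \
    '4. 9184-230|COS48W 48G NT LIF SKO|LIFE STORE COSMETIC FOOTAGE/SEQUENCE GRID||||10' |
    join_paren_lines
```

Because it only looks at the first character of the following line, this relies on the same assumption as the `\n(?=\()` Perl answer: a continuation line that starts with anything other than `(` is left alone.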
With Perl:
perl -00pe 's/\n(?!\h*\d)//g' file
or
$ perl -00pe 's/\n(?=\()//g' file
1. 9187-001|COS 60W 16G T1A CLV|||||10
2. 9184-002|COS 48W 28G NT SKO|FOOTAGE/SEQUENCE GRIDS||||10
3. 9679-229|COS 56G 40G NT SVO|"FOOTAGE/SEQUENCE GRIDS FOR STREETSCAPE STORES(ALL COSMETICS ON 60"" HIGH GONDOLAS"||||10
4. 9184-230|COS48W 48G NT LIF SKO|LIFE STORE COSMETIC FOOTAGE/SEQUENCE GRID||||10
5. 9679-230|COS 56G 44G NT SVO|"FOOTAGE/SEQUENCE GRIDS FOR STREETSCAPE STORES(ALL COSMETICS ON 60"" HIGH GONDOLAS"||||10
awk can do it too, by buffering each row until the next line that starts with a row number:
awk '/^[0-9]+\. /{if (NR > 1) print prev; prev = $0; next} {prev = prev $0} END{if (NR) print prev}' file
It looks like every row should have 7 fields:
awk -F'|' '
{$0 = prev $0}
NF < 7 {prev = $0}
NF == 7 {print; prev=""}
' file
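The field-count idea generalizes a little. Below is a sketch with a hypothetical function name, `join_by_field_count`, that takes the expected number of fields as an argument (7 for the sample). Unlike the one-liner above, it also prints a row that is still incomplete at end of file, and it passes through a row with too many fields instead of silently dropping it.

```shell
#!/bin/sh
# join_by_field_count (hypothetical name): rebuild rows that were split
# across lines by counting '|'-separated fields.
# $1 = number of fields a complete row has; remaining args = input files.
join_by_field_count() {
    nf=$1
    shift
    awk -F'|' -v nf="$nf" '
        { $0 = prev $0 }                    # prepend the pending partial row; awk re-splits fields
        NF < nf { prev = $0; next }         # still too few fields: keep buffering
        { print; prev = "" }                # complete (or over-long) row: emit it
        END { if (prev != "") print prev }  # do not lose an unfinished last row
    ' "$@"
}

# Usage with 4-field rows: prints a|b|"x(y"|c and then d|e|f|g
printf '%s\n' 'a|b|"x' '(y"|c' 'd|e|f|g' | join_by_field_count 4
```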
But really, you should use a proper CSV parser:
perl -MText::CSV -Mautodie -E '
$csv = Text::CSV->new({binary => 1, sep_char => "|", quote_space => 0});
open $fh, "<", shift;
while ($row = $csv->getline($fh)) {
$csv->combine( map {s/\n//g; $_} @$row );
say $csv->string();
}
' file
1. 9187-001|COS 60W 16G T1A CLV|||||10
2. 9184-002|COS 48W 28G NT SKO|FOOTAGE/SEQUENCE GRIDS||||10
3. 9679-229|COS 56G 40G NT SVO|"FOOTAGE/SEQUENCE GRIDS FOR STREETSCAPE STORES (ALL COSMETICS ON 60"" HIGH GONDOLAS"||||10
4. 9184-230|COS48W 48G NT LIF SKO|LIFE STORE COSMETIC FOOTAGE/SEQUENCE GRID||||10
5. 9679-230|COS 56G 44G NT SVO|"FOOTAGE/SEQUENCE GRIDS FOR STREETSCAPE STORES (ALL COSMETICS ON 60"" HIGH GONDOLAS"||||10
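A CSV parser gets this right because a newline inside a double-quoted field is data, not a row separator. Where Text::CSV is not installed, that rule can be approximated in awk by joining lines while the running count of `"` characters is odd. This is a sketch, not a CSV parser: it assumes fields are quoted with `"` and embedded quotes are doubled (`""`), as in the sample, and the function name `join_open_quotes` is hypothetical. It removes the newline without adding a space, as in the expected output.

```shell
#!/bin/sh
# join_open_quotes (hypothetical name): join physical lines while a
# double-quoted field is still open, i.e. while the number of '"'
# characters seen so far in the current row is odd. A doubled quote ("")
# adds 2 to the count, so it never changes the parity.
join_open_quotes() {
    awk '
        {
            rec = rec $0          # glue this line on, no separator
            q += gsub(/"/, "&")   # count the quotes on this line
        }
        q % 2 == 0 { print rec; rec = ""; q = 0 }  # quotes balanced: row complete
        END { if (rec != "") print rec }           # unterminated quote at end of file
    ' "$@"
}

printf '%s\n' 'a|"x' '(y ""z"" w"|1' 'b|c|2' | join_open_quotes
```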
Using GNU awk for multi-char RS:
$ awk -v RS='^$' -v ORS= '{gsub(/\s*\n\(/,"(")}1' file
1. 9187-001|COS 60W 16G T1A CLV|||||10
2. 9184-002|COS 48W 28G NT SKO|FOOTAGE/SEQUENCE GRIDS||||10
3. 9679-229|COS 56G 40G NT SVO|"FOOTAGE/SEQUENCE GRIDS FOR STREETSCAPE STORES(ALL COSMETICS ON 60"" HIGH GONDOLAS"||||10
4. 9184-230|COS48W 48G NT LIF SKO|LIFE STORE COSMETIC FOOTAGE/SEQUENCE GRID||||10
5. 9679-230|COS 56G 44G NT SVO|"FOOTAGE/SEQUENCE GRIDS FOR STREETSCAPE STORES(ALL COSMETICS ON 60"" HIGH GONDOLAS"||||10
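Reading the whole file as one record with `RS='^$'` and the `\s` escape both rely on GNU awk. A sketch of the same join for a POSIX awk follows, again assuming continuation lines begin with `(`; the function name `join_paren_awk` is hypothetical. It holds one line back and prints it only once the next line shows it is not being continued, so the file is never held in memory in full. It strips only trailing spaces and tabs before the newline, which is narrower than `\s*`.

```shell
#!/bin/sh
# join_paren_awk (hypothetical name): POSIX awk version of the join above.
# Holds the previous line in "prev" and prints it only when the current
# line turns out not to be a continuation (one starting with "(").
join_paren_awk() {
    awk '
        NR > 1 && substr($0, 1, 1) == "(" {
            sub(/[ \t]+$/, "", prev)   # drop trailing blanks on the held line
            prev = prev $0             # remove the newline: glue onto held line
            next
        }
        NR > 1 { print prev }          # held line is complete
        { prev = $0 }                  # hold the current line
        END { if (NR > 0) print prev }
    ' "$@"
}

printf '%s\n' 'a|"x ' '(y"|1' 'b|z|2' | join_paren_awk
```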