How do I capture strings between a string match and a new line while ignoring newline and space in between in bash?

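I have a bunch of manifest files that I am iterating over to extract the Import-Package header. The Import-Package value is wrapped across lines: every continuation line starts with a single space, until the import statement ends. The next attribute (uri in this example) then starts on a new line with no leading space. I only need to read the Import-Package attribute, i.e. the Import-Package line followed by every continuation line that starts with a space.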

A sample manifest looks like this:

Bnd-LastModified: 1494408636933
Bundle-ManifestVersion: 2
Import-Package: com.advantco.base,com.advantco.base.logging,com.advant
 co.base.mime,com.advantco.base.net,com.advantco.base.variablesubstitu
 tion,com.advantco.rest,com.advantco.rest.auth,com.advantco.rest.auth.
 oauth2,com.advantco.sugarcrm.core,com.advantco.sugarcrm.core.adapter,
 com.advantco.sugarcrm.core.error,com.advantco.sugarcrm.core.iface,com
 .advantco.sugarcrm.core.object,com.advantco.sugarcrm.core.object.meta
 data,com.advantco.sugarcrm.core.rest,com.advantco.sugarcrm.core.rest.
 auth,com.advantco.sugarcrm.core.rest.metadata,com.advantco.sugarcrm.c
 ore.rest.op,com.advantco.sugarcrm.core.rest.op.v10,com.advantco.sugar
 crm.core.rest.parser,com.advantco.sugarcrm.core.rest.parser.object,co
 m.advantco.sugarcrm.core.rest.parser.xml,com.advantco.sugarcrm.core.r
 est.service,com.advantco.sugarcrm.core.result,com.advantco.sugarcrm.c
 ore.result.v10,com.advantco.sugarcrm.core.service,com.advantco.sugarc
 rm.core.util,com.advantco.sugarcrm.core.xml,javax.activation,javax.cr
 ypto,javax.crypto.spec,javax.mail,javax.xml.bind,javax.xml.parsers,ja
 vax.xml.stream,javax.xml.transform,javax.xml.transform.dom,javax.xml.
 transform.stream,org.apache.commons.codec.binary,org.apache.commons.c
 ollections4.map,org.apache.commons.httpclient,org.apache.commons.http
 client.util,org.json
Require-Capability: osgi.ee;filter:="(&(osgi.ee=JavaSE)(version=1.6))"
Tool: Bnd-3.3.0.201609221906
Export-Package: com.advantco.sugarcrm.core;uses:="com.advantco.base.lo
 gging,com.advantco.sugarcrm.core.object";version="1.0.0",com.advantco
 .sugarcrm.core.adapter;uses:="com.advantco.base,com.advantco.base.log
 ging,com.advantco.base.net,com.advantco.base.variablesubstitution,com
 .advantco.sugarcrm.core,com.advantco.sugarcrm.core.error,com.advantco
 .sugarcrm.core.object,com.advantco.sugarcrm.core.object.metadata";ver
 sion="1.0.0",com.advantco.sugarcrm.core.error;version="1.0.0",com.adv
 antco.sugarcrm.core.iface;uses:="com.advantco.sugarcrm.core.error,com
 .advantco.sugarcrm.core.object";version="1.0.0",com.advantco.sugarcrm
 .core.object;uses:="com.advantco.base,com.advantco.base.mime,com.adva
 ntco.base.net,com.advantco.sugarcrm.core.error,com.advantco.sugarcrm.

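The header that follows (Uri, Require-Capability or Export-Package) cannot be hard-coded, because some other header may come after Import-Package. So I need to read the Import-Package line and every following line that starts with a space, and stop when I reach a line that starts a new attribute field instead of a space (not necessarily any given header).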

The output should look like:

Import-Package: com.advantco.base,com.advantco.base.logging,com.advant
 co.base.mime,com.advantco.base.net,com.advantco.base.variablesubstitu
 tion,com.advantco.rest,com.advantco.rest.auth,com.advantco.rest.auth.
 oauth2,com.advantco.sugarcrm.core,com.advantco.sugarcrm.core.adapter,
 com.advantco.sugarcrm.core.error,com.advantco.sugarcrm.core.iface,com
 .advantco.sugarcrm.core.object,com.advantco.sugarcrm.core.object.meta
 data,com.advantco.sugarcrm.core.rest,com.advantco.sugarcrm.core.rest.
 auth,com.advantco.sugarcrm.core.rest.metadata,com.advantco.sugarcrm.c
 ore.rest.op,com.advantco.sugarcrm.core.rest.op.v10,com.advantco.sugar
 crm.core.rest.parser,com.advantco.sugarcrm.core.rest.parser.object,co
 m.advantco.sugarcrm.core.rest.parser.xml,com.advantco.sugarcrm.core.r
 est.service,com.advantco.sugarcrm.core.result,com.advantco.sugarcrm.c
 ore.result.v10,com.advantco.sugarcrm.core.service,com.advantco.sugarc
 rm.core.util,com.advantco.sugarcrm.core.xml,javax.activation,javax.cr
 ypto,javax.crypto.spec,javax.mail,javax.xml.bind,javax.xml.parsers,ja
 vax.xml.stream,javax.xml.transform,javax.xml.transform.dom,javax.xml.
 transform.stream,org.apache.commons.codec.binary,org.apache.commons.c
 ollections4.map,org.apache.commons.httpclient,org.apache.commons.http
 client.util,org.json

Then I can remove the newlines so that it looks like:

Import-Package:com.advantco.base,com.advantco.base.logging,com.advantco.base.mime,com.advantco.base.net,com.advantco.base.variablesubstitution,com.advantco.rest,com.advantco.rest.auth,com.advantco.rest.auth.oauth2,com.advantco.sugarcrm.core,com.advantco.sugarcrm.core.adapter,com.advantco.sugarcrm.core.error,com.advantco.sugarcrm.core.iface,com.advantco.sugarcrm.core.object,com.advantco.sugarcrm.core.object.metadata,com.advantco.sugarcrm.core.rest,com.advantco.sugarcrm.core.rest.auth,com.advantco.sugarcrm.core.rest.metadata,com.advantco.sugarcrm.core.rest.op,com.advantco.sugarcrm.core.rest.op.v10,com.advantco.sugarcrm.core.rest.parser,com.advantco.sugarcrm.core.rest.parser.object,com.advantco.sugarcrm.core.rest.parser.xml,com.advantco.sugarcrm.core.rest.service,com.advantco.sugarcrm.core.result,com.advantco.sugarcrm.core.result.v10,com.advantco.sugarcrm.core.service,com.advantco.sugarcrm.core.util,com.advantco.sugarcrm.core.xml,javax.activation,javax.crypto,javax.crypto.spec,javax.mail,javax.xml.bind,javax.xml.parsers,javax.xml.stream,javax.xml.transform,javax.xml.transform.dom,javax.xml.transform.stream,org.apache.commons.codec.binary,org.apache.commons.collections4.map,org.apache.commons.httpclient,org.apache.commons.httpclient.util,org.json

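I am trying this, but it only seems to work when the header that follows Import-Package starts with an uppercase letter. (Here it is Import-Package: package-names ... followed by Require-Capability:, but in some cases it is Import-Package: package-names ... followed by url:, which then gets captured as well.)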

`sed -n -e '/Import-Package/,/[A-Z]/ p'` 

But if the manifest looks like this:

Bnd-LastModified: 1494408636933
Bundle-ManifestVersion: 2
Import-Package: com.advantco.base,com.advantco.base.logging,com.advant
 co.base.mime,com.advantco.base.net,com.advantco.base.variablesubstitu
 tion,com.advantco.rest,com.advantco.rest.auth,com.advantco.rest.auth.
 oauth2,com.advantco.sugarcrm.core,com.advantco.sugarcrm.core.adapter,
 com.advantco.sugarcrm.core.error,com.advantco.sugarcrm.core.iface,com
 .advantco.sugarcrm.core.object,com.advantco.sugarcrm.core.object.meta
 data,com.advantco.sugarcrm.core.rest,com.advantco.sugarcrm.core.rest.
 auth,com.advantco.sugarcrm.core.rest.metadata,com.advantco.sugarcrm.c
 ore.rest.op,com.advantco.sugarcrm.core.rest.op.v10,com.advantco.sugar
 crm.core.rest.parser,com.advantco.sugarcrm.core.rest.parser.object,co
 m.advantco.sugarcrm.core.rest.parser.xml,com.advantco.sugarcrm.core.r
 est.service,com.advantco.sugarcrm.core.result,com.advantco.sugarcrm.c
 ore.result.v10,com.advantco.sugarcrm.core.service,com.advantco.sugarc
 rm.core.util,com.advantco.sugarcrm.core.xml,javax.activation,javax.cr
 ypto,javax.crypto.spec,javax.mail,javax.xml.bind,javax.xml.parsers,ja
 vax.xml.stream,javax.xml.transform,javax.xml.transform.dom,javax.xml.
 transform.stream,org.apache.commons.codec.binary,org.apache.commons.c
 ollections4.map,org.apache.commons.httpclient,org.apache.commons.http
 client.util,org.json
url:http://sample.org

then sample.org is captured as well.

EDIT: Since the OP said the uri string should not be hard-coded, adding this solution now.

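awk '
/Import-Package/{        # start of the header: remember its first line
  flag=1
  val=$0
  next
}
flag && /^ / && NF{      # continuation line: drop the leading space and append
  gsub(/^ /,"")
  val=val?val $0:$0
  next
}
flag && !/^ / && NF{     # next header: print the joined value and reset
  print val
  flag=val=""
}
END{                     # the header was the last block in the file
  if(flag) print val
}'  Input_file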

The output will be as follows:

Import-Package: com.advantco.base,com.advantco.base.logging,com.advantco.base.mime,com.advantco.base.net,com.advantco.base.variablesubstitution,com.advantco.rest,com.advantco.rest.auth,com.advantco.rest.auth.oauth2,com.advantco.sugarcrm.core,com.advantco.sugarcrm.core.adapter,com.advantco.sugarcrm.core.error,com.advantco.sugarcrm.core.iface,com.advantco.sugarcrm.core.object,com.advantco.sugarcrm.core.object.metadata,com.advantco.sugarcrm.core.rest,com.advantco.sugarcrm.core.rest.auth,com.advantco.sugarcrm.core.rest.metadata,com.advantco.sugarcrm.core.rest.op,com.advantco.sugarcrm.core.rest.op.v10,com.advantco.sugarcrm.core.rest.parser,com.advantco.sugarcrm.core.rest.parser.object,com.advantco.sugarcrm.core.rest.parser.xml,com.advantco.sugarcrm.core.rest.service,com.advantco.sugarcrm.core.result,com.advantco.sugarcrm.core.result.v10,com.advantco.sugarcrm.core.service,com.advantco.sugarcrm.core.util,com.advantco.sugarcrm.core.xml,javax.activation,javax.crypto,javax.crypto.spec,javax.mail,javax.xml.bind,javax.xml.parsers,javax.xml.stream,javax.xml.transform,javax.xml.transform.dom,javax.xml.transform.stream,org.apache.commons.codec.binary,org.apache.commons.collections4.map,org.apache.commons.httpclient,org.apache.commons.httpclient.util,org.json


1st solution: Assuming your actual Input_file is the same as the shown sample, could you please try the following.

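awk '
/^uri/{              # stop collecting at the uri line
  flag=""
}
/^Import/{           # start collecting at the Import-Package line
  flag=1
}
flag{                # strip leading spaces and append the line to val
  sub(/^ +/,"")
  val=val?val $0:$0
}
END{
  print val
}' Input_file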

2nd solution: Adding a solution that uses RS here.

awk -v RS="uri:" 'FNR==1{gsub(/\n|\n +/,"");print}'  Input_file

3rd solution: Using RS and FS together here.

awk -v RS="" -v FS="uri:" '{gsub(/\n|\n +/,"",$1);print $1}'  Input_file

4th solution: Adding one more solution, using awk's match function.

awk -v RS=""  -v FS="\n" 'match($0,/Import.*uri/){val=substr($0,RSTART,RLENGTH);gsub(/\n|\n +|uri$/,"",val);print val}' Input_file

NOTE: If there is only one occurrence of such lines to print, you can also add an exit statement after print in the codes above.
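As a rough sketch of that note, the flag-based awk approach can be wrapped in a function that exits right after the first block is printed. The function name first_import_block and the tiny demo manifest are made up for illustration; flag is cleared before exit because awk still runs the END rule after exit.

```shell
# Print only the first joined Import-Package block, then stop reading.
first_import_block() {
  awk '
  /^Import-Package:/{ flag=1; val=$0; next }       # first line of the header
  flag && /^ /{ sub(/^ /,""); val=val $0; next }   # continuation line
  flag{ print val; flag=0; exit }                  # next header: print, quit
  END{ if (flag) print val }                       # block ended the file
  ' "$1"
}

# Demo on a throwaway file with two Import-Package headers (hypothetical data).
demo=$(mktemp)
printf 'Tool: demo\nImport-Package: a.b,c\n .d,e\nurl:x\nImport-Package: second\nTool: t\n' > "$demo"
first_import_block "$demo"
rm -f "$demo"
```

On large files this avoids scanning the rest of the input once the header has been found.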

My assumptions:

  • The "Import-Package:" line may start in the middle of the file.
  • The next attribute is not always "uri".

Then:

awk '/^Import-Package:/,!/^Import-Package:/&&!/^ / {
     if (!line || sub(/^ /, "")) line = line $0}
     END {print line}
' sample.txt

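It reads from the "Import-Package:" line up to the line of the next attribute (which is discarded), joining the lines while removing each leading space.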
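Since the question mentions iterating over many manifests, here is a minimal sketch that wraps the range-based awk program in a function and applies it to every MANIFEST.MF below a directory. The function names, the tab-separated output and the demo directory layout are assumptions, and it expects one Import-Package header per file.

```shell
# Join the Import-Package header of one manifest into a single line.
import_packages() {
  awk '/^Import-Package:/,!/^Import-Package:/&&!/^ / {
         if (!line || sub(/^ /, "")) line = line $0}
       END {print line}' "$1"
}

# Print "path<TAB>joined header" for every MANIFEST.MF below a directory.
scan_manifests() {
  find "$1" -type f -name MANIFEST.MF | while IFS= read -r f; do
    printf '%s\t%s\n' "$f" "$(import_packages "$f")"
  done
}

# Demo on a throwaway directory (hypothetical layout and contents).
dir=$(mktemp -d)
mkdir -p "$dir/bundle/META-INF"
printf 'Tool: demo\nImport-Package: a.b,c\n .d,e\nurl:x\n' > "$dir/bundle/META-INF/MANIFEST.MF"
scan_manifests "$dir"
rm -rf "$dir"
```

A manifest without an Import-Package header yields an empty second column here, which makes such files easy to spot.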

Lots of awk answers, but this is entirely doable in sed as well.

If you just want the block printed as-is:

$: sed -n '
 /^Import-Package: /,/^[^ ]/ {
    /^Import-Package:/ p;
    /^ / p;
 }
' infile

With GNU sed it can all be stacked on one line:

$: sed -n '/^Import-Package: /,/^[^ ]/ { /^Import-Package:/ p; /^ / p; }' infile

Explained:

$: sed -n ' ...     ' infile

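Use sed with -n to suppress any output unless explicitly commanded; read from a file named (in this example) infile, adjust as needed. Inside the single quotes, the program reads: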

 /^Import-Package: /,/^[^ ]/ {
    /^Import-Package:/ p; 
    /^ / p;
 }

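From any line starting with Import-Package: and continuing through the next line that starts with any non-space (here, explicitly the space character), execute all commands from this opening curly brace through the matching closing curly brace.

Inside that block, for any line that starts with Import-Package:, print it. For any line that starts with a space, print it.

There is no command to print a line that starts with a non-space but is not Import-Package:, so if another header starts right below the block, it is not printed, and the toggle goes out of scope, so nothing else is printed unless another Import-Package: block starts.

If the block ends the file, the range never goes out of scope, so it prints until it runs out of records.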
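To see the range behave as described, here is a small sketch that runs the same sed program on a shortened manifest whose next header is lowercase. The function name print_import_block and the demo data are hypothetical.

```shell
# Print the Import-Package line and its continuation lines unchanged.
print_import_block() {
  sed -n '
    /^Import-Package: /,/^[^ ]/ {
      /^Import-Package:/ p
      /^ / p
    }
  ' "$1"
}

# Demo: the lowercase url: line closes the range and is not printed.
demo=$(mktemp)
printf 'Bundle-ManifestVersion: 2\nImport-Package: com.example.a,com.exa\n mple.b,org.json\nurl:http://sample.org\nTool: demo\n' > "$demo"
print_import_block "$demo"
rm -f "$demo"
```

Because the end address only tests the first character of the line, the case of the next header name does not matter.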

If you want it to print the block all on one line with spaces removed -

$: sed -n '
 /^Import-Package: /,/^[^ ]/ {
    /^Import-Package:/ { h; d; }
    /^ / H;
    /^[^ ]/ { s/.*//; x; s/\n* //g; p; d;   }
    $       {         x; s/\n* //g; p; d;   }
 }
' infile

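For the lines from /^Import-Package: / through the next line whose first character is a non-space:

  • If the line starts with Import-Package:, replace the hold space with it (h), then delete it from the pattern space (d) to trigger a clean read of the next line.
  • If the line starts with a space, append it to the hold space (H).
  • If the line starts with a non-space, erase it with s/.*//; the rest applies to the last line ($) as well, so in either case x puts the accumulated hold space back into the pattern space (technically it swaps them), s/\n* //g replaces every newline-space sequence with nothing (this also removes the space after Import-Package:), p prints the line, and d deletes it to get a clean buffer for the next cycle (at end of file it simply exits).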
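The steps above can be traced with a small sketch that wraps the same program in a function and annotates each command. The function name join_import_block and the demo input are made up; the sed commands are the ones from the answer.

```shell
# Join one Import-Package block into a single line using the hold space.
join_import_block() {
  sed -n '
    /^Import-Package: /,/^[^ ]/ {
      # Header line: copy it to the hold space and start the next cycle.
      /^Import-Package:/ { h; d; }
      # Continuation line: append it to the hold space.
      /^ / H
      # Next header: erase it, swap in the collected text, join and print.
      /^[^ ]/ { s/.*//; x; s/\n* //g; p; d; }
      # Block ran to the end of the file: swap, join and print.
      $ { x; s/\n* //g; p; d; }
    }
  ' "$1"
}

# Demo on a throwaway file (hypothetical data).
demo=$(mktemp)
printf 'Tool: demo\nImport-Package: a.b,c\n .d,e\nurl:x\n' > "$demo"
join_import_block "$demo"
rm -f "$demo"
```

Note that the joined line comes out without the space after the colon, matching the one-line form requested in the question.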

The rest of this is an unnecessary alternative,

...but since I misread the request the first time, I am leaving it in case it helps someone else.

If you wanted all the packages broken out and printed one per line (which is what I originally thought you meant), then

$: sed -n '
 $ {
  /^Import-Package: / {
    s/^Import-Package: //; s/,/\n/g; p;
  }
 }
 /^Import-Package: /,/^[^ ]/ {
    /^Import-Package:/ { s/^Import-Package://; h; n; }
    /^ / H;
    /^[^ ]/ { s/.*//; x; s/\n* //g; s/,/\n/g; p; d;   }
    $       { s/.*//; x; s/\n* //g; s/,/\n/g; p; d;   }
 }
' infile

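If Import-Package: can never start on the very last line of the file, you can remove the $ block at the top. If it can never be the last block in the file, you can also remove the $ line at the bottom of the main block.

c.f. the GNU sed manual for a breakdown of each command. If you like, I will come back and elaborate.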

This might work for you (GNU sed):

sed -n '/^Import-Package:/{:a;N;s/\n //;ta;P;D}' file

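Use the -n option so that text is only printed explicitly. Starting at the first line that begins with Import-Package:, append the following line. If the appended line begins with a space, remove the newline and that space and, because the substitution succeeded, repeat until a non-matching line has been appended. Then print the first line of the pattern space, delete the first line of the pattern space, and repeat.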
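One edge case worth sketching: as far as I understand GNU sed, when N runs on the last input line and -n is in effect, sed exits without printing, so a block that ends the file would be lost. The variant below guards N with $! to cover that case. It is written one command per line so it does not rely on GNU's one-line label syntax, and the function name join_with_loop is made up.

```shell
# Join the Import-Package block with an N/t loop; $!N skips N on the last line.
join_with_loop() {
  sed -n '/^Import-Package:/{
    :a
    $!N
    s/\n //
    ta
    P
    D
  }' "$1"
}

# Demo: the block is the last thing in the file (hypothetical data).
demo=$(mktemp)
printf 'Tool: demo\nImport-Package: a.b,c\n .d,e\n' > "$demo"
join_with_loop "$demo"
rm -f "$demo"
```

Unlike the hold-space version, this keeps the space after Import-Package: because only newline-space pairs are removed.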

Using Perl:

perl -0777 -ne ' s/.*(Import-Package:.+?)\n(?=\S)(.*)/$1/smog; print ' sameer.pkg

To remove the newlines:

perl -0777 -ne ' s/.*(Import-Package:.+?)\n(?=\S)(.*)/$1/smog; print ' sameer.pkg | tr -d '\n'

Import-Package: com.advantco.base,com.advantco.base.logging,com.advant co.base.mime,com.advantco.base.net,com.advantco.base.variablesubstitu tion,com.advantco.rest,com.advantco.rest.auth,com.advantco.rest.auth. oauth2,com.advantco.sugarcrm.core,com.advantco.sugarcrm.core.adapter, com.advantco.sugarcrm.core.error,com.advantco.sugarcrm.core.iface,com .advantco.sugarcrm.core.object,com.advantco.sugarcrm.core.object.meta data,com.advantco.sugarcrm.core.rest,com.advantco.sugarcrm.core.rest. auth,com.advantco.sugarcrm.core.rest.metadata,com.advantco.sugarcrm.c ore.rest.op,com.advantco.sugarcrm.core.rest.op.v10,com.advantco.sugar crm.core.rest.parser,com.advantco.sugarcrm.core.rest.parser.object,co m.advantco.sugarcrm.core.rest.parser.xml,com.advantco.sugarcrm.core.r est.service,com.advantco.sugarcrm.core.result,com.advantco.sugarcrm.c ore.result.v10,com.advantco.sugarcrm.core.service,com.advantco.sugarc rm.core.util,com.advantco.sugarcrm.core.xml,javax.activation,javax.cr ypto,javax.crypto.spec,javax.mail,javax.xml.bind,javax.xml.parsers,ja vax.xml.stream,javax.xml.transform,javax.xml.transform.dom,javax.xml. transform.stream,org.apache.commons.codec.binary,org.apache.commons.c ollections4.map,org.apache.commons.httpclient,org.apache.commons.http client.util,org.json