Replace tags surrounding string only if string contains match

Question

I have a file in which many lines contain strings surrounded by tags.

  <tag:identifier>99454</tag:identifier>
  <tag:identifier>97817(web)</tag:identifier>
  <tag:identifier>http://www.google.com</tag:identifier>
  <tag:title>Title String/</tag:title>
  <tag:creator>Example</tag:creator>
  <tag:creator>Field</tag:creator>
  <tag:creator>Country</tag:creator>

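I am trying to find a way to change the tags around each URL. They all begin with <tag:identifier>http, so finding the lines that contain a URL is not a problem; I just don't know how to replace the closing tag as well. For example, the URL line should become <tag:url>http://www.google.com</tag:url>

What tool can I use to do this?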

Answer 1

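You can try this sed command:

sed -E '/http/ {s/identifier/url/g}' "$file"

This matches any line containing http and then replaces every occurrence of identifier on that line with url, which covers both the opening and the closing tag.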
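As a minimal sketch of the same idea, the address can be tightened to the opening tag followed by http, which is the pattern the question says every URL line starts with. The sample file and the variable names `sample` and `result` are assumptions made for illustration only.

```shell
# Build a small sample file (temporary, hypothetical) from the question's data.
sample=$(mktemp)
cat > "$sample" <<'EOF'
  <tag:identifier>99454</tag:identifier>
  <tag:identifier>http://www.google.com</tag:identifier>
  <tag:creator>Example</tag:creator>
EOF

# Only lines whose identifier value starts with http are rewritten;
# the g flag changes both the opening and the closing tag.
result=$(sed '/<tag:identifier>http/ s/identifier/url/g' "$sample")
printf '%s\n' "$result"
rm -f "$sample"
```

Anchoring on the opening tag avoids touching a line that merely mentions http somewhere else, such as inside a title.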

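You can also use this awk command:

awk -F'[<>]' -v OFS= '$3 ~ /^http/ {$2 = "<tag:url>"; $4 = "</tag:url>"} 1' "$file"

Here we set the field separator to < or >, test whether the value in field 3 starts with http, and replace the contents of fields 2 and 4 (the two tag names). Because assigning to a field makes awk rebuild the line with the output separator, OFS is set to the empty string so that no spaces are inserted around the value.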
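To see why fields 2 and 4 are the ones to replace, the following sketch prints how awk splits one of the question's lines when the separator is < or >. The variable names `line` and `fields` are illustrative assumptions.

```shell
line='  <tag:identifier>http://www.google.com</tag:identifier>'

# Print every field with its index so the split is visible.
fields=$(printf '%s\n' "$line" |
  awk -F'[<>]' '{ for (i = 1; i <= NF; i++) printf "%d=[%s]\n", i, $i }')
printf '%s\n' "$fields"
# 1=[  ]
# 2=[tag:identifier]
# 3=[http://www.google.com]
# 4=[/tag:identifier]
# 5=[]
```

Field 1 is the leading indentation, field 3 is the value, and the angle brackets themselves are consumed as separators, which is why the replacement strings have to include them.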

Output

  <tag:identifier>99454</tag:identifier>
  <tag:identifier>97817(web)</tag:identifier>
  <tag:url>http://www.google.com</tag:url>
  <tag:title>Title String/</tag:title>
  <tag:creator>Example</tag:creator>
  <tag:creator>Field</tag:creator>
  <tag:creator>Country</tag:creator>

Answer 2

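Since a URL may itself contain the word identifier, as in http://www.identifier.com, it is safer to match each part of the line explicitly and reuse the captured value:

sed -E 's#<(tag:identifier)>(http.*)</\1>#<tag:url>\2</tag:url>#' file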
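The difference shows up on a line whose URL contains the word identifier. The sketch below runs a blanket substitution and a whole-element match on the same input; it assumes the URL value starts with http, and the variable names `line`, `blanket`, and `exact` are made up for the example.

```shell
line='  <tag:identifier>http://www.identifier.com</tag:identifier>'

# Blanket substitution: also rewrites "identifier" inside the URL.
blanket=$(printf '%s\n' "$line" | sed '/http/ s/identifier/url/g')

# Whole-element match: \1 re-matches the tag name in the closing tag,
# \2 carries the URL over unchanged.
exact=$(printf '%s\n' "$line" |
  sed -E 's#<(tag:identifier)>(http.*)</\1>#<tag:url>\2</tag:url>#')

printf '%s\n%s\n' "$blanket" "$exact"
#   <tag:url>http://www.url.com</tag:url>
#   <tag:url>http://www.identifier.com</tag:url>
```

Using # as the s command delimiter keeps the slashes in the closing tag and the URL from needing escapes.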
