How to escape '<' and '>' in xml tags using regular expression with vim?

Question

I am parsing an XML file with Python.

from xml.dom import minidom
xmldoc = minidom.parse('selections.xml')

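But when I run it, I get this error: xml.parsers.expat.ExpatError: not well-formed (invalid token). After checking the file, I found that there are stray < and > characters inside the tags. So I want to use a regular expression to escape the < and > inside the XML tags. For example, in the text tag below, I want to escape the < and > around 'Winning 11'.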

<writing>
    <topic id="10">I am a fun</topic>
    <date>2012-03-1</date>   
    <grade>86</grade>
    <text>
          You know he is a soccer fan,so you'd better to buy the game is <Winning 11>!
    </text>
</writing>

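I know the escaped forms of < and > are &lt; and &gt;. Since my XML file contains too many tags to fix by hand, I want to solve this with a regular expression in vim.

Can anyone give me some ideas? I am new to regular expressions.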

Answer 1

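This is really not a good situation to be in.

However, if you know the valid xml tags in your file, the following will match only the 'bad tags' that you want to escape: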

<(?!/?grade|/?text)([^>]+)>

Add more valid tags to that list in the form |/?tag.

Then you can replace with

&lt;$1&gt;

Here it is on regexr.

If you need to do this in vim, you will need to convert it to a vim regex, which is not quite the same syntax.
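As an alternative to converting the pattern for vim, the same idea can be tried directly in Python, which the question already uses. The following is only a minimal sketch: the whitelist VALID_TAGS and the helper name escape_bad_tags are assumptions made for illustration, and the \b after the tag names is a small addition so that a name such as text does not also whitelist a longer name that merely starts with it.

```python
import re
from xml.dom import minidom

# Assumption: these are the only real element names in the file.
VALID_TAGS = ["writing", "topic", "date", "grade", "text"]

# Match <...> unless it opens or closes one of the valid tags.
BAD_TAG = re.compile(r"<(?!/?(?:%s)\b)([^>]+)>" % "|".join(VALID_TAGS))

def escape_bad_tags(xml_text):
    """Replace each non-whitelisted <...> with &lt;...&gt;."""
    return BAD_TAG.sub(r"&lt;\1&gt;", xml_text)

sample = """<writing>
    <topic id="10">I am a fun</topic>
    <date>2012-03-1</date>
    <grade>86</grade>
    <text>
          You know he is a soccer fan,so you'd better to buy the game is <Winning 11>!
    </text>
</writing>"""

fixed = escape_bad_tags(sample)
xmldoc = minidom.parseString(fixed)  # parses once the stray tag is escaped
```

This only handles a stray < that has a matching > after it. A lone < or > in the text would need a different approach.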

Answer 2

Verbose:

:%s/      #search and replace on all lines in file
\(        #open group 1
<text>\n  #find <text> tag with a newline at its end
.*        #grab all text until next match
\)        #close group 1
<         #the `<` mark we're looking for
\(        #open group 2
.*\n      #grab all text until end of line
.*        #grab text on the next line
<\/text>  #find </text> tag
\)        #close group 2
/         #replace with
\1        #paste group 1 in
\&lt;     #replace `<` with its escaped version
\2        #paste group 2 in
/g        #do on all occurrences

:%s/\(<text>\n.*\)<\(.*\n.*<\/text>\)/\1\&lt;\2/g

The second command is the same as the first, except that I swapped < for > and &lt; for &gt;:

:%s/\(<text>\n.*\)>\(.*\n.*<\/text>\)/\1\&gt;\2/g

Combined with |:

:%s/\(<text>\n.*\)<\(.*\n.*<\/text>\)/\1\&lt;\2/g | %s/\(<text>\n.*\)>\(.*\n.*<\/text>\)/\1\&gt;\2/g
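For comparison, here is a rough Python sketch of the same goal, escaping < and > only between <text> and </text>. It is not a translation of the vim commands: it assumes that <text> elements hold only character data (no child elements), and the helper name escape_text_bodies and the second game title in the snippet are made up for the example.

```python
import re

# Assumption: <text> elements contain only character data, never child elements.
TEXT_BODY = re.compile(r"(<text>)(.*?)(</text>)", re.DOTALL)

def escape_text_bodies(xml_text):
    """Escape every < and > found between <text> and </text>."""
    def fix(match):
        body = match.group(2).replace("<", "&lt;").replace(">", "&gt;")
        return match.group(1) + body + match.group(3)
    return TEXT_BODY.sub(fix, xml_text)

# Made-up snippet with two stray tags inside one <text> element.
snippet = "<text>\n  buy the game <Winning 11>, or <Winning 12>!\n</text>"
print(escape_text_bodies(snippet))
```

As far as I can tell, each vim substitution fixes one bracket per <text> block per run and expects the body on the single line after <text>, so it may need to be repeated. The sketch above handles any number of brackets and any number of lines in one pass.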

References:
Capturing Groups and Backreferences

Regex without vim escaping: for the < part, the first group runs up to the < mark, and the second group starts right after it.
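To see what the two groups capture, the pattern can be written without vim escaping and tried in Python. This is only an illustrative sketch on a shortened, made-up sample string. The substitution at the end pastes group 1, then the escaped &lt;, then group 2.

```python
import re

# The vim pattern \(<text>\n.*\)<\(.*\n.*<\/text>\) without vim escaping.
pattern = re.compile(r"(<text>\n.*)<(.*\n.*</text>)")

sample = "<text>\n  the game is <Winning 11>!\n</text>"
m = pattern.search(sample)

print(repr(m.group(1)))  # everything from <text> up to the stray <
print(repr(m.group(2)))  # everything after it, through </text>

# Paste group 1, the escaped <, then group 2.
result = pattern.sub(r"\1&lt;\2", sample)
print(result)
```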
