How to use Python to convert less-than and greater-than signs to entities within a parent tag?

Question

I'm having trouble writing working code that converts < and > to \&lt; and \&gt; inside the parent tag <sep>. The original document looks like this:

<xml>
<body>
<month>
<sep>Hello world!<p>This is 
september!</p> Hello world!<b>And today's Firday!</b></sep>
</month>
<month>
<sep><i>This is October!<i></sep>
</month>
</body>
</xml>

The result should be:

<xml>
<body>
<month>
<sep>Hello world!\&lt;p\&gt;This is 
september!\&lt;/p\&gt; Hello world!\&lt;b\&gt;And today's Firday!\&lt;/b\&gt;</sep>
</month>
<month>
<sep>\&lt;i\&gt;This is October!\&lt;i\&gt;</sep>
</month>
</body>
</xml>

My code so far looks like this:

text1 = re.findall(r"<sep>((.|\n)*?)<\/sep>", f.read())
text2 = re.sub(r"<(.*?)>", r"\&lt;"+r"\1"+"\&gt;", text1)

But how do I put the converted text back into the original file? Thanks!

Answer 1

sample = """<xml>
<body>
<month>
<sep>Hello world!<p>This is 
september!</p> Hello world!<b>And today's Firday!</b></sep>
</month>
<month>
<sep><i>This is October!<i></sep>
</month>
</body>
</xml>"""

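import re

def encode_text(in_txt):
  out_txt = in_txt  # strings are immutable, so no copy is needed
  # Each match is a tuple; its first item is the text between <sep> and </sep>
  matches = re.findall(r"<sep>((.|\n)*?)</sep>", in_txt)
  for (txt, _) in matches:
    # \1 keeps the tag name; \\ in the template emits one literal backslash
    out_txt = out_txt.replace(txt, re.sub(r"<(.*?)>", r"\\&lt;\1\\&gt;", txt), 1)
  return out_txt

def decode_text(in_txt):
  out_txt = in_txt
  matches = re.findall(r"<sep>((.|\n)*?)</sep>", in_txt)
  for (txt, _) in matches:
    # Reverse the encoding: \&lt;name\&gt; becomes <name> again
    out_txt = out_txt.replace(txt, re.sub(r"\\&lt;(.*?)\\&gt;", r"<\1>", txt), 1)
  return out_txt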

result_encoded = encode_text(sample)
result_decoded = decode_text(result_encoded)

print(result_encoded) prints:

<xml>
<body>
<month>
<sep>Hello world!\&lt;p\&gt;This is 
september!\&lt;/p\&gt; Hello world!\&lt;b\&gt;And today's Firday!\&lt;/b\&gt;</sep>
</month>
<month>
<sep>\&lt;i\&gt;This is October!\&lt;i\&gt;</sep>
</month>
</body>
</xml>

And print(result_decoded) prints:

<xml>
<body>
<month>
<sep>Hello world!<p>This is 
september!</p> Hello world!<b>And today's Firday!</b></sep>
</month>
<month>
<sep><i>This is October!<i></sep>
</month>
</body>
</xml>

Also, note that:

result_decoded == sample
Out[87]: True
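
To get the converted text back into the file, which is what the question actually asks, one option is to do the whole substitution in a single re.sub pass with a callback and then write the result over the original. The following is only a sketch under some assumptions: the names SEP_RE, encode_sep, decode_sep and encode_file are made up for illustration, the file is assumed to be UTF-8, and every < and > inside <sep> is escaped (not only matched <...> pairs), which gives the same output for the sample above.

```python
import re
from pathlib import Path

# One <sep>...</sep> element; DOTALL lets "." match line breaks as well.
SEP_RE = re.compile(r"(<sep>)(.*?)(</sep>)", re.DOTALL)

def encode_sep(text):
    # Escape every < and > inside each <sep> element, leaving the
    # <sep> and </sep> tags themselves untouched.
    def _escape(match):
        inner = match.group(2).replace("<", r"\&lt;").replace(">", r"\&gt;")
        return match.group(1) + inner + match.group(3)
    return SEP_RE.sub(_escape, text)

def decode_sep(text):
    # Inverse of encode_sep: turn \&lt; and \&gt; back into < and >.
    def _unescape(match):
        inner = match.group(2).replace(r"\&lt;", "<").replace(r"\&gt;", ">")
        return match.group(1) + inner + match.group(3)
    return SEP_RE.sub(_unescape, text)

def encode_file(path, encoding="utf-8"):
    # Read the file, encode it, and overwrite it with the result.
    # The encoding is an assumption; adjust it to match the real file.
    p = Path(path)
    p.write_text(encode_sep(p.read_text(encoding=encoding)), encoding=encoding)
```

Because the callback rewrites each match in place, there is no separate str.replace step that could hit the wrong occurrence when two <sep> blocks contain the same text. Calling encode_file on the path of the XML file should then update it in place; keeping a backup copy first is probably wise.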

python

regex

entity