How to use Python to convert less-than and greater-than signs to entities within a parent tag?
I am having trouble writing working code that converts < and > to \< and \> inside the parent tag <sep>. The original looks like this:
<xml>
<body>
<month>
<sep>Hello world!<p>This is
september!</p> Hello world!<b>And today's Firday!</b></sep>
</month>
<month>
<sep><i>This is October!<i></sep>
</month>
</body>
</xml>
The result should be:
<xml>
<body>
<month>
<sep>Hello world!\<p\>This is
september!\</p\> Hello world!\<b\>And today's Firday!\</b\></sep>
</month>
<month>
<sep>\<i\>This is October!\<i\></sep>
</month>
</body>
</xml>
My code so far is:
text1 = re.findall(r"<sep>((.|\n)*?)<\/sep>", f.read())
text2 = re.sub(r"<(.*?)>", r"\<"+r"\1"+"\>", text1)
But how do I put the converted text back into the original file?
Thanks!
import re

sample = """<xml>
<body>
<month>
<sep>Hello world!<p>This is
september!</p> Hello world!<b>And today's Firday!</b></sep>
</month>
<month>
<sep><i>This is October!<i></sep>
</month>
</body>
</xml>"""

def encode_text(in_txt):
    # Strings are immutable, so plain assignment is enough (no copy needed).
    out_txt = in_txt
    # findall returns one (inner_text, last_char) tuple per <sep> block.
    matches = re.findall(r"<sep>((.|\n)*?)</sep>", in_txt)
    for txt, _ in matches:
        # Turn every <...> inside the block into \<...\>.
        out_txt = out_txt.replace(txt, re.sub(r"<(.*?)>", r"\\<\1\\>", txt), 1)
    return out_txt

def decode_text(in_txt):
    out_txt = in_txt
    matches = re.findall(r"<sep>((.|\n)*?)</sep>", in_txt)
    for txt, _ in matches:
        # Reverse the encoding: \<...\> becomes <...> again.
        out_txt = out_txt.replace(txt, re.sub(r"\\<(.*?)\\>", r"<\1>", txt), 1)
    return out_txt

result_encoded = encode_text(sample)
result_decoded = decode_text(result_encoded)
print(result_encoded)
prints:
<xml>
<body>
<month>
<sep>Hello world!\<p\>This is
september!\</p\> Hello world!\<b\>And today's Firday!\</b\></sep>
</month>
<month>
<sep>\<i\>This is October!\<i\></sep>
</month>
</body>
</xml>
while print(result_decoded)
prints:
<xml>
<body>
<month>
<sep>Hello world!<p>This is
september!</p> Hello world!<b>And today's Firday!</b></sep>
</month>
<month>
<sep><i>This is October!<i></sep>
</month>
</body>
</xml>
Also, note that:
result_decoded == sample
Out[87]: True
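To address the last part of the question (putting the converted text back into the file), here is a minimal sketch of an alternative that does the substitution in one pass with a callback passed to re.sub, then rewrites the file. The names SEP_BLOCK, escape_sep, unescape_sep and escape_file are hypothetical and not from the original post. It also assumes that every < and > inside <sep> should be escaped individually, which gives the same result as the answer above on the sample but would differ for a stray unpaired bracket.

```python
import re

# Matches one <sep>...</sep> block; DOTALL lets "." span newlines.
SEP_BLOCK = re.compile(r"(<sep>)(.*?)(</sep>)", re.DOTALL)


def escape_sep(text):
    """Prefix every < and > inside <sep>...</sep> with a backslash."""
    def _escape(match):
        open_tag, inner, close_tag = match.groups()
        # In the replacement template, "\\" is a literal backslash and "\1" is the bracket.
        inner = re.sub(r"([<>])", r"\\\1", inner)
        return open_tag + inner + close_tag
    return SEP_BLOCK.sub(_escape, text)


def unescape_sep(text):
    """Undo escape_sep: turn backslash-bracket back into a plain bracket inside <sep>...</sep>."""
    def _unescape(match):
        open_tag, inner, close_tag = match.groups()
        inner = re.sub(r"\\([<>])", r"\1", inner)
        return open_tag + inner + close_tag
    return SEP_BLOCK.sub(_unescape, text)


def escape_file(path, encoding="utf-8"):
    """Read the whole file, escape it in memory, then overwrite the file (assumed to fit in memory)."""
    with open(path, encoding=encoding) as f:
        original = f.read()
    with open(path, "w", encoding=encoding) as f:
        f.write(escape_sep(original))
```

Because the callback only ever sees the text between <sep> and </sep>, tags outside those blocks are left untouched, and there is no need for a separate str.replace step to splice the result back in.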