How to add "\n" line by line in an HTML string with Python?

I have a single line of HTML like this:

<ol class="X5LH0c"><li class="TrT0Xe" id="hello">Create A Bakery Business Plan. ... </li><li class="TrT0Xe" id="hello">Choose A Location For Your Bakery Business. ... </li><li class="TrT0Xe">Get All Licenses Required To Open A Bakery Business In India. ... </li><li class="TrT0Xe">Get Manpower Required To Open A Bakery. ... </li><li class="TrT0Xe">Buy Equipment Needed To Start A Bakery Business.</li></ol>

The tags will have a class attribute, and the id is not always specified. I need to add "\n" line by line where the tags close. My Python code looks like this:

TAGS = ['p', 'h1', 'h2', 'h3', 'h4', 'li', 'img','ol']
SINGLE_LINE_TAGS = ['ul', 'ol']
INLINE_TAGS = ['strong', 'i', 'u', 'em']

html = '''<ol class="X5LH0c"><li class="TrT0Xe" id="hello">Create A Bakery Business Plan. ... </li><li class="TrT0Xe" id="hello">Choose A Location For Your Bakery Business. ... </li><li class="TrT0Xe">Get All Licenses Required To Open A Bakery Business In India. ... </li><li class="TrT0Xe">Get Manpower Required To Open A Bakery. ... </li><li class="TrT0Xe">Buy Equipment Needed To Start A Bakery Business.</li></ol>'''

for tag in TAGS:
    html = html.replace('</{}>'.format(tag), '</{}>\n'.format(tag))

for tag in SINGLE_LINE_TAGS:
    html = html.replace('<{}>'.format(tag), '<{}>\n'.format(tag))
    html = html.replace('</{}>'.format(tag), '</{}>\n'.format(tag))

html = html.replace(' />', ' />\n')

print(html)

But the result is:

<ol class="X5LH0c"><li class="TrT0Xe" id="hello">Create A Bakery Business Plan. ... </li>
<li class="TrT0Xe" id="hello">Choose A Location For Your Bakery Business. ... </li>
<li class="TrT0Xe">Get All Licenses Required To Open A Bakery Business In India. ... </li>
<li class="TrT0Xe">Get Manpower Required To Open A Bakery. ... </li>
<li class="TrT0Xe">Buy Equipment Needed To Start A Bakery Business.</li>
</ol>

Why isn't it like this:

<ol class="X5LH0c">
<li class="TrT0Xe" id="hello">Create A Bakery Business Plan. ... </li>
<li class="TrT0Xe" id="hello">Choose A Location For Your Bakery Business. ... </li>
<li class="TrT0Xe">Get All Licenses Required To Open A Bakery Business In India. ... </li>
<li class="TrT0Xe">Get Manpower Required To Open A Bakery. ... </li>
<li class="TrT0Xe">Buy Equipment Needed To Start A Bakery Business.</li>
</ol>

I don't use regular expressions. Can anyone help me fix the code? Thanks for your support!

A quick fix is below.

TAGS = ['p', 'h1', 'h2', 'h3', 'h4', 'li', 'img','ol']
SINGLE_LINE_TAGS = ['ul', 'ol']
INLINE_TAGS = ['strong', 'i', 'u', 'em']

html = '''<ol class="X5LH0c"><li class="TrT0Xe" id="hello">Create A Bakery Business Plan. ... </li><li class="TrT0Xe" id="hello">Choose A Location For Your Bakery Business. ... </li><li class="TrT0Xe">Get All Licenses Required To Open A Bakery Business In India. ... </li><li class="TrT0Xe">Get Manpower Required To Open A Bakery. ... </li><li class="TrT0Xe">Buy Equipment Needed To Start A Bakery Business.</li></ol>'''

# Put each tag on a new line
for tag in TAGS:
    html = html.replace('<{}'.format(tag), '\n<{}'.format(tag))

for tag in SINGLE_LINE_TAGS:
    html = html.replace('<{}>'.format(tag), '\n<{}>'.format(tag))
    html = html.replace('</{}>'.format(tag), '\n</{}>'.format(tag))

print(html)

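So I replaced the search for the closing tag </{}> with a search for the opening tag <{}, and moved the newline to just before the opening tag (so <li class ... gets matched too). Your original code never broke the line after <ol class="X5LH0c"> because html.replace('<ol>', ...) only matches an <ol> tag written without any attributes.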
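To see why the exact-string search misses the opening tag, here is a small check. The sample string is a shortened version of the HTML from the question, and the variable names are only illustrative.

```python
# Shortened sample based on the question's HTML (one <li> instead of five).
html = '<ol class="X5LH0c"><li class="TrT0Xe">Create A Bakery Business Plan.</li></ol>'

# The literal '<ol>' never occurs, because the real tag carries a class attribute.
exact_match = '<ol>' in html

# The prefix '<ol' occurs no matter which attributes follow it.
prefix_match = '<ol' in html

print(exact_match, prefix_match)
```

str.replace only touches substrings that match exactly, so searching for the prefix without the closing > is what lets tags with attributes through.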

This produces an extra blank line at the start of the html string, which you can remove with html = html[1:].
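If you want this in reusable form, something like the sketch below might work. The function name break_before_tags and the list names BLOCK_TAGS and CONTAINER_TAGS are made up for this example, and I added 'ul' to the first list as an assumption so that <ul ...> with attributes is handled like <ol ...>. It uses lstrip('\n') rather than html[1:], so nothing is cut off when the string does not start with one of the listed tags.

```python
# Hypothetical names: tags whose opening tag should start a new line.
BLOCK_TAGS = ['p', 'h1', 'h2', 'h3', 'h4', 'li', 'img', 'ol', 'ul']
# Hypothetical names: containers whose closing tag should also start a new line.
CONTAINER_TAGS = ['ul', 'ol']


def break_before_tags(html):
    """Insert a newline before the listed tags using plain str.replace."""
    for tag in BLOCK_TAGS:
        # Match on the prefix '<tag' so tags with attributes are found too.
        html = html.replace('<{}'.format(tag), '\n<{}'.format(tag))
    for tag in CONTAINER_TAGS:
        html = html.replace('</{}>'.format(tag), '\n</{}>'.format(tag))
    # Drop the newline added in front of the very first tag, if there is one.
    return html.lstrip('\n')


sample = '<ol class="a"><li id="x">One</li><li>Two</li></ol>'
result = break_before_tags(sample)
print(result)
```

Keep in mind that a prefix search is loose: '<p' also matches <pre, and '<li' also matches <link, so this is only safe when you know which tags appear in your input.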

A more elegant solution would be to use a regex replacement, but that depends entirely on the exact output you want.
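For reference, a regex version could look roughly like this. It is only a sketch: I am assuming you want a newline after every closing block tag and after the opening <ol ...> or <ul ...> tag, which is the layout shown in the question. The function name add_newlines and the pattern are my own, and [^>]* will break on attribute values that contain a > character.

```python
import re

# Either a closing block tag, or an opening <ol>/<ul> tag with optional attributes.
TAG_PATTERN = re.compile(r'(</(?:p|h[1-4]|li|ol|ul)>|<(?:ol|ul)\b[^>]*>)')


def add_newlines(html):
    """Append a newline after every tag matched by TAG_PATTERN."""
    return TAG_PATTERN.sub(r'\1\n', html)


sample = '<ol class="a"><li id="x">One</li><li>Two</li></ol>'
result = add_newlines(sample)
print(result)
```

This leaves a trailing newline after </ol> instead of a leading one, so you would use rstrip('\n') if that matters for your output.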