How can I remove all tags from a BeautifulSoup object while keeping specific tags such as <strong> or <em>?

Given the following HTML, how can I remove all tags with BeautifulSoup except stylistic tags such as <strong> and <em>?

    <ol class="journal">
    <li>A. Gilad Kusne, Heshan Yu, Changming Wu, Huairuo Zhang, Jason Hattrick-Simpers, Brian 
DeCost, Suchismita Sarker, Corey Oses, Cormac Toher, Stefano Curtarolo, Albert V. Davydov, 
Ritesh Agarwal, Leonid A. Bendersky, Mo Li, Apurva Mehta, Ichiro Takeuchi. <strong>On-the-fly 
closed-loop materials discovery via Bayesian active learning</strong>. <em>Nature Communications</em>, 2020; 11 (1) DOI: <a href="http://dx.doi.org/10.1038/s41467-020-19597-w" rel="nofollow" target="_blank">10.1038/s41467-020-19597-w</a>
    </li>
    </ol>

I know I can use a regular expression to remove specific tags, but is there an elegant way in BeautifulSoup to remove some tags while keeping others?

Use soup.descendants to pick out the tags you want to keep:

[node for node in soup.descendants if node.name in ['strong','em']]
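The list comprehension above only collects the <strong> and <em> nodes; it does not strip the other tags. One way to do the stripping inside BeautifulSoup itself is Tag.unwrap(), which removes a tag but leaves its children in place. The following is a minimal sketch, not a definitive implementation: the helper name strip_tags_except, the KEEP set, and the shortened sample string are made up for illustration, and it assumes the built-in html.parser backend.

```python
from bs4 import BeautifulSoup

KEEP = {"strong", "em"}  # hypothetical whitelist of tags to preserve


def strip_tags_except(html, keep=KEEP):
    """Return html with every tag unwrapped except those named in keep."""
    soup = BeautifulSoup(html, "html.parser")
    # find_all(True) returns a list of every Tag up front, so modifying
    # the tree while looping over that list is safe.
    for tag in soup.find_all(True):
        if tag.name not in keep:
            tag.unwrap()  # drop the tag itself, keep its children in place
    return str(soup)


# Shortened stand-in for the citation markup in the question.
sample = ('<ol><li>Authors. <strong>Title</strong>. <em>Journal</em>, '
          'DOI: <a href="#">10.1/x</a></li></ol>')
result = strip_tags_except(sample)
print(result)
# Authors. <strong>Title</strong>. <em>Journal</em>, DOI: 10.1/x
```

Because the whitelist is a parameter, the same helper can keep other inline tags, for example strip_tags_except(html, keep={"strong", "em", "sub", "sup"}).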

Try this:

import re
from bs4 import BeautifulSoup as bs

html = """<ol class="journal">
    <li>A. Gilad Kusne, Heshan Yu, Changming Wu, Huairuo Zhang, Jason 
Hattrick-Simpers, Brian DeCost, Suchismita Sarker, Corey Oses, Cormac Toher, 
Stefano Curtarolo, Albert V. Davydov, Ritesh Agarwal, Leonid A. Bendersky, 
Mo Li, Apurva Mehta, Ichiro Takeuchi. <strong>On-the-fly closed-loop 
materials discovery via Bayesian active learning</strong>. 
<em>Nature Communications</em>, 2020; 11 (1) DOI: 
<a href="http://dx.doi.org/10.1038/s41467-020-19597-w" rel="nofollow" 
target="_blank">10.1038/s41467-020-19597-w</a>
    </li>
    </ol>"""
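# The snippet is HTML, not XML; html.parser also needs no extra install.
soup = bs(html, features='html.parser')
# Distinct names of every tag other than the ones to keep.
tags = {tag.name for tag in soup.find_all(True) if tag.name not in ['strong', 'em']}
for tag in tags:
    # \b stops a short name such as "a" from also matching e.g. <abbr>.
    html = re.sub(rf'</?{tag}\b[^>]*>', '', html)
print(html)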

Output:

A. Gilad Kusne, Heshan Yu, Changming Wu, Huairuo Zhang, Jason Hattrick-Simpers, 
Brian DeCost, Suchismita Sarker, Corey Oses, Cormac Toher, Stefano Curtarolo, 
Albert V. Davydov, Ritesh Agarwal, Leonid A. Bendersky, Mo Li, Apurva Mehta, 
Ichiro Takeuchi. <strong>On-the-fly closed-loop materials discovery 
via Bayesian active learning</strong>. <em>Nature Communications</em>, 
2020; 11 (1) DOI: 10.1038/s41467-020-19597-w