Select tag having a dot with beautifulsoup
How can I use BeautifulSoup to select the tag <Tagwith.dot> and replace its text? If that is not possible with BeautifulSoup, is lxml the next best library for editing and creating XML documents?
from bs4 import BeautifulSoup as bs
stra = """
<body>
<Tagwith.dot>Text inside tag with dot</Tagwith.dot>
</body>"""
soup = bs(stra)
Desired XML:
<body>
<Tagwith.dot>Edited text</Tagwith.dot>
</body>
You can get the result you want with xml.etree.ElementTree, as follows:
import xml.etree.ElementTree as ET
stra = """
<body>
<Tagwith.dot>Text inside tag with dot</Tagwith.dot>
</body>"""
# Parse the XML string into an Element
xml_obj = ET.fromstring(stra)
# Iterate over the root's direct children
for elem in xml_obj:
    # If the tag matches, replace its text
    if elem.tag == 'Tagwith.dot':
        elem.text = 'Edited text'
# Print the updated tree as a string
print(ET.tostring(xml_obj).decode())
The output will be:
<body>
<Tagwith.dot>Edited text</Tagwith.dot>
</body>
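The loop above only visits direct children of the root. If the tag might sit deeper in the tree, a minimal sketch using Element.iter() could look like this. The <wrapper> element is a hypothetical addition to the sample input, there only to show a nested match.

```python
import xml.etree.ElementTree as ET

# <wrapper> is a hypothetical extra level, not part of the original input
stra = """
<body>
<wrapper><Tagwith.dot>Text inside tag with dot</Tagwith.dot></wrapper>
</body>"""

root = ET.fromstring(stra)
# iter() walks the whole subtree and compares tag names exactly,
# so the dot and the capital letter need no special handling
for elem in root.iter('Tagwith.dot'):
    elem.text = 'Edited text'

result = ET.tostring(root).decode()
print(result)
```

ElementTree keeps the tag name's original case, which matters if the output has to stay valid against a case-sensitive XML schema.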
BS4 treats the input as HTML and converts all tag names to lowercase. The code below works fine if you pass the tag name in lowercase.
from bs4 import BeautifulSoup as bs
stra = """
<body>
<Tagwith.dot>Text inside tag with dot</Tagwith.dot>
</body>"""
soup = bs(stra, 'html.parser')
print(soup.find_all('tagwith.dot'))
Output:
[<tagwith.dot>Text inside tag with dot</tagwith.dot>]
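Once the tag is found, its text can be replaced by assigning to .string. This is a rough sketch built on the same input, not a definitive answer; the variable names tag and result are my own. Note that with html.parser the tag name is written back in lowercase.

```python
from bs4 import BeautifulSoup

stra = """
<body>
<Tagwith.dot>Text inside tag with dot</Tagwith.dot>
</body>"""

soup = BeautifulSoup(stra, 'html.parser')
# html.parser lowercases tag names, so search with the lowercase form
tag = soup.find('tagwith.dot')
if tag is not None:
    # Assigning to .string replaces the tag's contents with the new text
    tag.string = 'Edited text'

result = str(soup)
print(result)
```

If the original capitalization must survive, BeautifulSoup's 'xml' parser keeps tag names as written, but it requires lxml to be installed.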