Splitting HTML code with Beautiful Soup to get a required format
I have an HTML snippet like this:
<div class="myTestCode">
<strong>Abc: </strong> test1</br>
<strong>Def: </strong> test2</br>
</div>
How can I parse it with Beautiful Soup to get:
Abc: test1, Def: test2
Here is what I have tried so far:
data = """<div class="myTestCode">
<strong>Abc: </strong> test1</br>
<strong>Def: </strong> test2</br>
</div>"""
temp = BeautifulSoup(data)
link = temp.select('.myTestCode')
# neither of these printed the expected output shown above
print(str(link).split('<strong>'))
print(''.join(link.stripped_strings))
One possible approach:
from bs4 import BeautifulSoup
data = """<div class="myTestCode">
<strong>Abc: </strong> test1</br>
<strong>Def: </strong> test2</br>
</div>"""
# name the parser explicitly so the result does not depend on which parsers are installed
temp = BeautifulSoup(data, 'html.parser')
# get the individual <strong> elements
strongs = temp.select('.myTestCode > strong')
# map each <strong> element to its text content concatenated with the text node that follows it
result = [s.get_text() + s.next_sibling.strip() for s in strongs]
# join all items with a comma and print
print(', '.join(result))
# printed output:
# Abc: test1, Def: test2
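The second attempt in the question fails because select() returns a list of matching elements, and stripped_strings is an attribute of a single element, not of that list. As a rough alternative sketch, the same output can be built from stripped_strings by picking one element with select_one() and pairing each label with the value after it. The helper name label_value_pairs and its selector parameter are made up for this illustration, and the pairing assumes the div always alternates strictly between a <strong> label and a plain-text value.

```python
from bs4 import BeautifulSoup

data = """<div class="myTestCode">
<strong>Abc: </strong> test1</br>
<strong>Def: </strong> test2</br>
</div>"""


def label_value_pairs(html, selector=".myTestCode"):
    """Return 'label value' strings from the first element matching selector.

    Hypothetical helper for illustration; assumes labels and values alternate.
    """
    soup = BeautifulSoup(html, "html.parser")
    # select_one() returns a single Tag (or None), unlike select(), which returns a list
    container = soup.select_one(selector)
    if container is None:
        return []
    # stripped_strings yields each text node with surrounding whitespace removed,
    # skipping nodes that contain only whitespace
    strings = list(container.stripped_strings)
    # even positions hold the labels, odd positions hold the values
    return [f"{label} {value}" for label, value in zip(strings[0::2], strings[1::2])]


print(", ".join(label_value_pairs(data)))
```

This version is more fragile than the next_sibling approach if a value is missing or contains nested tags, since the label/value pairing would then drift out of step.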