Using BeautifulSoup to extract text between the start of a paragraph tag and a line break

Question

I have the following HTML document:

<p>
  "Year: 1932"
   <br>
   <br>
  "Total Share : 0.5 Lakhs (Pure Estimate)"
  <br>
  <br>
  "Verdict"
</p>

I am currently using BeautifulSoup to get other elements from the HTML, but I cannot get these lines as they appear here. I get them all on a single line.

Answer 1

Try closing the br tags: <br/>

Answer 2

Try it like this:

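from bs4 import BeautifulSoup

# Replace this with your own HTML; the snippet from the question is used here.
response_data = """<p>
  "Year: 1932"
  <br>
  <br>
  "Total Share : 0.5 Lakhs (Pure Estimate)"
  <br>
  <br>
  "Verdict"
</p>"""

# The html5lib parser must be installed separately (pip install html5lib).
soup_data = BeautifulSoup(response_data, features="html5lib")
# Drop the quote characters, then split the paragraph text on newlines.
# Splitting on "\n" directly keeps lines that contain commas intact.
string_data = soup_data.find('p').text.replace('"', '').split('\n')
# Keep only the non-empty lines, trimmed of surrounding whitespace.
data_list = [strng.strip() for strng in string_data if strng.strip()]

print(data_list)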
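
Another option, offered only as a minimal sketch and not taken from the answer above, is to let BeautifulSoup walk the text nodes itself instead of splitting the string by hand. The `stripped_strings` generator on a tag yields each text node with surrounding whitespace removed and skips whitespace-only nodes, so each piece of text between the `<br>` tags comes out as a separate item. The helper name `paragraph_lines` and the variable `html` are hypothetical, and the built-in `html.parser` is assumed here in place of `html5lib` so that no extra parser needs to be installed.

```python
from bs4 import BeautifulSoup

# The HTML from the question, used as sample input.
html = """<p>
  "Year: 1932"
   <br>
   <br>
  "Total Share : 0.5 Lakhs (Pure Estimate)"
  <br>
  <br>
  "Verdict"
</p>"""


def paragraph_lines(markup):
    """Return the text chunks of the first <p>, one per text node."""
    soup = BeautifulSoup(markup, "html.parser")
    paragraph = soup.find("p")
    if paragraph is None:
        # No <p> element found; return an empty list rather than failing.
        return []
    # stripped_strings skips whitespace-only nodes and trims the rest;
    # strip('"') removes the literal quote characters around each chunk.
    return [text.strip('"') for text in paragraph.stripped_strings]


lines = paragraph_lines(html)
print(lines)
# ['Year: 1932', 'Total Share : 0.5 Lakhs (Pure Estimate)', 'Verdict']
```

Because this works on text nodes rather than on newline characters, it should also cope with markup where the text and the `<br>` tags sit on the same source line.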

Tags: html, beautifulsoup, html-parsing, python-3.x