How to remove the HTML tags from the results of BeautifulSoup find_all

Question

Using Python and BeautifulSoup, I need to remove the tags and keep only the text in the output of the following code:

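import requests
from bs4 import BeautifulSoup as bs

r = requests.get("https://www.w3schools.com/html/html_intro.asp")
soup = bs(r.content, features="html.parser")  # name the parser explicitly to avoid a warning
print(soup.prettify())

first_header = soup.find("h2")        # the first <h2> tag
first_headers = soup.find_all("h2")   # every <h2> tag, as a ResultSet
print(first_headers)                  # still prints the tags, not just their text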
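The tags show up because find() returns a single Tag and find_all() returns a ResultSet (a list of Tag objects), and both print as markup. The minimal sketch below reproduces this offline; the short HTML string is a hypothetical stand-in for the w3schools page, not its real content.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page
html_doc = "<h1>HTML Introduction</h1><h2>What is HTML?</h2><h2>Web Browsers</h2>"
soup = BeautifulSoup(html_doc, "html.parser")

first_header = soup.find("h2")        # a single Tag (None if nothing matches)
first_headers = soup.find_all("h2")   # a ResultSet: a list of Tag objects

print(first_header)        # <h2>What is HTML?</h2>  (markup included)
print(first_header.text)   # What is HTML?           (text only)

# A ResultSet has no single .text, so accessing it raises AttributeError;
# each element has to be handled individually instead.
try:
    first_headers.text
except AttributeError as exc:
    print("AttributeError:", exc)
```

This is why the answers below loop over the ResultSet rather than calling .text on it directly.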

Answer 1

To get only the text from the ResultSet, iterate over it (for example with a list comprehension), call .text on each element, and join all the text pieces with a space using .join():

' '.join([e.text for e in soup.find_all('h2')])
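A minimal, self-contained sketch of that one-liner, assuming a small hypothetical HTML string instead of a live request, so the result can be checked directly:

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; the <p> is ignored because only <h2> tags are selected
html_doc = "<h2>What is HTML?</h2><p>Some text</p><h2>Web Browsers</h2>"
soup = BeautifulSoup(html_doc, "html.parser")

# .text strips the markup from each tag; ' '.join glues the pieces together
joined = ' '.join([e.text for e in soup.find_all('h2')])
print(joined)  # What is HTML? Web Browsers
```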

Example

import requests
from bs4 import BeautifulSoup as bs

r = requests.get("https://www.w3schools.com/html/html_intro.asp")
soup = bs(r.content, features="html.parser")

# Take the text of every <h2> tag and join the pieces with a space
first_headers = ' '.join([e.text for e in soup.find_all('h2')])

print(first_headers)

Output

Tutorials References Exercises and Quizzes HTML Tutorial HTML Forms HTML Graphics HTML Media HTML APIs HTML Examples HTML References What is HTML? A Simple HTML Document What is an HTML Element? Web Browsers HTML Page Structure HTML History Report Error Thank You For Helping Us!
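Note that .text returns the tag's text exactly as it appears in the markup, including any newlines or indentation. If that matters, BeautifulSoup's get_text() accepts a separator and strip=True. The sketch below uses hypothetical markup (one padded heading, one with a nested tag) to show the difference:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one heading padded with whitespace, one with a nested tag
html_doc = "<h2>\n   What is HTML?\n</h2><h2>What is an <em>HTML</em> Element?</h2>"
soup = BeautifulSoup(html_doc, "html.parser")
headers = soup.find_all("h2")

raw = [h.text for h in headers]                            # whitespace kept as-is
stripped = [h.get_text(" ", strip=True) for h in headers]  # trimmed, pieces joined by a space

print(raw)       # ['\n   What is HTML?\n', 'What is an HTML Element?']
print(stripped)  # ['What is HTML?', 'What is an HTML Element?']
```

The separator matters for nested tags: with get_text(strip=True) alone, the second heading would come out as 'What is anHTMLElement?', because the stripped pieces are joined with an empty string by default.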

Answer 2

import requests
from bs4 import BeautifulSoup as bs

r = requests.get("https://www.w3schools.com/html/html_intro.asp")
soup = bs(r.content, features="html.parser")  # parse the page content
# retrieve all h1 and h2 tags and extract the text from each of them
first_headers = [tag.text for tag in soup.find_all(["h1", "h2"])]
print(first_headers)

I solved it in one line with a list comprehension, but you can use a for loop instead:

import requests
from bs4 import BeautifulSoup as bs

r = requests.get("https://www.w3schools.com/html/html_intro.asp")
soup = bs(r.content, features="html.parser")

first_headers = soup.find_all(["h1", "h2"])
for header in first_headers:
    print(header.text)  # print only the text, one heading per line

This is the output of my code:

Tutorials
References
Exercises and Quizzes
HTML Tutorial
HTML Forms
HTML Graphics
HTML Media
HTML APIs
HTML Examples
HTML References
HTML Introduction
What is HTML?
A Simple HTML Document
What is an HTML Element?
Web Browsers
HTML Page Structure
HTML History
Report Error
Thank You For Helping Us!
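To check that the list comprehension and the for loop really are interchangeable, here is a small offline sketch using hypothetical markup. It also shows that passing a list of tag names to find_all() matches any of them and returns the tags in document order, which is why "HTML Introduction" (the h1) appears among the h2 headings above.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the <h3> is not selected
html_doc = "<h2>Tutorials</h2><h1>HTML Introduction</h1><h2>What is HTML?</h2><h3>Skipped</h3>"
soup = BeautifulSoup(html_doc, "html.parser")

# List comprehension version
texts_lc = [tag.text for tag in soup.find_all(["h1", "h2"])]

# Equivalent for-loop version
texts_loop = []
for tag in soup.find_all(["h1", "h2"]):
    texts_loop.append(tag.text)

print(texts_lc)  # ['Tutorials', 'HTML Introduction', 'What is HTML?']
```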
