How can I extract text from a span tag using Beautiful Soup 4?
I am trying to scrape faculty members' information:
from bs4 import BeautifulSoup
import requests

r = requests.get("http://www.uoj.ac.ae/ContentBan.aspx?m=15&p=4&sm=4")
soup = BeautifulSoup(r.content, 'html5lib')
for tag in soup.find_all('table'):
    if tag.has_attr("class"):
        if tag['class'] == 'MsoTableGrid':
            for tag1 in soup.findAll('span'):
                print tag1.text
I want to print the text inside the span tags, but the only output I get is:
Process finished with exit code 0
If you want to extract the text of every span regardless of class name, try this:
from bs4 import BeautifulSoup
import requests

r = requests.get("http://www.uoj.ac.ae/ContentBan.aspx?m=15&p=4&sm=4")
soup = BeautifulSoup(r.content, 'lxml')
span_text = soup.find_all('span')
for s in span_text:
    print(s.text)
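A likely reason the loop in the question prints nothing: Beautiful Soup treats `class` as a multi-valued attribute and returns it as a list, so comparing it to the string 'MsoTableGrid' is never true. The sketch below shows this on a small made-up HTML snippet (the markup, the e-mail address, and the variable names are assumptions for illustration, not the real page), and also limits the span search to the matching table instead of the whole document.

```python
from bs4 import BeautifulSoup

# Made-up sample markup standing in for the real page (assumption).
html = """
<table class="MsoTableGrid">
  <tr><td><span>Dr. A</span></td><td><span>a@example.com</span></td></tr>
</table>
<table><tr><td><span>outside</span></td></tr></table>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table")

# "class" is multi-valued, so Beautiful Soup returns a list of class names.
class_value = table["class"]
matches_string = class_value == "MsoTableGrid"   # list compared to str: False
contains_class = "MsoTableGrid" in class_value   # membership test succeeds

# Collect spans only from inside the matching table, not the whole soup.
span_texts = [span.get_text(strip=True)
              for span in soup.select("table.MsoTableGrid span")]

print(class_value, matches_string, contains_class)
print(span_texts)
```

Against the real page, the same selector should apply once `html` is replaced with `r.content`, provided the table there really carries the MsoTableGrid class.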
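You can use a CSS selector to find the tr elements of the table with class MsoTableGrid, and then pull the information you need, such as the faculty member's name and e-mail address, from each row's columns. For example: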
>>> rows = soup.select("table.MsoTableGrid tr")
>>> for row in rows:
...     # columns 1 and 2 hold the name and the e-mail address
...     faculty_info = row.find_all("td")[1:3]
...     if len(faculty_info) == 2:
...         print(faculty_info[0].text.strip(), faculty_info[1].text.strip())
...
Name E-mail
Dr. Hassan Ali Dabouq dr.hassandbouk@uoj.ac.ae
Prof.dr.Magdie Medhat Elnahry magdielnahry@uoj.ac.ae
Dr. Abd Elwahaab Mohamed Khalil abdelwahab@uoj.ac.ae
Dr. Ahmed Hassan Fouly Dr.ahmedfoly@uoj.ac.ae
Dr. Walid Mohamed Abbas walidabas@uoj.ac.ae
Dr. Wael Mahmoud Fakhry wfakhry@uoj.ac.ae
Dr. Kamel Abd Elaziz Ali kamelali@uoj.ac.ae
.
.
.
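The row-based approach above can be wrapped in a small function that returns records instead of printing them. This is only a sketch under stated assumptions: the sample HTML, the names `extract_faculty` and `skip_header`, and the idea that the first row is a header are all hypothetical, chosen so the example runs without network access.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: column 0 is a row number, columns 1-2 are name and e-mail.
html = """
<table class="MsoTableGrid">
  <tr><td>No.</td><td>Name</td><td>E-mail</td></tr>
  <tr><td>1</td><td><span> Dr. Example One </span></td><td><span>one@example.com</span></td></tr>
  <tr><td>2</td><td><span>Dr. Example Two</span></td><td><span>two@example.com</span></td></tr>
  <tr><td colspan="3">short row</td></tr>
</table>
"""


def extract_faculty(markup, skip_header=True):
    """Return a list of {"name", "email"} dicts from table.MsoTableGrid rows."""
    soup = BeautifulSoup(markup, "html.parser")
    records = []
    for row in soup.select("table.MsoTableGrid tr"):
        cells = row.find_all("td")[1:3]
        # Rows with fewer than three cells (spacers, merged cells) are skipped.
        if len(cells) == 2:
            records.append({
                "name": cells[0].get_text(strip=True),
                "email": cells[1].get_text(strip=True),
            })
    # Assumption: the first usable row is the "Name / E-mail" header.
    return records[1:] if skip_header else records


faculty = extract_faculty(html)
for person in faculty:
    print(person["name"], person["email"])
```

Returning dictionaries rather than printing makes it easier to write the result to CSV or JSON later; pass `skip_header=False` if the table has no header row.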