Python web scraping: HTML table
I'm a junior Python developer and still learning, more specifically learning to scrape with requests and bs4.
When trying to scrape the following link: 'http://directorybtr.az.gov/listings/FirmSearchResults.asp?Zip%20Like%20%22850%25%22'
I used the following code:
import requests
from bs4 import BeautifulSoup
url ="http://directorybtr.az.gov/listings/FirmSearchResults.asp?Zip%20Like%20%22850%25%22"
res = requests.get(url)
soup = BeautifulSoup(res.text, 'html.parser')
res.close()
results = soup.find('table')
The table is there when I inspect the page source in Chrome, but there is no table in the result.
Is there any solution or explanation?
Thanks
The table data is inside a frame, so you need to fetch the frame's page first.
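You can see this by checking what the first request actually returns. A minimal sketch, assuming the same URL as in the question: the response is a frameset, so soup.find('table') gives None, while soup.find_all('frame') lists the frame sources.

import requests
from bs4 import BeautifulSoup

url = "http://directorybtr.az.gov/listings/FirmSearchResults.asp?Zip%20Like%20%22850%25%22"
res = requests.get(url)
soup = BeautifulSoup(res.text, 'html.parser')

# The search-results page itself is a frameset, so no <table> is present here
print(soup.find('table'))                                # None
print([f.get('src') for f in soup.find_all('frame')])    # frame sources, e.g. Firms.asp

To reach the actual table, follow the frame's src within the same session: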
import requests
from bs4 import BeautifulSoup

BASE_URL = "http://directorybtr.az.gov/listings/"
URL = BASE_URL + "FirmSearchResults.asp?Zip%20Like%20%22850%25%22"

# A session is needed: the frame page (Firms.asp) serves the data of the search
# that was just run, so it cannot be requested directly on its own.
session = requests.Session()
response = session.get(URL)
soup = BeautifulSoup(response.text, 'lxml')

# Find the first frame in the frameset
frame = soup.find("frame")

# Follow the frame's link (Firms.asp) within the same session
response = session.get(BASE_URL + frame.attrs['src'])
soup = BeautifulSoup(response.text, 'lxml')

table = soup.find("table")
print(table)
response.close()
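Once the table has been located, its rows can be pulled out into plain Python lists. A minimal sketch, assuming the same table object as above (the exact cell layout of Firms.asp may differ):

# Extract the text of every cell, row by row, from the table found above
rows = []
for tr in table.find_all("tr"):
    cells = [cell.get_text(strip=True) for cell in tr.find_all(["td", "th"])]
    if cells:  # skip rows without any cells
        rows.append(cells)

for row in rows:
    print(row)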