Selenium and BeautifulSoup: sharing and pulling session data resources to multiple libraries in Python
I'm having trouble combining two libraries in Python 3.6. I log in to a website with the Selenium Firefox WebDriver, but when I then try to read the same site with BeautifulSoup or Requests, it fetches the link but the content is different: it reads the page as if I were not logged in. How do I tell Requests that I'm already logged in?

Here is the code I have written so far:
from selenium import webdriver
import config
import requests
from bs4 import BeautifulSoup

# choose webdriver (raw string so the backslashes are not treated as escapes)
browser = webdriver.Firefox(executable_path=r"C:\Users\myUser\geckodriver.exe")
browser.get("https://www.mylink.com/")

# log in
timeout = 1
login = browser.find_element_by_name("sf-login")
login.send_keys(config.USERNAME)
password = browser.find_element_by_name("sf-password")
password.send_keys(config.PASSWORD)
button_log = browser.find_element_by_xpath("/html/body/div[2]/div[1]/div/section/div/div[2]/form/p[2]/input")
button_log.click()

name = "https://www.policytracker.com/auctions/page/"
browser.get(name)
name2 = "/html/body/div[2]/div[1]/div/section/div/div[2]/div[3]/div[" + str(N) + "]/a"  # N is the link index, defined elsewhere

# next page loaded
title1 = browser.find_element_by_xpath(name2)
title1.click()

page = browser.current_url  # URL of the page whose content I want to download (I'm already logged in there)
r = requests.get(page)      # I want Requests to go to this page; it goes, but without the logged-in session... WRONG
soup = BeautifulSoup(r.content, 'lxml')
print(soup)
If you just want to pass the page source to BeautifulSoup, you can get it from Selenium and hand it to BeautifulSoup directly (no need for the requests module).
Instead of
page = browser.current_url
r = requests.get(page)
soup = BeautifulSoup(r.content, 'lxml')
you can do
page = browser.page_source
soup = BeautifulSoup(page, 'html.parser')
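If you do want Requests itself to fetch pages behind the login (the original question), a common approach is to copy Selenium's session cookies into Requests. A minimal sketch, assuming the site's login is cookie-based: `browser.get_cookies()` is the real Selenium call and returns a list of dicts with "name" and "value" keys; the cookie names and values below are made up for illustration.

```python
def cookies_to_dict(selenium_cookies):
    """Convert the list of dicts returned by Selenium's
    browser.get_cookies() into the {name: value} mapping
    that requests.get(url, cookies=...) accepts."""
    return {c["name"]: c["value"] for c in selenium_cookies}

# Shape of the data browser.get_cookies() returns (values here are invented):
selenium_cookies = [
    {"name": "sessionid", "value": "abc123", "domain": ".policytracker.com"},
    {"name": "csrftoken", "value": "xyz789", "domain": ".policytracker.com"},
]

cookies = cookies_to_dict(selenium_cookies)
print(cookies["sessionid"])

# Then, after logging in with Selenium:
#   cookies = cookies_to_dict(browser.get_cookies())
#   r = requests.get(page, cookies=cookies)
#   soup = BeautifulSoup(r.content, 'lxml')
```

Note this only works when the site tracks the login purely via cookies; sites that also check headers or tokens may need more state copied over.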