Selenium and BeautifulSoup: sharing and pulling session data resources to multiple libraries in Python
I'm having trouble combining two libraries in Python 3.6. I log in to a website with the Selenium Firefox WebDriver, but when I then try to read the same site with BeautifulSoup or Requests, it fetches the link but the content is different: it reads the page as if I were not logged in. How do I tell Requests that I'm already logged in?

Here is the code I have written so far:
from selenium import webdriver
import config
import requests
from bs4 import BeautifulSoup

# choose webdriver (raw string so the backslashes are not treated as escapes)
browser = webdriver.Firefox(executable_path=r"C:\Users\myUser\geckodriver.exe")
browser.get("https://www.mylink.com/")

# log in
timeout = 1
login = browser.find_element_by_name("sf-login")
login.send_keys(config.USERNAME)
password = browser.find_element_by_name("sf-password")
password.send_keys(config.PASSWORD)
button_log = browser.find_element_by_xpath("/html/body/div[2]/div[1]/div/section/div/div[2]/form/p[2]/input")
button_log.click()

name = "https://www.policytracker.com/auctions/page/"
browser.get(name)
name2 = "/html/body/div[2]/div[1]/div/section/div/div[2]/div[3]/div[" + str(N) + "]/a"  # N is the link index, defined elsewhere

# next page loaded
title1 = browser.find_element_by_xpath(name2)
title1.click()

page = browser.current_url  # URL of the page whose content I want to download (I'm already logged in there)
r = requests.get(page)      # I want Requests to go to this page; it goes, but without the logged-in session... WRONG
soup = BeautifulSoup(r.content, 'lxml')
print(soup)
If you just want to pass the page source to BeautifulSoup, you can get it from Selenium and hand it to BeautifulSoup directly (no need for the requests module).
Instead of
page = browser.current_url
r = requests.get(page)
soup = BeautifulSoup(r.content, 'lxml')
you can do
page = browser.page_source
soup = BeautifulSoup(page, 'html.parser')
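If you do want Requests itself to fetch pages behind the login (the original question), a common approach is to copy Selenium's session cookies into Requests. A minimal sketch, assuming the site's login is cookie-based: `browser.get_cookies()` is the real Selenium call and returns a list of dicts with "name" and "value" keys; the cookie names and values below are made up for illustration.

```python
def cookies_to_dict(selenium_cookies):
    """Convert the list of dicts returned by Selenium's
    browser.get_cookies() into the {name: value} mapping
    that requests.get(url, cookies=...) accepts."""
    return {c["name"]: c["value"] for c in selenium_cookies}

# Shape of the data browser.get_cookies() returns (values here are invented):
selenium_cookies = [
    {"name": "sessionid", "value": "abc123", "domain": ".policytracker.com"},
    {"name": "csrftoken", "value": "xyz789", "domain": ".policytracker.com"},
]

cookies = cookies_to_dict(selenium_cookies)
print(cookies["sessionid"])

# Then, after logging in with Selenium:
#   cookies = cookies_to_dict(browser.get_cookies())
#   r = requests.get(page, cookies=cookies)
#   soup = BeautifulSoup(r.content, 'lxml')
```

Note this only works when the site tracks the login purely via cookies; sites that also check headers or tokens may need more state copied over.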