How to log in and web scrape "support.oracle.com" using python3 requests?

I am trying to scrape the URL below using Python requests, but I cannot get it to work.

URL: https://support.oracle.com/rs?type=doc&id=1439822.1

Code that does not work:

import requests
from bs4 import BeautifulSoup  

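# `headers` was not defined in the original post; this value is a placeholder assumption.
headers = {"User-Agent": "Mozilla/5.0"}

s = requests.session()
s.headers.update(headers)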


r = s.get("https://support.oracle.com/rs?type=doc&id=1439822.1", auth=('user@email.com', 'mypass'), allow_redirects=True)
soup = BeautifulSoup(r.text, 'html.parser')
print(soup.prettify())

Expected output: the page content that a web browser shows after a successful login. I need that same content on the command line.

Current output: the login page is returned again.

Note: this works with the wget command below, but I need to do it with Python requests.

wget --user "user@email.com" --password "mypass" "https://support.oracle.com/rs?type=doc&id=1439822.1" -O /root/webout.html

Thanks for your help!

Finally found the answer!

import requests
from bs4 import BeautifulSoup

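# First request: follow the redirects to find the URL the site finally lands on.
r = requests.get("https://support.oracle.com/rs?type=doc&id=1439822.1", auth=('user@email.com', 'mypass'), allow_redirects=True)

# Second request: fetch that final URL again, sending the credentials to it directly.
full_fetch = requests.get(r.url, auth=('user@email.com', 'mypass'), allow_redirects=True)
soup = BeautifulSoup(full_fetch.text, 'html.parser')
print(soup.prettify())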
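
A possible explanation for why the second request helps: requests drops the Authorization header when a redirect leads to a different host, so on the first pass the credentials may never reach the final URL. That is an assumption about this site, not something confirmed in the post. The two-step pattern can be wrapped in a small helper, sketched below. The name `fetch_with_login` is made up for illustration, and unlike the answer above it reuses one session for both requests so that any cookies set during the redirects are kept.

```python
from typing import Any


def fetch_with_login(session: Any, url: str, user: str, password: str) -> str:
    """Two-step fetch: discover the final URL, then request it with credentials.

    `session` is expected to behave like requests.Session (hypothetical helper,
    not part of the original answer).
    """
    auth = (user, password)

    # Step 1: follow redirects to learn which URL the server finally lands on.
    first = session.get(url, auth=auth, allow_redirects=True)

    # Step 2: request that final URL again, sending the credentials straight to it.
    final = session.get(first.url, auth=auth, allow_redirects=True)

    # Raise on 4xx/5xx instead of silently returning an error page.
    final.raise_for_status()
    return final.text


# Usage (performs real network I/O, so it is left commented out):
# import requests
# html = fetch_with_login(
#     requests.Session(),
#     "https://support.oracle.com/rs?type=doc&id=1439822.1",
#     "user@email.com",
#     "mypass",
# )
```

If the returned text is still the login page, printing the `history` of the first response shows the redirect chain and can help locate where the credentials are lost.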