Authentication results in 404 code
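I need to scrape a website, but I have to log in first.
It looks like I need to submit three things: a username, a password, and an authenticity token. I know the username and password, but I am not sure how to get the token.
This is what I have tried: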
import requests
from lxml import html

login_url = "https://urs.earthdata.nasa.gov/home"

session_requests = requests.session()
result = session_requests.get(login_url)
tree = html.fromstring(result.text)
authenticity_token = list(set(tree.xpath("//input[@name='authenticity_token']/@value")))[0]

payload = {
    "username": "my_name",
    "password": "my_password",
    "authenticity_token": authenticity_token,
}
result = session_requests.post(
    login_url,
    data=payload,
    headers=dict(referer=login_url),
)
print(result)
This results in:
<Response [404]>
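My username and password are entered correctly, so the token must be the problem. I think the issue is in this line:
authenticity_token = list(set(tree.xpath("//input[@name='authenticity_token']/@value")))[0]
or in this one:
payload = {
    "username": "my_name",
    "password": "my_password",
    "authenticity_token": authenticity_token,
}
Looking at the page source, I noticed that there is an authenticity_token, a csrf-token, and a csrf-param. So maybe these are in the wrong order, but I have tried every combination.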
Edit:
Here is a Beautiful Soup approach, which again results in a 404.
import requests
from bs4 import BeautifulSoup

# login_url is the same "https://urs.earthdata.nasa.gov/home" as in the first attempt
s = requests.session()
response = s.get(login_url)
soup = BeautifulSoup(response.text, "lxml")
for n in soup('input'):
    if n['name'] == 'authenticity_token':
        token = n['value']
    if n['name'] == 'utf8':
        utf8 = n['value']
        break
auth = {
    'username': 'my_username',
    'password': 'my_password',
    'authenticity_token': token,
    'utf8': utf8,
}
s.post(login_url, data=auth)
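If you inspect the page, you'll notice that the form's action value is '/login', so you have to submit the data to https://urs.earthdata.nasa.gov/login, not to the /home URL that serves the form.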
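import requests
from bs4 import BeautifulSoup

login_url = "https://urs.earthdata.nasa.gov/login"
home_url = "https://urs.earthdata.nasa.gov/home"

s = requests.session()
soup = BeautifulSoup(s.get(home_url).text, "lxml")
# Copy every named input (including the hidden token fields) into the payload;
# inputs without a name attribute are skipped to avoid a KeyError.
data = {i['name']: i.get('value', '') for i in soup.find_all('input') if i.get('name')}
data['username'] = 'my_username'
data['password'] = 'my_password'
result = s.post(login_url, data=data)
print(result)
<Response [200]>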
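Rather than hard-coding the target URL, the form's action can be read from the page and resolved against the page URL. The following is a minimal sketch of that idea using only the standard library. The class LoginFormParser, the function parse_login_form, and the SAMPLE_HTML markup are hypothetical names and data made up for illustration; the real page's markup may differ.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LoginFormParser(HTMLParser):
    """Collect the action and the named inputs of the first <form> on a page."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.fields = {}
        self._in_form = False
        self._done = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and not self._done:
            self._in_form = True
            self.action = attrs.get("action", "")
        elif tag == "input" and self._in_form and attrs.get("name"):
            # Inputs without a value attribute are stored as empty strings.
            self.fields[attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form" and self._in_form:
            self._in_form = False
            self._done = True


def parse_login_form(page_url, page_html):
    """Return (absolute URL to POST to, dict of named form fields)."""
    parser = LoginFormParser()
    parser.feed(page_html)
    # urljoin resolves a relative action such as '/login' against the page URL.
    return urljoin(page_url, parser.action or ""), parser.fields


# Hypothetical markup standing in for the real login page.
SAMPLE_HTML = """
<form action="/login" method="post">
  <input type="hidden" name="utf8" value="&#x2713;">
  <input type="hidden" name="authenticity_token" value="abc123">
  <input type="text" name="username">
  <input type="password" name="password">
  <input type="submit" value="Log in">
</form>
"""

post_url, fields = parse_login_form("https://urs.earthdata.nasa.gov/home", SAMPLE_HTML)
print(post_url)
print(sorted(fields))
```

With a live session, the returned fields would be filled in with the username and password and posted to the returned URL, in the same way as the requests example above.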
A simple example using selenium:
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Firefox()
url = 'https://n5eil01u.ecs.nsidc.org/MOST/MOD10A1.006/'
driver.get(url)
# find_element(By.X, ...) replaces the find_element_by_* helpers,
# which newer Selenium 4 releases no longer provide.
driver.find_element(By.NAME, 'username').send_keys('my_username')
driver.find_element(By.NAME, 'password').send_keys('my_password')
driver.find_element(By.ID, 'login').submit()
html = driver.page_source
driver.quit()