Scrapy not filling form
I'm trying to get Scrapy to fill in the following HTML form using FormRequest.from_response:
<form class="form-horizontal" method="POST" role="form">
<div class="form-group">
<label class="col-sm-3 control-label" for="inputEmail3"> Username </label>
<div class="col-sm-9">
<input class="form-control" value="" maxlength="32" name="pun" />
</div>
</div>
<div class="form-group">
<label class="col-sm-3 control-label" for="inputEmail3"> Passphrase </label>
<div class="col-sm-9">
<input class="form-control" type="password" value="" maxlength="10000" name="ak" />
</div>
</div>
</form>
</div>
<div align="right">
<input id="send" type="submit" value="Login" name="login" />
</div>
I followed the tutorial here, but the code that uses the fields "ak" and "pun" does not work. Any ideas or suggestions? Thanks.
Edit: this is what I have so far:
from scrapy import FormRequest
from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule

class TestSpider(CrawlSpider):
    name = "test1"
    allowed_domains = ['...']
    start_urls = [
        '...'
    ]
    rules = (Rule(LinkExtractor(), callback='parse_items', follow=True),)

    def parse_items(self, response):
        # Fill in the form found on the page and submit it
        return [FormRequest.from_response(response,
                                          formdata={"pun": '...', "ak": '...'},
                                          callback=self.after_login)]

    def after_login(self, response):
        # Check that login succeeded before going on
        if "authentication failed" in response.text:
            self.logger.error("Login failed")
            return
        # Crawl contents ...
The submit button must be inside the <form> tag. Try this:
<form class="form-horizontal" method="POST" role="form">
<div class="form-group">
<label class="col-sm-3 control-label" for="inputEmail3"> Username </label>
<div class="col-sm-9">
<input class="form-control" value="" maxlength="32" name="pun" />
</div>
</div>
<div class="form-group">
<label class="col-sm-3 control-label" for="inputEmail3"> Passphrase </label>
<div class="col-sm-9">
<input class="form-control" type="password" value="" maxlength="10000" name="ak" />
</div>
</div>
<div align="right">
<input id="send" type="submit" value="Login" name="login" />
</div>
</form>
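To see which fields actually sit inside the <form> element of the original page, something like the following rough sketch may help. It uses only Python's standard html.parser rather than Scrapy, so it illustrates the idea and is not how FormRequest.from_response is implemented. PAGE is a shortened copy of the markup from the question, and FormFieldCollector and split_fields are hypothetical names made up for this example.

```python
from html.parser import HTMLParser

# Shortened copy of the markup from the question (wrapper divs and labels removed).
PAGE = """
<form class="form-horizontal" method="POST" role="form">
  <input class="form-control" value="" maxlength="32" name="pun" />
  <input class="form-control" type="password" value="" name="ak" />
</form>
<div align="right">
  <input id="send" type="submit" value="Login" name="login" />
</div>
"""


class FormFieldCollector(HTMLParser):
    """Split named <input> elements into those inside and outside a <form>."""

    def __init__(self):
        super().__init__()
        self.in_form = False
        self.inside = {}
        self.outside = {}

    def handle_starttag(self, tag, attrs):
        if tag == "form":
            self.in_form = True
        elif tag == "input":
            attributes = dict(attrs)
            name = attributes.get("name")
            if name:
                target = self.inside if self.in_form else self.outside
                target[name] = attributes.get("value", "")

    def handle_endtag(self, tag):
        if tag == "form":
            self.in_form = False


def split_fields(html):
    """Return (fields inside the form, fields outside the form)."""
    collector = FormFieldCollector()
    collector.feed(html)
    return collector.inside, collector.outside


inside, outside = split_fields(PAGE)
print(inside)   # named inputs found inside <form>
print(outside)  # named inputs found outside <form>
```

With the markup from the question, the "login" button ends up in the second dictionary, which fits the observation that it is not submitted along with "pun" and "ak".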
I solved the problem. All that was needed was to write:
formdata={"pun": '...', "ak": '...', "login": 'Login'}
I'm still unsure about the reason behind it, though. Can anyone explain?
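One plausible explanation, offered as an assumption rather than something confirmed for this site: the server may look for the submit button's own name/value pair (name="login", value="Login" in the HTML above) in the POST body, and that pair is missing when the button is not picked up as part of the form. The sketch below only shows how the URL-encoded body differs in the two cases. The credentials "user" and "secret" are made-up placeholders, and post_body is a hypothetical helper.

```python
from urllib.parse import urlencode

# Placeholder credentials, for illustration only.
without_button = {"pun": "user", "ak": "secret"}
# Same fields plus the submit button's name/value pair from the HTML.
with_button = dict(without_button, login="Login")


def post_body(fields):
    """Encode form fields as an application/x-www-form-urlencoded body."""
    return urlencode(fields)


print(post_body(without_button))  # pun=user&ak=secret
print(post_body(with_button))     # pun=user&ak=secret&login=Login
```

If the login handler only proceeds when it sees the login field, the first body would be ignored while the second would be accepted, which would match the behaviour described above.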