FormRequest returns no results despite the form looking correctly filled out in the response
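I'm trying to look up branch opening hours on Halifax's official branch finder (https://www.halifax.co.uk/branchfinder/search.asp), but I must be doing something wrong in my FormRequest.from_response() call, because the response to the FormRequest looks unchanged from the original response.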
Passing in a 'postcode' value of 'EH' should return one result (Edinburgh, 131 Princes Street, EH2 4AH), which is what you see when you run the same search on the website, but when I do it with FormRequest I get nothing.
Here is the relevant part of the site's source:
<form action="" id="branch-finder-search-form" method="post">
<div style="display:none;" class="notice error" aria-live="assertive"></div>
<div class="field split-2 clearfix">
<div class="split radio">
<div class="field-radio">
<input value="branch" name="searchType" checked id="r1" type="radio"><label for="r1"><span></span>Branch</label>
</div>
<div class="field-radio">
<input value="cash" name="searchType" id="r2" type="radio"><label for="r2"><span></span>Cash Machine</label>
</div>
</div>
</div>
<div class="field split-3 clearfix">
<div class="split">
<label for="street">Street</label><input name="street" id="street" type="text" value="">
</div>
<div class="split">
<label for="town">Town</label><input name="town" id="town" type="text" value="">
</div>
<div class="split last">
<label for="postcode">Post Code</label><input name="postcode" id="postcode" type="text" value="">
</div>
</div>
<div class="field clearfix">
<div class="split btn-submit">
<input id="search" name="search" alt="Search" type="submit" value="Search" class="button button-primary" title="search"><span class="a_hide">ext search</span></input>
</div>
<noscript>
<input value="yes" name="javascriptoff" id="javascriptoff" type="hidden">
<div class="split btn-submit-nonjs">
<input name="nonjsSubmit" type="submit" alt="Search" value="Search" class="button button-primary" title="search"><span class="a_hide">ext search</span></input>
</div>
<div></div>
</noscript>
<div style="display:none" id="no-result">
<p>No Branch Found as per your search criteria</p>
</div>
<div id="branch-finder-results-container">
<hr>
</div>
</div>
</form>
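To see what a browser would actually send for this form, it helps to list its default name/value pairs. The sketch below is a simplified, stdlib-only illustration (FormFieldCollector and form_fields are hypothetical names, not part of Scrapy): it keeps text inputs and checked radio buttons from the form with id branch-finder-search-form, and skips submit buttons (only a clicked one is sent) and anything inside <noscript>, because a browser with JavaScript enabled does not treat those inputs as form controls. It deliberately ignores <select>, <textarea> and disabled fields.

```python
from html.parser import HTMLParser


class FormFieldCollector(HTMLParser):
    """Collect default name/value pairs for one <form>, selected by its id.

    Simplified sketch: handles text/hidden inputs and checked radios/checkboxes,
    skips submit buttons and inputs inside <noscript>.
    """

    def __init__(self, form_id):
        super().__init__()
        self.form_id = form_id
        self.in_form = False
        self.noscript_depth = 0
        self.fields = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "form":
            # Only collect fields from the form whose id matches.
            self.in_form = a.get("id") == self.form_id
        elif tag == "noscript":
            self.noscript_depth += 1
        elif tag == "input" and self.in_form and not self.noscript_depth:
            name = a.get("name")
            kind = (a.get("type") or "text").lower()
            if not name or kind == "submit":
                return  # unnamed inputs and unclicked buttons are not sent
            if kind in ("radio", "checkbox") and "checked" not in a:
                return  # only checked radios/checkboxes contribute a value
            self.fields.append((name, a.get("value") or ""))

    def handle_endtag(self, tag):
        if tag == "form":
            self.in_form = False
        elif tag == "noscript" and self.noscript_depth:
            self.noscript_depth -= 1


def form_fields(html, form_id):
    """Return the default (name, value) pairs of the form with the given id."""
    collector = FormFieldCollector(form_id)
    collector.feed(html)
    collector.close()
    return collector.fields
```

For the form above this yields searchType=branch plus empty street, town and postcode fields, so submitting it with formdata={"postcode": "EH"} only changes the postcode value.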
Here's my code so far:
from scrapy.crawler import CrawlerProcess
from scrapy.utils.response import open_in_browser
from scrapy.http import FormRequest
import scrapy


class HalifaxSpider(scrapy.Spider):
    name = "halifax"
    start_urls = [
        "https://www.halifax.co.uk/branchfinder/search.asp"
    ]

    def parse(self, response):
        print(response.text)
        # The form has an id, not a name attribute, so select it with formid
        # (an unmatched formname silently falls back to the first form on the page).
        yield FormRequest.from_response(
            response,
            formid="branch-finder-search-form",
            formdata={"postcode": "EH"},
            callback=self.after_search,
        )

    def after_search(self, response):
        # Dump the response and open it in a browser to inspect what came back.
        print(response.text)
        open_in_browser(response)


crawler = CrawlerProcess()
crawler.crawl(HalifaxSpider)
crawler.start()
I also tried passing clickdata={"name": "search"}, in case the search button wasn't being clicked, but I got the same result.
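Clicking a button (which is what clickdata selects in from_response) only adds that button's name/value pair to the POST body; it does not change where the form is sent. Because the form's action is empty, it resolves to the page's own URL, so the submission is posted back to search.asp. The stdlib sketch below (simulated_submit is a hypothetical helper, and the field list is taken from the form source above) makes both points concrete.

```python
from urllib.parse import urlencode, urljoin

PAGE_URL = "https://www.halifax.co.uk/branchfinder/search.asp"


def simulated_submit(page_url, form_action, fields, clicked=None):
    """Return (target_url, body) for a urlencoded POST of the given fields.

    The form action is resolved against the page URL (an empty action means
    "this page"), and the clicked submit button, if any, is appended last.
    """
    target = urljoin(page_url, form_action)
    pairs = list(fields)
    if clicked is not None:
        pairs.append(clicked)
    return target, urlencode(pairs)


target, body = simulated_submit(
    PAGE_URL,
    "",
    [("searchType", "branch"), ("street", ""), ("town", ""), ("postcode", "EH")],
    clicked=("search", "Search"),
)
print(target)  # https://www.halifax.co.uk/branchfinder/search.asp
print(body)    # searchType=branch&street=&town=&postcode=EH&search=Search
```

This is consistent with the response looking unchanged: the same page is requested again, with at most one extra pair in the body.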
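I'm really new to web scraping in general, so I'm quite stuck and don't know where to go from here. I've read some of the documentation, but there isn't much information or many tutorials on filling out different kinds of forms, so I'm at a loss.
Can anyone help?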
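Check out the following approach to get the data you want. Headers play an important role here, so be sure to include them (I've already added a User-Agent in the CrawlerProcess settings).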
from urllib.parse import urlencode

import scrapy
from scrapy.crawler import CrawlerProcess


class HalifaxSpider(scrapy.Spider):
    name = "halifax"
    # Query the branch-finder endpoint directly instead of submitting the form on search.asp.
    url = "https://www.halifax.co.uk/asp_includes/branch-finder/branch-finder.asp?"

    def start_requests(self):
        formdata = {
            'street': '',
            'town': '',
            'postcode': 'EH',
            'searchType': 'branch'
        }
        # Send the search fields as a GET query string.
        req_url = f'{self.url}{urlencode(formdata)}'
        yield scrapy.Request(req_url, callback=self.parse)

    def parse(self, response):
        print(response.css("h2#resultTitle1 ::text").getall())


if __name__ == "__main__":
    # Send a browser-like User-Agent with every request.
    crawler = CrawlerProcess({'USER_AGENT': 'Mozilla/5.0'})
    crawler.crawl(HalifaxSpider)
    crawler.start()
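The request this answer sends can also be built and inspected with the standard library alone, for example to check the exact query string and headers before wiring them into a spider. This is a sketch: build_branch_search is a hypothetical helper, the endpoint and field names are copied from the answer, and no request is actually sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "https://www.halifax.co.uk/asp_includes/branch-finder/branch-finder.asp"


def build_branch_search(postcode, search_type="branch", street="", town=""):
    """Build (but do not send) a GET request for the branch-finder endpoint."""
    query = urlencode({
        "street": street,
        "town": town,
        "postcode": postcode,
        "searchType": search_type,
    })
    # Same User-Agent the answer sets via the USER_AGENT setting.
    return Request(f"{ENDPOINT}?{query}", headers={"User-Agent": "Mozilla/5.0"})


req = build_branch_search("EH")
print(req.get_method(), req.full_url)
```

To actually send it, pass it to urllib.request.urlopen(req). Inside Scrapy, scrapy.FormRequest(ENDPOINT, method="GET", formdata=...) encodes the fields into the query string for you, so the URL does not have to be assembled by hand.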