How can I get the right selector (CSS/XPath) to use with Scrapy?

Question

I want to scrape information from this site: https://www.atl.no/finn-trafikkskole?limit=0&limitstart=0 (a national list of driving schools) so that I can plot zip codes and company names on a map (I already have the mapping from zip codes to coordinates) and find areas with a significant concentration of schools. Ideally, I would like a selector that extracts all the relevant information for each of the 710 companies.

Highlighted zip code of the first driving school

I have tried copying both the CSS selector and the XPath of the table I need (as shown in Chrome DevTools), but when I run the copied CSS selector or XPath in Scrapy, it returns nothing.

Example of running the copied CSS selector in the Scrapy shell:

In(1): response.css("#adminForm > table > tbody").extract()

Out(1): []

What am I doing wrong, and how should I proceed to get the desired result?

Answer 1

Based on the page structure, I would split the parsing work as follows:

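    import re  # at module level; needed by extract_text

    def extract_text(self, item):
        # Take the first matched element's HTML and strip the tags from it.
        text = item.get()
        if text is None:
            # Nothing matched (e.g. a row without an address cell).
            return None
        return re.sub(r'<.*?>', '', text)

    def parse(self, response):
        # Each table row is treated as one school.
        for school in response.css('.uk-table tr'):
            yield {
                'address': self.extract_text(school.css('.school-address')),
                'school': school.css('tr > td > a::text').get(),
            }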
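
Since the goal in the question is to map zip codes, the tag-stripping step can be followed by a small postal-code lookup. The sketch below is only an illustration: the cell markup is hypothetical, `strip_tags` and `find_zip_code` are names made up here, and it assumes the address text contains a four-digit Norwegian postal code. It replaces tags with a space rather than an empty string so that lines separated by `<br>` do not run together.

```python
import re

def strip_tags(html_fragment):
    """Replace every tag with a space, then collapse runs of whitespace."""
    text = re.sub(r'<.*?>', ' ', html_fragment)
    return re.sub(r'\s+', ' ', text).strip()

def find_zip_code(address_text):
    """Return the last standalone four-digit number in the text, or None."""
    matches = re.findall(r'\b\d{4}\b', address_text)
    return matches[-1] if matches else None

# Hypothetical cell markup; the real page may be laid out differently.
cell = '<td class="school-address">Storgata 1<br>0155 Oslo</td>'
address = strip_tags(cell)
print(address)                 # Storgata 1 0155 Oslo
print(find_zip_code(address))  # 0155
```

Taking the last four-digit match is a guess that the postal code comes after the street number; it is worth checking a few real rows before relying on it.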

Answer 2

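#adminForm > table > tbody returns an empty result because tbody is a tag that browsers such as Firefox and Chrome add automatically when they build the DOM. When you scrape the page with Scrapy, tbody does not appear in the response HTML, so a selector that requires it matches nothing.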

Check the page source: view-source:https://www.atl.no/finn-trafikkskole?limit=0&limitstart=0

See also what the Scrapy documentation says about the tbody tag: https://docs.scrapy.org/en/latest/topics/developer-tools.html#caveats-with-inspecting-the-live-browser-dom
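
The effect can be reproduced without Scrapy. The sketch below parses a small, hypothetical fragment shaped like the raw response (a table with no tbody) using the standard library's ElementTree as a stand-in for Scrapy's selectors. The markup, school names, and addresses are invented for illustration and are not taken from the real page.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for the markup the server sends (no <tbody>).
RAW_HTML = """
<form id="adminForm">
  <table class="uk-table">
    <tr>
      <td><a href="/school/1">Example Trafikkskole A</a></td>
      <td class="school-address">Storgata 1, 0155 Oslo</td>
    </tr>
    <tr>
      <td><a href="/school/2">Example Trafikkskole B</a></td>
      <td class="school-address">Kongens gate 2, 7011 Trondheim</td>
    </tr>
  </table>
</form>
""".strip()

form = ET.fromstring(RAW_HTML)

# Path copied from the browser: it expects a tbody that the raw markup lacks.
with_tbody = form.findall("./table/tbody/tr")
# Same path with the tbody step dropped.
without_tbody = form.findall("./table/tr")

print(len(with_tbody))     # 0
print(len(without_tbody))  # 2
```

In the Scrapy shell, the corresponding fix would be to drop tbody from the copied selector, for example something like response.css('#adminForm table tr'), and then confirm against the raw page source that the rows are really there.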
