python - scrapy,一张一张打印

python - scrapy, one by one print

我正在研究google 搜索抓取

这是我的代码

 def parse(self, response):
    all_page = response.xpath('//*[@id="main"]')

    for page in all_page:
        title = page.xpath('//*[@id="main"]/div/div/div/a/h3/div/text()').extract()
        link = page.xpath('//*[@id="main"]/div/div/div/a/@href').extract()
        print('title', title)
        print('link', link)

输出是

标题['iPhone - Compare Models - Apple','iPhone - Compare Models - Apple (MY)',......]

link[https://www.apple.com/iphone/compare/&sa=U&ved=2ahUKEwiKvsnnmLDxAhWZIDQIHXEdA60QFjAGegQIBRAB&usg=AOvVaw1FCyWoMh1LcbM65W6l8ypN', '/url?q=https:// www.apple.com/my/iphone/compare/&sa=U&ved=2ahUKEwiKvsnnmLDxAhWZIDQIHXEdA60QFjAHegQICBAB&usg=AOvVaw3i33ED_sBrbAuNLAJsOlxe',....]

我要这样

标题:'iPhone - Compare Models - Apple'

Link : https://www.apple.com/iphone/compare/&sa=U&ved=2ahUKEwiKvsnnmLDxAhWZIDQIHXEdA60QFjAGegQIBRAB&usg=AOvVaw1FCyWoMh1LcbM65W6l8ypN'

标题:''iPhone - Compare Models - Apple (MY)'

Link :https://www.apple.com/my/iphone/compare/&sa=U&ved=2ahUKEwiKvsnnmLDxAhWZIDQIHXEdA60QFjAHegQICBAB&usg=AOvVaw3i33ED_sBrbAuNLAJsOlxe'

怎么做?

谢谢

你的 titlelink 结果是 list 有几个项目,所以你需要遍历它们,与 zip 并行假设你得到一个 link 每个标题。 从您的示例中看起来是这样,但请确保确实如此。

def parse(self, response):
    all_page = response.xpath('//*[@id="main"]')

    for page in all_page:
        titles = page.xpath('//*[@id="main"]/div/div/div/a/h3/div/text()').extract()
        links = page.xpath('//*[@id="main"]/div/div/div/a/@href').extract()
        for title, link in zip(titles, links):
            print (f"title: '{title}'\n\n"
                   f"Link: {link}")

extract() 方法 xpath 提取搜索表达式找到的所有项目,因此使用 page.xpath('//*[@id="main"]/div/div/div/a/h3/div/text()').extract() 您可以获得该元素的所有结果。 您可以 zip 将它们放在一起,如另一个答案中所述,但更准确地说,您需要从父对象开始。

for page in all_page:
    for element in page.xpath('//*[@id="main"]/div/div/div'):
        title = element.xpath('a/h3/div/text()'). extract_first()
        link = element.xpath('a/@href').extract_first()
        print('title', title)
        print('link', link)