Scrapy finishes running and shows results in the console, but the CSV output is blank
I'm new to Scrapy, so I'm having a hard time figuring out what I'm doing wrong when the CSV file ends up with no results. I can see the results in the console. Here's what I've tried:
The main folder is named "realyp".
The spider file is named "yp.py", and its code is:
from scrapy.selector import Selector
from scrapy.spider import BaseSpider
from realyp.items import RealypItem


class MySpider(BaseSpider):
    name = "YellowPage"
    allowed_domains = ["yellowpages.com"]
    start_urls = ["https://www.yellowpages.com/search?search_terms=Coffee%20Shops&geo_location_terms=Los%20Angeles%2C%20CA&page=2"]

    def parse(self, response):
        title = Selector(response)
        page = title.xpath('//div[@class="info"]')
        items = []
        for titles in page:
            item = RealypItem()
            item["name"] = titles.xpath('.//span[@itemprop="name"]/text()').extract()
            item["address"] = titles.xpath('.//span[@itemprop="streetAddress" and @class="street-address"]/text()').extract()
            item["phone"] = titles.xpath('.//div[@itemprop="telephone" and @class="phones phone primary"]/text()').extract()
            items.append(item)
        return items
The "items.py" file contains:
from scrapy.item import Item, Field


class RealypItem(Item):
    name = Field()
    address = Field()
    phone = Field()
To get the CSV output, I run:
cd desktop
cd realyp
scrapy crawl YellowPage -o items.csv -t csv
Any help would be greatly appreciated.
As @Granitosauros mentioned, you should use yield instead of return, and the yield should be inside the for loop.
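A minimal plain-Python sketch of why the placement matters (no Scrapy required; the row strings and dict items are hypothetical stand-ins for the selectors and RealypItem): a return inside the loop ends the function on the first pass, while a yield hands back one item per pass and keeps looping.

```python
def parse_with_return(rows):
    for row in rows:
        item = {"name": row}
        return item  # leaves the function on the first iteration


def parse_with_yield(rows):
    for row in rows:
        item = {"name": row}
        yield item  # emits this item, then continues with the next row


rows = ["Cafe A", "Cafe B", "Cafe C"]
print(parse_with_return(rows))       # {'name': 'Cafe A'}
print(list(parse_with_yield(rows)))  # [{'name': 'Cafe A'}, {'name': 'Cafe B'}, {'name': 'Cafe C'}]
```

Scrapy iterates over whatever the callback produces, so the generator version passes along every item without first collecting them in a list.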
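Also, inside the for loop, an XPath that starts with // selects every matching element in the whole document, not just the ones under the current node (see here).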
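To illustrate the scoping point, here is a small sketch using the standard library's xml.etree.ElementTree on hypothetical, simplified markup. ElementTree supports only a subset of XPath and refuses an absolute // path on an element, so the sketch imitates a document-wide search by starting from the root; Scrapy's own selectors use lxml with full XPath 1.0.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup standing in for the search-results page.
PAGE = """
<html><body>
  <div class="info"><span itemprop="name">Cafe A</span></div>
  <div class="info"><span itemprop="name">Cafe B</span></div>
</body></html>
""".strip()

root = ET.fromstring(PAGE)
results = []
for div in root.findall(".//div[@class='info']"):
    # Searching from the root mimics an absolute //span: every name on the page, on every pass.
    whole_page = [s.text for s in root.findall(".//span[@itemprop='name']")]
    # Searching from the current div matches only this result's name.
    this_result = [s.text for s in div.findall(".//span[@itemprop='name']")]
    results.append((whole_page, this_result))

for whole_page, this_result in results:
    print(whole_page, this_result)
```

Each pass of the document-wide search returns both names, while the node-relative search returns one name per result block, which is what each item needs.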
Here is the (rough) code that worked for me:
# -*- coding: utf-8 -*-
import scrapy

from realyp.items import RealypItem


# scrapy.Spider replaces the old scrapy.spider.BaseSpider, which current Scrapy releases no longer provide.
class MySpider(scrapy.Spider):
    name = "YellowPage"
    allowed_domains = ["yellowpages.com"]
    start_urls = ["https://www.yellowpages.com/search?search_terms=Coffee%20Shops&geo_location_terms=Los%20Angeles%2C%20CA&page=2"]

    def parse(self, response):
        # One iteration per result block; the inner paths below are relative to it.
        for titles in response.xpath('//div[@class = "result"]/div'):
            item = RealypItem()
            item["name"] = titles.xpath('div[2]/div[2]/h2/a/span[@itemprop="name"]/text()').extract()
            item["address"] = titles.xpath('string(div[2]/div[2]/div/p[@itemprop="address"])').extract()
            item["phone"] = titles.xpath('div[2]/div[2]/div/div[@itemprop="telephone" and @class="phones phone primary"]/text()').extract()
            # yield inside the loop: one item per result block.
            yield item
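Because .extract() always returns a list, every RealypItem field holds a list. The stdlib sketch below approximates what the resulting CSV rows look like, assuming the default behavior of Scrapy's CsvItemExporter, whose join_multivalued parameter joins list values with ','. The sample items and the to_csv helper are hypothetical; note that a field whose XPath matched nothing becomes an empty cell.

```python
import csv
import io

FIELDS = ("name", "address", "phone")

# Hypothetical items shaped like RealypItem after .extract(): every value is a list.
items = [
    {"name": ["Cafe A"], "address": ["1 Main St"], "phone": ["(555) 555-0100"]},
    {"name": ["Cafe B"], "address": [], "phone": []},
]


def to_csv(rows, fields=FIELDS):
    """Hypothetical helper: write rows as CSV, joining list values with ','."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields)
    writer.writeheader()
    for row in rows:
        writer.writerow({key: ",".join(row.get(key, [])) for key in fields})
    return buf.getvalue()


print(to_csv(items))
```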