Getting an error while using CSVFeedSpider
I am using CSVFeedSpider to scrape a local CSV file (foods.csv).
Here it is:
calories name price
650 Belgian Waffles .95
900 Strawberry Belgian Waffles .95
900 Berry-Berry Belgian Waffles .95
600 French Toast .50
950 Homestyle Breakfast .95
This is the code in my foods.py file:
from scrapy.spiders import CSVFeedSpider
from foods_csv.items import FoodsCsvItem

class FoodsSpider(CSVFeedSpider):
    name = 'foods'
    start_urls = ['file:///users/Mina/Desktop/foods.csv']
    delimiter = ';'
    quotechar = "'"
    headers = ['name', 'price', 'calories']

    def parse_row(self, response, row):
        self.logger.info('Hi, this is a row!: %r', row)
        item = FoodsCsvItem()
        item['name'] = row['name']
        item['price'] = row['price']
        item['calories'] = row['calories']
        return item
items.py:
import scrapy

class FoodsCsvItem(scrapy.Item):
    name = scrapy.Field()
    price = scrapy.Field()
    calories = scrapy.Field()
But it gives me this error:
2017-11-18 13:04:26 [scrapy.core.engine] DEBUG: Crawled (200) <GET file:///users/Mina/Desktop/foods.csv> (referer: None)
2017-11-18 13:04:26 [scrapy.utils.iterators] WARNING: ignoring row 1 (length: 1, should be: 3)
2017-11-18 13:04:26 [scrapy.utils.iterators] WARNING: ignoring row 2 (length: 1, should be: 3)
2017-11-18 13:04:26 [scrapy.utils.iterators] WARNING: ignoring row 3 (length: 1, should be: 3)
2017-11-18 13:04:26 [scrapy.utils.iterators] WARNING: ignoring row 4 (length: 1, should be: 3)
2017-11-18 13:04:26 [scrapy.utils.iterators] WARNING: ignoring row 5 (length: 1, should be: 3)
2017-11-18 13:04:26 [scrapy.utils.iterators] WARNING: ignoring row 6 (length: 1, should be: 3)
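At first I was only scraping 'name' and 'price', but that gave me the same error, so I tried adding 'calories' as suggested in this solution: "Scrapy: Scraping CSV File - not getting any output". Nothing changed!
I only need to scrape 'name' and 'price'. How can I do that?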
Try this:
    def parse_row(self, response, row):
        self.logger.info('Hi, this is a row!: %r', row)
        item = FoodsCsvItem()
        item['name'] = row['name']
        item['price'] = row['price']
        item['calories'] = row['calories']
        return item
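The exact format of your CSV file seems to have been lost when you posted it. If the file looks exactly as shown here, then it is really a TSV (tab-separated values) file, and you could try changing delimiter = ';' to delimiter = '\t'.
Also, you specified ' as the quote character. Is that actually correct for your file? I would try running a search/replace on the CSV file to replace ' with " (and setting quotechar = '"' to match) to see if that helps. I have run into some odd problems with single quotes before.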
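To illustrate why the log reports "length: 1, should be: 3", here is a minimal standalone sketch that uses Python's standard csv module rather than Scrapy itself. It assumes the file really is tab-separated, which the formatting of the post leaves uncertain, and it uses two rows copied from the question. The helper names field_counts and name_and_price are made up for this illustration.

```python
import csv
import io

# Assumption: the file is tab-separated, as the listing in the question suggests.
# The rows are copied from the question; the prices are kept exactly as posted.
SAMPLE = (
    "calories\tname\tprice\n"
    "650\tBelgian Waffles\t.95\n"
    "600\tFrench Toast\t.50\n"
)


def field_counts(text, delimiter, quotechar='"'):
    """Return how many fields csv.reader finds on each line (hypothetical helper)."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter, quotechar=quotechar)
    return [len(row) for row in reader]


def name_and_price(text, delimiter="\t"):
    """Keep only the 'name' and 'price' columns, using the first line as headers."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return [{"name": row["name"], "price": row["price"]} for row in reader]


# With ';' no line is ever split, so every row has a single field.
print(field_counts(SAMPLE, ";"))   # [1, 1, 1]
# With a tab delimiter each row splits into the expected three fields.
print(field_counts(SAMPLE, "\t"))  # [3, 3, 3]
print(name_and_price(SAMPLE))
```

In the spider this corresponds to setting delimiter = '\t'. Note also that the headers list in the question is in a different order from the columns in the file (calories, name, price), which is worth checking once the rows start parsing.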