使用 scrapy 将数据存储在 csv 中
Issue storing data in csv using scrapy
下面是我的scrapy spider的解析方法。我在 csv 中的预期输出是具有相应值的三列。虽然在终端输出中我得到了所有三列(甚至它显示了 84 个项目存储在 output.csv 中,这是正确的)。但在实际输出文件中,我只有第一列“标题。帮助赞赏
编辑:在JSON中所有数据都在那里
def parse(self, response):
for titl in response.xpath('//span[@class="jv-job-list-title"]/text()').extract():
title = titl.strip()
yield {"Title":title}
for dep in response.xpath('//span[@class="jv-job-list-title"]/text()').extract():
department = dep.strip()
yield{"Department":department}
for countr in response.xpath('//td[@class="jv-job-list-name"]/span[2]/text()').extract():
country = countr.strip()
yield{"Country":country}
scrapy crawl task -o output.csv
完整代码:
class TaskUs(scrapy.Spider):
name = 'task'
start_urls = ["https://jobs.jobvite.com/taskus-inc/search?c=Workforce%20Management&p=0"]
# def start_requests(self):
# for URL in self.start_urls:
# yield scrapy.Request(url=URL, meta={'proxy': 'http://103.241.227.108:6666'}, callback=self.parse)
def parse(self, response):
# for titl in response.xpath('//span[@class="jv-job-list-title"]/text()').extract():
# title = titl.strip()
# yield {"Title":title}
# for dep in response.xpath('//span[@class="jv-job-list-category"]/text()').extract():
# department = dep.strip()
# yield{"Department":department}
# for countr in response.xpath('//td[@class="jv-job-list-name"]/span[2]/text()').extract():
# country = countr.strip()
# yield{"Country":country}
ti = response.xpath('//span[@class="jv-job-list-title"]/text()').extract()
de = response.xpath('//span[@class="jv-job-list-category"]/text()').extract()
co = response.xpath('//td[@class="jv-job-list-name"]/span[2]/text()').extract()
yield{'titl':ti, 'Depa': de, "Cou": co}
解决方法如下:
代码:
import scrapy
class TaskUs(scrapy.Spider):
name = 'task'
start_urls = ["https://jobs.jobvite.com/taskus-inc/search?c=Workforce%20Management&p=0"]
def parse(self, response):
tables = response.xpath('//*[@class="jv-job-list jv-search-list"]/tbody/tr')
for table in tables:
yield {
'Title':table.xpath('.//*[@class="jv-job-list-name"]/span[1]/text()').get()
}
输出:
{'Title': 'Real Time Analyst'}
2021-08-15 21:51:14 [scrapy.core.scraper] DEBUG: Scraped from <200 https://jobs.jobvite.com/taskus-inc/search?c=Workforce%20Management&p=0>
{'Title': 'Senior Workforce Manager'}
2021-08-15 21:51:14 [scrapy.core.scraper] DEBUG: Scraped from <200 https://jobs.jobvite.com/taskus-inc/search?c=Workforce%20Management&p=0>
{'Title': 'Senior Workforce Manager'}
2021-08-15 21:51:14 [scrapy.core.scraper] DEBUG: Scraped from <200 https://jobs.jobvite.com/taskus-inc/search?c=Workforce%20Management&p=0>
{'Title': 'Senior Workforce Manager'}
2021-08-15 21:51:14 [scrapy.core.scraper] DEBUG: Scraped from <200 https://jobs.jobvite.com/taskus-inc/search?c=Workforce%20Management&p=0>
{'Title': 'VP of Workforce Analytics'}
2021-08-15 21:51:14 [scrapy.core.scraper] DEBUG: Scraped from <200 https://jobs.jobvite.com/taskus-inc/search?c=Workforce%20Management&p=0>
{'Title': 'Workforce Management Positions'}
2021-08-15 21:51:14 [scrapy.core.scraper] DEBUG: Scraped from <200 https://jobs.jobvite.com/taskus-inc/search?c=Workforce%20Management&p=0>
{'Title': 'Workforce Manager'}
2021-08-15 21:51:14 [scrapy.core.scraper] DEBUG: Scraped from <200 https://jobs.jobvite.com/taskus-inc/search?c=Workforce%20Management&p=0>
{'Title': 'Workforce Manager'}
2021-08-15 21:51:14 [scrapy.core.scraper] DEBUG: Scraped from <200 https://jobs.jobvite.com/taskus-inc/search?c=Workforce%20Management&p=0>
{'Title': 'Workforce Manager'}
2021-08-15 21:51:14 [scrapy.core.scraper] DEBUG: Scraped from <200 https://jobs.jobvite.com/taskus-inc/search?c=Workforce%20Management&p=0>
{'Title': 'Workforce Manager'}
2021-08-15 21:51:14 [scrapy.core.scraper] DEBUG: Scraped from <200 https://jobs.jobvite.com/taskus-inc/search?c=Workforce%20Management&p=0>
{'Title': 'Workforce Manager'}
2021-08-15 21:51:14 [scrapy.core.scraper] DEBUG: Scraped from <200 https://jobs.jobvite.com/taskus-inc/search?c=Workforce%20Management&p=0>
{'Title': 'Workforce Manager'}
2021-08-15 21:51:14 [scrapy.core.scraper] DEBUG: Scraped from <200 https://jobs.jobvite.com/taskus-inc/search?c=Workforce%20Management&p=0>
{'Title': 'Workforce Manager'}
2021-08-15 21:51:14 [scrapy.core.scraper] DEBUG: Scraped from <200 https://jobs.jobvite.com/taskus-inc/search?c=Workforce%20Management&p=0>
{'Title': 'Workforce Planner'}
2021-08-15 21:51:14 [scrapy.core.scraper] DEBUG: Scraped from <200 https://jobs.jobvite.com/taskus-inc/search?c=Workforce%20Management&p=0>
{'Title': 'Workforce Planner'}
2021-08-15 21:51:14 [scrapy.core.scraper] DEBUG: Scraped from <200 https://jobs.jobvite.com/taskus-inc/search?c=Workforce%20Management&p=0>
{'Title': 'Workforce Planner'}
2021-08-15 21:51:14 [scrapy.core.scraper] DEBUG: Scraped from <200 https://jobs.jobvite.com/taskus-inc/search?c=Workforce%20Management&p=0>
{'Title': 'Workforce Planner'}
2021-08-15 21:51:14 [scrapy.core.scraper] DEBUG: Scraped from <200 https://jobs.jobvite.com/taskus-inc/search?c=Workforce%20Management&p=0>
{'Title': 'Workforce Planner'}
2021-08-15 21:51:14 [scrapy.core.scraper] DEBUG: Scraped from <200 https://jobs.jobvite.com/taskus-inc/search?c=Workforce%20Management&p=0>
{'Title': 'Workforce Planner'}
2021-08-15 21:51:14 [scrapy.core.scraper] DEBUG: Scraped from <200 https://jobs.jobvite.com/taskus-inc/search?c=Workforce%20Management&p=0>
{'Title': 'Workforce Planner'}
2021-08-15 21:51:14 [scrapy.core.scraper] DEBUG: Scraped from <200 https://jobs.jobvite.com/taskus-inc/search?c=Workforce%20Management&p=0>
{'Title': 'Workforce Planner'}
2021-08-15 21:51:14 [scrapy.core.scraper] DEBUG: Scraped from <200 https://jobs.jobvite.com/taskus-inc/search?c=Workforce%20Management&p=0>
{'Title': 'Workforce Planner'}
2021-08-15 21:51:14 [scrapy.core.scraper] DEBUG: Scraped from <200 https://jobs.jobvite.com/taskus-inc/search?c=Workforce%20Management&p=0>
{'Title': 'Workforce Planner'}
2021-08-15 21:51:14 [scrapy.core.scraper] DEBUG: Scraped from <200 https://jobs.jobvite.com/taskus-inc/search?c=Workforce%20Management&p=0>
{'Title': 'Workforce Supervisor'}
2021-08-15 21:51:14 [scrapy.core.scraper] DEBUG: Scraped from <200 https://jobs.jobvite.com/taskus-inc/search?c=Workforce%20Management&p=0>
{'Title': 'Workforce Supervisor'}
2021-08-15 21:51:14 [scrapy.core.scraper] DEBUG: Scraped from <200 https://jobs.jobvite.com/taskus-inc/search?c=Workforce%20Management&p=0>
{'Title': 'Workforce Supervisor'}
2021-08-15 21:51:14 [scrapy.core.scraper] DEBUG: Scraped from <200 https://jobs.jobvite.com/taskus-inc/search?c=Workforce%20Management&p=0>
{'Title': 'Workforce Supervisor'}
2021-08-15 21:51:14 [scrapy.core.scraper] DEBUG: Scraped from <200 https://jobs.jobvite.com/taskus-inc/search?c=Workforce%20Management&p=0>
{'Title': 'Workforce Supervisor'}
2021-08-15 21:51:14 [scrapy.core.engine] INFO: Closing spider (finished)
2021-08-15 21:51:14 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/exception_count': 1,
'downloader/exception_type_count/twisted.internet.error.TCPTimedOutError': 1,
'downloader/request_bytes': 686,
'downloader/request_count': 2,
'downloader/request_method_count/GET': 2,
'downloader/response_bytes': 15293,
'downloader/response_count': 1,
'downloader/response_status_count/200'
下面是我的scrapy spider的解析方法。我在 csv 中的预期输出是具有相应值的三列。虽然在终端输出中我得到了所有三列(甚至它显示了 84 个项目存储在 output.csv 中,这是正确的)。但在实际输出文件中,我只有第一列“标题。帮助赞赏
编辑:在JSON中所有数据都在那里
def parse(self, response):
for titl in response.xpath('//span[@class="jv-job-list-title"]/text()').extract():
title = titl.strip()
yield {"Title":title}
for dep in response.xpath('//span[@class="jv-job-list-title"]/text()').extract():
department = dep.strip()
yield{"Department":department}
for countr in response.xpath('//td[@class="jv-job-list-name"]/span[2]/text()').extract():
country = countr.strip()
yield{"Country":country}
scrapy crawl task -o output.csv
完整代码:
class TaskUs(scrapy.Spider):
name = 'task'
start_urls = ["https://jobs.jobvite.com/taskus-inc/search?c=Workforce%20Management&p=0"]
# def start_requests(self):
# for URL in self.start_urls:
# yield scrapy.Request(url=URL, meta={'proxy': 'http://103.241.227.108:6666'}, callback=self.parse)
def parse(self, response):
# for titl in response.xpath('//span[@class="jv-job-list-title"]/text()').extract():
# title = titl.strip()
# yield {"Title":title}
# for dep in response.xpath('//span[@class="jv-job-list-category"]/text()').extract():
# department = dep.strip()
# yield{"Department":department}
# for countr in response.xpath('//td[@class="jv-job-list-name"]/span[2]/text()').extract():
# country = countr.strip()
# yield{"Country":country}
ti = response.xpath('//span[@class="jv-job-list-title"]/text()').extract()
de = response.xpath('//span[@class="jv-job-list-category"]/text()').extract()
co = response.xpath('//td[@class="jv-job-list-name"]/span[2]/text()').extract()
yield{'titl':ti, 'Depa': de, "Cou": co}
解决方法如下:
代码:
import scrapy
class TaskUs(scrapy.Spider):
name = 'task'
start_urls = ["https://jobs.jobvite.com/taskus-inc/search?c=Workforce%20Management&p=0"]
def parse(self, response):
tables = response.xpath('//*[@class="jv-job-list jv-search-list"]/tbody/tr')
for table in tables:
yield {
'Title':table.xpath('.//*[@class="jv-job-list-name"]/span[1]/text()').get()
}
输出:
{'Title': 'Real Time Analyst'}
2021-08-15 21:51:14 [scrapy.core.scraper] DEBUG: Scraped from <200 https://jobs.jobvite.com/taskus-inc/search?c=Workforce%20Management&p=0>
{'Title': 'Senior Workforce Manager'}
2021-08-15 21:51:14 [scrapy.core.scraper] DEBUG: Scraped from <200 https://jobs.jobvite.com/taskus-inc/search?c=Workforce%20Management&p=0>
{'Title': 'Senior Workforce Manager'}
2021-08-15 21:51:14 [scrapy.core.scraper] DEBUG: Scraped from <200 https://jobs.jobvite.com/taskus-inc/search?c=Workforce%20Management&p=0>
{'Title': 'Senior Workforce Manager'}
2021-08-15 21:51:14 [scrapy.core.scraper] DEBUG: Scraped from <200 https://jobs.jobvite.com/taskus-inc/search?c=Workforce%20Management&p=0>
{'Title': 'VP of Workforce Analytics'}
2021-08-15 21:51:14 [scrapy.core.scraper] DEBUG: Scraped from <200 https://jobs.jobvite.com/taskus-inc/search?c=Workforce%20Management&p=0>
{'Title': 'Workforce Management Positions'}
2021-08-15 21:51:14 [scrapy.core.scraper] DEBUG: Scraped from <200 https://jobs.jobvite.com/taskus-inc/search?c=Workforce%20Management&p=0>
{'Title': 'Workforce Manager'}
2021-08-15 21:51:14 [scrapy.core.scraper] DEBUG: Scraped from <200 https://jobs.jobvite.com/taskus-inc/search?c=Workforce%20Management&p=0>
{'Title': 'Workforce Manager'}
2021-08-15 21:51:14 [scrapy.core.scraper] DEBUG: Scraped from <200 https://jobs.jobvite.com/taskus-inc/search?c=Workforce%20Management&p=0>
{'Title': 'Workforce Manager'}
2021-08-15 21:51:14 [scrapy.core.scraper] DEBUG: Scraped from <200 https://jobs.jobvite.com/taskus-inc/search?c=Workforce%20Management&p=0>
{'Title': 'Workforce Manager'}
2021-08-15 21:51:14 [scrapy.core.scraper] DEBUG: Scraped from <200 https://jobs.jobvite.com/taskus-inc/search?c=Workforce%20Management&p=0>
{'Title': 'Workforce Manager'}
2021-08-15 21:51:14 [scrapy.core.scraper] DEBUG: Scraped from <200 https://jobs.jobvite.com/taskus-inc/search?c=Workforce%20Management&p=0>
{'Title': 'Workforce Manager'}
2021-08-15 21:51:14 [scrapy.core.scraper] DEBUG: Scraped from <200 https://jobs.jobvite.com/taskus-inc/search?c=Workforce%20Management&p=0>
{'Title': 'Workforce Manager'}
2021-08-15 21:51:14 [scrapy.core.scraper] DEBUG: Scraped from <200 https://jobs.jobvite.com/taskus-inc/search?c=Workforce%20Management&p=0>
{'Title': 'Workforce Planner'}
2021-08-15 21:51:14 [scrapy.core.scraper] DEBUG: Scraped from <200 https://jobs.jobvite.com/taskus-inc/search?c=Workforce%20Management&p=0>
{'Title': 'Workforce Planner'}
2021-08-15 21:51:14 [scrapy.core.scraper] DEBUG: Scraped from <200 https://jobs.jobvite.com/taskus-inc/search?c=Workforce%20Management&p=0>
{'Title': 'Workforce Planner'}
2021-08-15 21:51:14 [scrapy.core.scraper] DEBUG: Scraped from <200 https://jobs.jobvite.com/taskus-inc/search?c=Workforce%20Management&p=0>
{'Title': 'Workforce Planner'}
2021-08-15 21:51:14 [scrapy.core.scraper] DEBUG: Scraped from <200 https://jobs.jobvite.com/taskus-inc/search?c=Workforce%20Management&p=0>
{'Title': 'Workforce Planner'}
2021-08-15 21:51:14 [scrapy.core.scraper] DEBUG: Scraped from <200 https://jobs.jobvite.com/taskus-inc/search?c=Workforce%20Management&p=0>
{'Title': 'Workforce Planner'}
2021-08-15 21:51:14 [scrapy.core.scraper] DEBUG: Scraped from <200 https://jobs.jobvite.com/taskus-inc/search?c=Workforce%20Management&p=0>
{'Title': 'Workforce Planner'}
2021-08-15 21:51:14 [scrapy.core.scraper] DEBUG: Scraped from <200 https://jobs.jobvite.com/taskus-inc/search?c=Workforce%20Management&p=0>
{'Title': 'Workforce Planner'}
2021-08-15 21:51:14 [scrapy.core.scraper] DEBUG: Scraped from <200 https://jobs.jobvite.com/taskus-inc/search?c=Workforce%20Management&p=0>
{'Title': 'Workforce Planner'}
2021-08-15 21:51:14 [scrapy.core.scraper] DEBUG: Scraped from <200 https://jobs.jobvite.com/taskus-inc/search?c=Workforce%20Management&p=0>
{'Title': 'Workforce Planner'}
2021-08-15 21:51:14 [scrapy.core.scraper] DEBUG: Scraped from <200 https://jobs.jobvite.com/taskus-inc/search?c=Workforce%20Management&p=0>
{'Title': 'Workforce Supervisor'}
2021-08-15 21:51:14 [scrapy.core.scraper] DEBUG: Scraped from <200 https://jobs.jobvite.com/taskus-inc/search?c=Workforce%20Management&p=0>
{'Title': 'Workforce Supervisor'}
2021-08-15 21:51:14 [scrapy.core.scraper] DEBUG: Scraped from <200 https://jobs.jobvite.com/taskus-inc/search?c=Workforce%20Management&p=0>
{'Title': 'Workforce Supervisor'}
2021-08-15 21:51:14 [scrapy.core.scraper] DEBUG: Scraped from <200 https://jobs.jobvite.com/taskus-inc/search?c=Workforce%20Management&p=0>
{'Title': 'Workforce Supervisor'}
2021-08-15 21:51:14 [scrapy.core.scraper] DEBUG: Scraped from <200 https://jobs.jobvite.com/taskus-inc/search?c=Workforce%20Management&p=0>
{'Title': 'Workforce Supervisor'}
2021-08-15 21:51:14 [scrapy.core.engine] INFO: Closing spider (finished)
2021-08-15 21:51:14 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/exception_count': 1,
'downloader/exception_type_count/twisted.internet.error.TCPTimedOutError': 1,
'downloader/request_bytes': 686,
'downloader/request_count': 2,
'downloader/request_method_count/GET': 2,
'downloader/response_bytes': 15293,
'downloader/response_count': 1,
'downloader/response_status_count/200'