Scrapy: pass extra data from a CSV file into parse
My Scrapy spider reads a CSV file and builds start_urls from the addresses in it, like this:
from csv import DictReader

with open('addresses.csv') as rows:
    start_urls = ['http://www.example.com/search/?where=' + row["Address"].replace(',', '').replace(' ', '+')
                  for row in DictReader(rows)]
But the .csv file also contains emails and other information. How can I pass that extra information into parse so it can be added to a new file? Something like this:
import scrapy
from csv import DictReader

with open('addresses.csv') as rows:
    # Note: only the first comprehension will see any rows; the file handle
    # is exhausted after one pass, so the later DictReader loops get nothing.
    names = [row["Name"].replace(',', '') for row in DictReader(rows)]
    emails = [row["Email"].replace(',', '') for row in DictReader(rows)]
    start_urls = ['http://www.example.com/search/?where=' + row["Address"].replace(',', '').replace(' ', '+')
                  for row in DictReader(rows)]

def parse(self, response):
    yield {
        'name': ...,     # from the CSV
        'email': ...,    # from the CSV
        'address': ...,  # from scraping
        'city': ...,     # from scraping
    }
import scrapy
from csv import DictReader
from scrapy import Request


class MySpider(scrapy.Spider):
    name = 'myspider'  # spider name, required by Scrapy

    def start_requests(self):
        with open('addresses.csv') as rows:
            for row in DictReader(rows):
                name = row["Name"].replace(',', '')
                email = row["Email"].replace(',', '')
                link = ('http://www.example.com/search/?where='
                        + row["Address"].replace(',', '').replace(' ', '+'))
                # Attach the CSV fields to the request so the callback can read them
                yield Request(url=link,
                              callback=self.parse,
                              method="GET",
                              meta={'name': name, 'email': email})

    def parse(self, response):
        yield {
            'name': response.meta['name'],    # from the CSV, via meta
            'email': response.meta['email'],  # from the CSV, via meta
            'address': ...,                   # from scraping
            'city': ...,                      # from scraping
        }
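To get the yielded items into a new file, as the question asks, Scrapy's feed exports can write them directly from the command line (assuming the spider name 'myspider' used above):

scrapy crawl myspider -o results.csv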
- Open your CSV file.
- Iterate over it in the start_requests method.
- Pass extra arguments to the callback through the meta parameter, which accepts a Python dictionary.

Note:
Remember that start_requests is not a custom method of mine; it is a standard Scrapy method. See https://doc.scrapy.org/en/latest/topics/spiders.html#scrapy.spiders.Spider.start_requests
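As a side note, Scrapy 1.7+ recommends cb_kwargs over meta for passing user data to callbacks, since meta is also used by Scrapy's own components. Below is a minimal sketch of the same spider using cb_kwargs, assuming the same addresses.csv columns and the hypothetical spider name 'myspider':

import scrapy
from csv import DictReader


class MySpider(scrapy.Spider):
    name = 'myspider'  # hypothetical spider name

    def start_requests(self):
        with open('addresses.csv') as rows:
            for row in DictReader(rows):
                link = ('http://www.example.com/search/?where='
                        + row["Address"].replace(',', '').replace(' ', '+'))
                # cb_kwargs entries arrive as keyword arguments of the callback
                yield scrapy.Request(url=link,
                                     callback=self.parse,
                                     cb_kwargs={'name': row["Name"], 'email': row["Email"]})

    def parse(self, response, name, email):
        yield {
            'name': name,    # from the CSV, via cb_kwargs
            'email': email,  # from the CSV, via cb_kwargs
            'address': ...,  # from scraping
            'city': ...,     # from scraping
        }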