pipelines.py: exporting data from a spider to a PostgreSQL database
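I wrote a scraper for DomRia and I want to save all the data in a database.
I am using Python 3.7 and psycopg2. My database runs in a docker-compose container, and I can see all the information about it in pgAdmin.
I thought the problem was in my query, but it seems fine.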
import psycopg2

def conectToBase(user:str, password:str, host:str,
                 port:str, database:str):
    try:
        connection = psycopg2.connect(user = user, password = password,
                                      host = host, port = port, database = database)
    except (Exception, psycopg2.Error) as error :
        print ("Error while connecting to PostgreSQL", error)
    return connection

class DomriaparserPipeline(object):
    def open_spider(self, spider):
        host = '172.18.0.2'
        user = 'postgres'
        password = 'changeme'
        database = 'postgres'
        port = '5432'
        self.connection = conectToBase(user=user, password=password, host=host,
                                       port=port, database=database)
        self.cursor = self.connection.cursor()

    def close_spider(self, spider):
        self.cursor.close()
        self.connection.close()

    def process_item(self, item, spider):
        self.cursor.execute("INSERT INTO domria (price,url) VALUES (%s,%s)", (item["price"], item["url"]))
        self.connection.commit()
        return item
But when I run my spider, I get this error:
2019-08-13 16:12:24 [scrapy.core.scraper] ERROR: Error processing {'addres': ['Продаю 1к квартиру 21 кв. м, Предславинская улица 12 в районе '
'Печерский в Киеве'],
'data_of_pulication': ['10 авг'],
'distance_center': ['до 5-ти минут'],
'distance_market': ['до 5-ти минут'],
'floor': [1.0],
'kitchen_space': [4.5],
'living_space': [13.0],
'number_of_rooms': [1.0],
'price': [35000],
'storeys': [9.0],
'total_space': [21.0],
'type_center': ['пешком'],
'type_heating': ['централизованное'],
'type_market': ['пешком'],
'uniqueID': [15727327],
'url': ['https://dom.ria.com/ru/realty-perevireno-prodaja-kvartira-kiev-pecherskiy-predslavinskaya-ulitsa-15727327.html'],
'who_saler': ['от посредника']}
Traceback (most recent call last):
File "/home/maxim/anaconda3/lib/python3.6/site-packages/twisted/internet/defer.py", line 654, in _runCallbacks
current.result = callback(current.result, *args, **kw)
File "/home/maxim/PROJECTS/DomRiaParser/DomRiaParser/pipelines.py", line 52, in process_item
self.cursor.execute("INSERT INTO domria (price,url) VALUES (%s,%s)", (item["price"],item["url"]))
psycopg2.InternalError: current transaction is aborted, commands ignored until end of transaction block
How can I fix this?
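Can you check the types of the columns domria.price and domria.url? You will probably find that you are trying to insert a Python list into them: in the logged item, both 'price' and 'url' are single-element lists.
You can also try this: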
def process_item(self, item, spider):
    self.cursor.execute("INSERT INTO domria (price,url) VALUES (%s,%s)", (item.get("price", [0])[0], item.get("url", ["not available"])[0]))
    self.connection.commit()
    return item
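The error message itself means that an earlier statement in the same transaction already failed, and PostgreSQL rejects every further command until that transaction is rolled back. The traceback in the question is therefore probably not the first failure. Below is a minimal sketch of one way to handle both points, assuming the same table and columns as above. `first_value` and `insert_item` are hypothetical helper names, not part of Scrapy or psycopg2; they rely only on the `execute()`, `commit()` and `rollback()` methods that psycopg2 connections and cursors provide.

```python
def first_value(item, key, default=None):
    """Return the first element of a list-valued field, or the value itself.

    Falls back to `default` when the key is missing or the list is empty.
    """
    value = item.get(key, default)
    if isinstance(value, (list, tuple)):
        return value[0] if value else default
    return value


def insert_item(connection, cursor, item):
    """Insert one row and roll back on failure.

    The rollback ends the aborted transaction, so the next item does not
    fail with "current transaction is aborted".
    """
    params = (
        first_value(item, "price", 0),
        first_value(item, "url", "not available"),
    )
    try:
        cursor.execute("INSERT INTO domria (price,url) VALUES (%s,%s)", params)
        connection.commit()
    except Exception:
        # Reset the connection to a usable state, then let Scrapy log the error.
        connection.rollback()
        raise
```

Inside the pipeline, process_item could then call insert_item(self.connection, self.cursor, item) and return item. Re-raising after the rollback keeps the original database error visible in the Scrapy log instead of hiding it.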