pipelines.py: exporting data from a spider to a PostgreSQL database

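I wrote a spider for DomRia and I want to save all the data in a database. I am using Python 3.7 and psycopg2. My database runs in a docker-compose container, and I can see all the information about the database in pgAdmin.

I thought the problem was in my query, but it seems fine.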

import psycopg2



def conectToBase(user:str, password:str, host:str, 
                 port:str, database:str):

    try:
        connection = psycopg2.connect(user = user, password = password,
                                      host = host, port = port, database = database)

    except (Exception, psycopg2.Error) as error :
        print ("Error while connecting to PostgreSQL", error)

    return connection

class DomriaparserPipeline(object):
    def open_spider(self, spider):
        host = '172.18.0.2'
        user = 'postgres'
        password = 'changeme' 
        database = 'postgres'
        port = '5432'
        self.connection = conectToBase(user=user, password=password, host=host,
                                       port=port, database=database)
        self.cursor = self.connection.cursor()

    def close_spider(self, spider):
        self.cursor.close()
        self.connection.close()

    def process_item(self, item, spider):

        self.cursor.execute("INSERT INTO domria (price,url) VALUES (%s,%s)", (item["price"], item["url"]))

        self.connection.commit()

        return item

But when I run my spider, I get this error:

2019-08-13 16:12:24 [scrapy.core.scraper] ERROR: Error processing {'addres': ['Продаю 1к квартиру 21 кв. м, Предславинская улица 12   в районе '
            'Печерский в Киеве'],
 'data_of_pulication': ['10 авг'],
 'distance_center': ['до 5-ти минут'],
 'distance_market': ['до 5-ти минут'],
 'floor': [1.0],
 'kitchen_space': [4.5],
 'living_space': [13.0],
 'number_of_rooms': [1.0],
 'price': [35000],
 'storeys': [9.0],
 'total_space': [21.0],
 'type_center': ['пешком'],
 'type_heating': ['централизованное'],
 'type_market': ['пешком'],
 'uniqueID': [15727327],
 'url': ['https://dom.ria.com/ru/realty-perevireno-prodaja-kvartira-kiev-pecherskiy-predslavinskaya-ulitsa-15727327.html'],
 'who_saler': ['от посредника']}
Traceback (most recent call last):
  File "/home/maxim/anaconda3/lib/python3.6/site-packages/twisted/internet/defer.py", line 654, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/home/maxim/PROJECTS/DomRiaParser/DomRiaParser/pipelines.py", line 52, in process_item
    self.cursor.execute("INSERT INTO domria (price,url) VALUES (%s,%s)", (item["price"],item["url"]))
psycopg2.InternalError: current transaction is aborted, commands ignored until end of transaction block

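How can I fix this?

Can you check the types of the columns domria.price and domria.url? You may notice that you are trying to insert a Python list into them: in the item above, 'price' is [35000] and 'url' is a one-element list, not a plain value.

You can also try this, which takes the first element of each list: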

def process_item(self, item, spider):

    self.cursor.execute("INSERT INTO domria (price,url) VALUES (%s,%s)", (item.get("price", [0])[0], item.get("url", ["not available"])[0] ))

    self.connection.commit()

    return item
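
One more thing worth checking: "current transaction is aborted" usually means an earlier statement in the same transaction already failed and was never rolled back, so PostgreSQL ignores every following command until the transaction ends. Below is a rough sketch of how process_item could roll back on failure so that one bad item does not block the rest. The helper first_value is a hypothetical name I made up for illustration, and I am assuming the fields arrive either as one-element lists or as plain values.

```python
def first_value(item, key, default=None):
    """Hypothetical helper: unwrap a one-element list, or return the value as-is."""
    value = item.get(key, default)
    if isinstance(value, (list, tuple)):
        # An empty list falls back to the default
        return value[0] if value else default
    return value


class DomriaparserPipeline(object):
    # open_spider and close_spider stay the same as in the question.

    def process_item(self, item, spider):
        params = (first_value(item, "price", 0),
                  first_value(item, "url", "not available"))
        try:
            self.cursor.execute(
                "INSERT INTO domria (price,url) VALUES (%s,%s)", params)
            self.connection.commit()
        except Exception:
            # End the failed transaction so later items can still be inserted,
            # then re-raise so Scrapy still logs the original error.
            self.connection.rollback()
            raise
        return item
```

With the rollback in place, the log should show the first real error (for example a type mismatch) instead of the "transaction is aborted" message repeated for every item.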