IndexError when trying to INSERT ... WHERE NOT EXISTS with Psycopg2

I'm trying to load some data into a PostgreSQL database with Psycopg2. The function I use to load the database is:

def load_db():
    data = clean_data()

    conn = psycopg2.connect(database='database', user='user')
    cur = conn.cursor()

    for d in data:
        publisher_id = (d[5]['publisher_id'])
        publisher = (d[4]['publisher'])

        cur.execute("INSERT INTO publisher (id, news_org) SELECT (%s,%s) WHERE NOT EXISTS (SELECT id FROM publisher WHERE id = %s);",
           (publisher_id, publisher))

    conn.commit()
    cur.close()
    conn.close()

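But I get the error IndexError: tuple index out of range and I'm not sure what I'm doing wrong. The records I'm trying to insert contain many duplicate publisher_id and publisher values, hence the WHERE NOT EXISTS. I'm fairly new to working with databases from Python, so I'm sure it's something simple. Thanks in advance!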

Update!

Here is a sample of data:

 [{'article_id': 7676933011},
  {'web_id': u'world/2015/jul/03/iranian-foreign-minister-raises-prospect-of-joint-action-against-islamic-state'},
  {'title': u'Iranian foreign minister raises prospect of joint action against Islamic State'},
  {'pub_date': u'2015-07-03T21:30:51Z'},
  {'publisher': 'The Guardian'},
  {'publisher_id': '1'},
  {'author': u'Julian Borger'},
  {'author_id': u'15924'},
  {'city_info': [{'city_name': u'Vienna',
                  'country_code': u'US',
                  'id': 4791160,
                  'lat': 38.90122,
                  'lon': -77.26526}]},
  {'country_info': [{'country_code': u'IR',
                     'country_name': u'Islamic Republic of Iran',
                     'lat': 32.0,
                     'lon': 53.0},
                    {'country_code': u'US',
                     'country_name': u'United States',
                     'lat': 39.76,
                     'lon': -98.5}]},
  {'org_info': [{'organization': u'Republican'},
                {'organization': u'US Congress'},
                {'organization': u'Palais Coburg Hotel'},
                {'organization': u'Islamic State'},
                {'organization': u'United'}]},
  {'people_info': [{'people': u'Mohammad Javad Zarif'},
                   {'people': u'John Kerry'}]}]

The full traceback is:

Traceback (most recent call last):
  File "/Users/Desktop/process_text/LoadDB.py", line 69, in <module>
    load_db()
  File "/Users/Desktop/process_text/LoadDB.py", line 50, in load_db
    (publisher_id, publisher))
IndexError: tuple index out of range

The problem is in your cur.execute() line:

cur.execute("INSERT INTO publisher (id, news_org) SELECT (%s,%s) WHERE NOT EXISTS (SELECT id FROM publisher WHERE id = %s);",
       (publisher_id, publisher))

As you can see above, the query contains three %s placeholders (...SELECT (%s,%s)...WHERE id = %s);), but you supply only two values (the two items in the tuple).

cur.execute在内部尝试查找第三个值时,会导致索引问题。

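I'm not sure which values are intended there, but you need to either change the query to use only two %s placeholders, or supply a third value in the tuple (publisher_id, publisher). Since the subquery filters on id, the missing value is most likely publisher_id again: (publisher_id, publisher, publisher_id).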
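
A minimal sketch of the corrected call, assuming the third placeholder is meant to receive publisher_id (the value the NOT EXISTS subquery checks). It also drops the parentheses around the first two placeholders: in PostgreSQL, SELECT (%s,%s) is a row constructor that produces a single row-valued column, so inserting it into the two columns (id, news_org) would likely fail next with "INSERT has more target columns than expressions". The helper name build_publisher_insert is hypothetical; it returns the query and parameters so the placeholder count can be checked without a database connection.

```python
def build_publisher_insert(publisher_id, publisher):
    """Return (sql, params) for an insert that skips ids already present.

    Hypothetical helper: three %s placeholders, so three parameters.
    """
    sql = (
        "INSERT INTO publisher (id, news_org) "
        "SELECT %s, %s "  # no parentheses: two columns, not one row value
        "WHERE NOT EXISTS (SELECT id FROM publisher WHERE id = %s);"
    )
    params = (publisher_id, publisher, publisher_id)  # third feeds the subquery
    # Every %s needs exactly one parameter.
    if sql.count("%s") != len(params):
        raise ValueError("placeholder/parameter count mismatch")
    return sql, params


sql, params = build_publisher_insert('1', 'The Guardian')
print(params)  # prints: ('1', 'The Guardian', '1')
```

Inside load_db, the loop body would then become cur.execute(*build_publisher_insert(publisher_id, publisher)). psycopg2 also accepts named placeholders such as %(publisher_id)s together with a dict of parameters, which lets a single value fill several placeholders without repeating it in a tuple.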