Updating Redshift from a pandas DataFrame fails with "string index out of range"

Question

I am trying to update a Redshift table using psycopg2 and psycopg2.extras, but it fails with the following error. Can someone help me resolve it?

{
  "errorMessage": "string index out of range",
  "errorType": "IndexError",
  "stackTrace": [
    "  File \"/var/task/lambda_function.py\", line 110, in lambda_handler\n    psycopg2.extras.execute_values (cursor, update_query, row, template=None, page_size=2000)\n",
    "  File \"/opt/python/lib/python3.8/site-packages/psycopg2/extras.py\", line 1289, in execute_values\n    parts.append(cur.mogrify(template, args))\n"
  ]
}

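I have a DataFrame with 23 columns, and I am trying to update the table from it in AWS Lambda as shown below. Connecting to the database succeeds, but the update fails: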

import psycopg2
import psycopg2.extras

df_pandas ## dataframe with 23 columns and 28 rows
connection = psycopg2.connect(host='casuc', dbname='skhcbiw',
                              user='cksbci', password='****', port=0000)
cursor = connection.cursor()
# Update: instead of itertuples() I now use iterrows(), which gives a different error from the one above (shown below the code)
for _, row in df_pandas.iterrows():
    row = str(tuple(row))  # create a tuple that is a string
    row = row[1:len(row)-1]  # remove the beginning & ending ()
    print(row)

    update_query = """UPDATE table AS t
                      SET column1 = e.column1, column2 = e.column2, column3 = e.column3,
                          ......................................................
                          ......................................................
                          column22 = e.column22, column23 = e.column23
                      FROM (VALUES %s) AS e('column1', 'column2', 'column3',
                                             .......................................
                                             .......................................
                                             .......................................
                                             .......................................
                                             'column21', 'column22', 'column23')
                      WHERE e.column23 = t.column23;"""
    psycopg2.extras.execute_values(cursor, update_query, row, template=None, page_size=2000)

New error:

{
  "errorMessage": "syntax error at or near \")\"\nLINE 9: ...','1','7','6','5','1','''',',',' ','0','.','0',',',' ','0'))\n                                                                      ^\n",
  "errorType": "SyntaxError",
  "stackTrace": [
    "  File \"/var/task/lambda_function.py\", line 118, in lambda_handler\n    psycopg2.extras.execute_values(cursor, update_query, (row, ), template=None, page_size=2000)\n",
    "  File \"/opt/python/lib/python3.8/site-packages/psycopg2/extras.py\", line 1292, in execute_values\n    cur.execute(b''.join(parts))\n"
  ]
}

My input row looks like this:

'Zone 99', 'J005', 'Accepted', 'BIWUDBI', 'MNO101', '90.00H50 IUHIUH   YY 55RR', '878767', 0, 'Knoidci', 'A99', 0.0, 0, '2192238', '2020-12-31', 0.0, 0.0, 0.0, 0, 0, 0, '50017651', 0.0, 0

I can see that the value '50017651' is passed character by character, as '5', '0', '0', and so on. I don't know why this happens.
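A likely explanation for the second error: `str(tuple(row))` turns the row into one Python string, and a `str` is itself a sequence of characters. `psycopg2.extras.execute_values` expects `argslist` to be a sequence of rows (each row a sequence of values), and with `template=None` it builds one `%s` placeholder per element of the first row. Passed as `(row, )`, as in the second traceback, the single "row" is the string, so every character becomes its own SQL value. That matches the `'5','0','0'` and `''''` (a lone quote character) in the error message. The plain-Python sketch below needs no database and only reproduces that counting, using a few values from the input row above.

```python
# Plain-Python illustration of how execute_values sees the argument built in the question.
values = ("Zone 99", "J005", 0, "50017651", 0.0)  # a few values from the input row above

row = str(tuple(values))      # one Python string containing the whole row
row = row[1:len(row) - 1]     # strip the outer parentheses, as in the question

argslist = (row, )            # what the second attempt passes to execute_values

# With template=None, execute_values uses one "%s" per element of the first item in argslist.
first_item = argslist[0]
n_placeholders = len(first_item)   # the string's length, i.e. its number of characters
params = list(first_item)          # each character would be adapted as a separate SQL value

print(n_placeholders)
print(params[:12])

# What execute_values expects instead: a sequence of row tuples, one placeholder per column.
fixed_argslist = [values]
print(len(fixed_argslist[0]))
```

In other words, the fix on the Python side is to pass a list of tuples (one per DataFrame row) rather than a string built from a row.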

I have referred to these 2 URLs from Stack Overflow to solve my problem, but without success.

Thanks, Ganesh

Answer 1

The code above did not work, so I decided to use the approach mentioned in the link:

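The approach is to create a TEMP TABLE, dump/insert the data into it, and then update the target table from this TEMP TABLE. AWS Redshift drops a TEMP TABLE automatically at the end of the session in which it was created, so there is no need to specify 'ON COMMIT DROP' when creating it.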

import psycopg2

# df is the pandas DataFrame with the new values (columns id and z in this example)
conn = psycopg2.connect("dbname='db' user='user' host='localhost' password='test'")
cur = conn.cursor()

rows = zip(df.id, df.z)  # one (id, z) tuple per DataFrame row
# No ON COMMIT DROP: Redshift drops the temp table when the session ends
cur.execute("""CREATE TEMP TABLE codelist(id INTEGER, z INTEGER)""")
cur.executemany("""INSERT INTO codelist (id, z) VALUES (%s, %s)""", rows)

cur.execute("""
    UPDATE table_name
    SET z = codelist.z
    FROM codelist
    WHERE codelist.id = table_name.id;
    """)

print(cur.rowcount)  # number of rows updated
conn.commit()
cur.close()
conn.close()
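The snippet above updates a single column. For a 23-column DataFrame like the one in the question, the same temp-table approach can be generalized. This is a minimal sketch under these assumptions: the DataFrame's column names match the target table's, `column23` (the key used in the question's WHERE clause) identifies rows, and `build_update_sql`, `update_from_dataframe` and `staging_updates` are hypothetical names. It also swaps `executemany` for `psycopg2.extras.execute_values`, which sends rows in multi-row INSERT statements instead of one statement per row.

```python
from typing import Sequence


def build_update_sql(target: str, staging: str, columns: Sequence[str], key: str) -> str:
    """Return an UPDATE ... FROM statement copying every non-key column from staging into target."""
    set_clause = ", ".join(f"{col} = {staging}.{col}" for col in columns if col != key)
    return (
        f"UPDATE {target} SET {set_clause} "
        f"FROM {staging} WHERE {target}.{key} = {staging}.{key}"
    )


def update_from_dataframe(conn, df, target: str, key: str, staging: str = "staging_updates") -> int:
    """Load df into a temp table, update target from it, and return the number of rows updated.

    Table and column names are interpolated directly into the SQL, so they must come
    from trusted code, never from user input.
    """
    import psycopg2.extras  # imported here so build_update_sql works without psycopg2 installed

    columns = list(df.columns)
    # One plain tuple per DataFrame row: the "sequence of sequences" execute_values expects.
    rows = list(df.itertuples(index=False, name=None))
    col_list = ", ".join(columns)

    with conn.cursor() as cur:
        # LIKE copies the target table's column definitions into the temp table.
        cur.execute(f"CREATE TEMP TABLE {staging} (LIKE {target})")
        psycopg2.extras.execute_values(
            cur, f"INSERT INTO {staging} ({col_list}) VALUES %s", rows, page_size=1000
        )
        cur.execute(build_update_sql(target, staging, columns, key))
        updated = cur.rowcount
        # Drop explicitly so a reused connection (e.g. a warm Lambda container) can run this again.
        cur.execute(f"DROP TABLE {staging}")
    conn.commit()
    return updated
```

With the question's objects this would be called as `update_from_dataframe(connection, df_pandas, "table_name", "column23")`, where `"table_name"` stands in for the real table name. Dropping the temp table explicitly is a design choice: relying on the end of the session is enough for a one-off script, but a Lambda connection may be reused across invocations.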

Regards, Ganesh Bhat


Tags: python, psycopg2, sql-update, pandas, amazon-redshift