Errors inserting many rows into PostgreSQL with psycopg2

Question

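I have a large number of XML files that I need to open and process to generate a large number of rows, which are then inserted into several tables in a remote PostgreSQL database.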

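To extract the XML data I use xml.etree.ElementTree to parse the XML tree and pull out elements as needed. Although I'm doing quite a lot, the basic operation is to take a specific element, whether a string or an integer, and put it into one of several dictionaries.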
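A minimal sketch of that extraction step, assuming a hypothetical XML layout (the real files' structure is not shown here): each `<item>` element becomes a dictionary, and the dictionaries are then flattened into the `(document_id, item_index, item_code)` tuples described further down. The element and attribute names and the helpers `extract_items` and `to_tuples` are illustrative only.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML layout; the structure of the real files is an assumption.
SAMPLE_XML = """<document id="937">
  <item index="138"><code>681</code></item>
  <item index="139"><code>682</code></item>
</document>"""


def extract_items(xml_text):
    """Return one dict per <item>, converting the text fields to int."""
    root = ET.fromstring(xml_text)
    document_id = int(root.get("id"))
    rows = []
    for item in root.iter("item"):
        rows.append({
            "document_id": document_id,
            "item_index": int(item.get("index")),
            "item_code": int(item.findtext("code")),
        })
    return rows


def to_tuples(rows):
    """Flatten the dicts into (document_id, item_index, item_code) tuples."""
    return [(r["document_id"], r["item_index"], r["item_code"]) for r in rows]


my_data = to_tuples(extract_items(SAMPLE_XML))
print(my_data)  # [(937, 138, 681), (937, 139, 682)]
```

Keeping the values as Python ints at this stage is deliberate: psycopg2 adapts ints itself, so there is no need to convert them to `str` or `bytes` before passing them as query parameters.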

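After further processing I end up with a number of dictionaries that need to be inserted into my database. For any single XML file I may generate up to 8,000-10,000 rows (or queries) across 3 tables.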

While testing, I wrote the output to a SQL file and then ran the queries manually. That obviously won't scale when I have a lot of XML files.

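So I tried to use psycopg2 to automate the process. As far as I can tell from Stack Overflow and elsewhere, calling execute once per row is very slow. Based on this Stack Overflow question, I tried writing code like this: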

QueryData = ','.join(cur.mogrify('(%s,%s,%s)', row) for row in myData)
cur.execute('INSERT INTO DBTABLE' + QueryData)
cur.commit()

where myData is a list of tuples [(a,b,c),(a,b,c),(a,b,c)...] whose contents are a mix of data extracted with xml.etree.ElementTree and values I calculate myself.

When I actually run the code above, I get the following error:

TypeError: sequence item 0: expected str instance, bytes found

OK... if I then try converting my data (each tuple element) with str(), I get:

TypeError: encoding without a string argument

Am I going about this completely the wrong way? How can I do what I need? I'm using Python 3.

Extra

I was asked to show an example of the data.

Here is the simplest case: 3 integer values going into one table. It has the form: (document_id,item_index,item_code)

A typical example is: (937, 138, 681)

My general conversion attempt was:

(str(document_id),str(item_index),str(item_code))

I also tried going the other way:

(bytes(document_id,'utf-8'),bytes(item_index,'utf-8'),bytes(item_code,'utf-8'))

The latter also fails with: TypeError: encoding without a string argument
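Both TypeErrors can be reproduced without a database. In the sketch below, the byte-string literals stand in for what `cur.mogrify` returns under Python 3 (a `bytes` object, per the psycopg documentation). Converting the parameters with `str()` does not change that return type, because the bytes come from `mogrify` itself, not from the inputs. Separately, `bytes(937, "utf-8")` fails because `bytes()` only accepts an encoding when the source is a `str`, and an `int` has no text to encode.

```python
# Stand-ins for cur.mogrify(...) output: under Python 3 it returns bytes.
fragments = [b"(937,138,681)", b"(937,139,682)"]

# 1) A str separator cannot join bytes items: the first error in the question.
join_error = None
try:
    ",".join(fragments)
except TypeError as exc:
    join_error = exc
print("join:", join_error)

# 2) bytes() with an encoding needs a str source, not an int.
bytes_error = None
try:
    bytes(937, "utf-8")
except TypeError as exc:
    bytes_error = exc
print("bytes:", bytes_error)

# Two consistent alternatives: stay in bytes, or decode every fragment to str.
as_bytes = b",".join(fragments)
as_str = ",".join(f.decode("utf-8") for f in fragments)
print(as_bytes)  # b'(937,138,681),(937,139,682)'
print(as_str)    # (937,138,681),(937,139,682)
```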

Answer 1

You're missing VALUES after the table name; everything else looks correct:

cursorPG.execute("INSERT INTO test VALUES "+','.join(cursorPG.mogrify('(%s,%s)',x) for x in mydata))

Answer 2

OK, I got it working... but I'm confused about why my solution works. I'm posting it as an answer, but it would be great if someone could explain what's going on:

Basically, this:

QueryData = ','.join(cur.mogrify('(%s,%s,%s)', row) for row in myData)
cur.execute('INSERT INTO DBTABLE' + QueryData)

had to be changed to:

QueryData = b','.join(cur.mogrify(b'(%s,%s,%s)', row) for row in myData)
cur.execute(b'INSERT INTO DBTABLE' + QueryData)

That strikes me as inelegant.
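A likely explanation for why the bytes version works: under Python 3, `cur.mogrify` returns `bytes`, so the separator passed to `join` must also be `bytes`. psycopg2's `cursor.execute` accepts a query given as `bytes` as well as `str`. The sketch below builds the whole statement as bytes and also includes the `VALUES` keyword that Answer 1 points out is missing. `build_insert_bytes` and `FakeCursor` are hypothetical names; `FakeCursor` only mimics `mogrify` returning bytes for plain integers (the real method quotes and escapes values) so that the snippet runs without a database.

```python
def build_insert_bytes(cur, table, rows):
    """Build one multi-row INSERT statement as bytes from cur.mogrify() fragments.

    `table` is concatenated into the SQL unescaped, so it must be a trusted,
    hard-coded name. `rows` is assumed to be a non-empty list of 3-tuples.
    """
    if not rows:
        raise ValueError("need at least one row")
    values = b",".join(cur.mogrify("(%s,%s,%s)", row) for row in rows)
    # Note the VALUES keyword and the spaces around it.
    return b"INSERT INTO " + table.encode("utf-8") + b" VALUES " + values


class FakeCursor:
    """Stand-in for a psycopg2 cursor: mogrify() returns bytes, ints only."""

    def mogrify(self, query, args):
        return (query % tuple(str(a) for a in args)).encode("utf-8")


print(build_insert_bytes(FakeCursor(), "dbtable", [(937, 138, 681), (937, 139, 682)]))
# b'INSERT INTO dbtable VALUES (937,138,681),(937,139,682)'

# With a real connection (connection string and table name are assumptions):
# import psycopg2
# conn = psycopg2.connect("dbname=test")
# with conn, conn.cursor() as cur:   # "with conn" commits on success
#     cur.execute(build_insert_bytes(cur, "dbtable", my_data))
```

Note that `commit()` is a method of the connection, not the cursor. In the usage comment, the `with conn:` block calls it automatically when the block exits without an exception.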

Answer 3

The psycopg documentation states, for cur.mogrify:

The returned string is always a bytes string.

So to use this hack, you just need to decode the result of mogrify back into a string, for example:

QueryData = ','.join(cur.mogrify('(%s,%s,%s)', row).decode('utf-8') for row in myData)
cur.execute('INSERT INTO DBTABLE VALUES ' + QueryData)

However, as mentioned in this Stack Overflow question, the most efficient way to copy large amounts of data is to use COPY. You can do this with any "python file-like object". Here is an example from the psycopg documentation:

>>> f = StringIO("42\tfoo\n74\tbar\n")
>>> cur.copy_from(f, 'test', columns=('num', 'data'))
>>> cur.execute("select * from test where id > 5;")
>>> cur.fetchall()
[(6, 42, 'foo'), (7, 74, 'bar')]
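Applied to data shaped like the question's, a minimal COPY sketch could serialize the tuples into an in-memory, tab-separated buffer, which is the default format `copy_from` reads, and then hand that buffer to `cur.copy_from`. The helper names, the table name `dbtable`, and the column names are assumptions. The serializer also assumes no value contains a tab, newline, or backslash and that there are no `None` values, since those would need escaping (or `\N` for NULL) in PostgreSQL's text COPY format.

```python
import io


def rows_to_copy_buffer(rows):
    """Serialise tuples as tab-separated lines in an in-memory text file.

    Assumes values contain no tabs, newlines or backslashes and no None.
    """
    buf = io.StringIO()
    for row in rows:
        buf.write("\t".join(str(v) for v in row) + "\n")
    buf.seek(0)  # rewind so copy_from reads from the start
    return buf


def copy_rows(cur, table, columns, rows):
    """Bulk-load rows with PostgreSQL COPY via psycopg2's cursor.copy_from()."""
    cur.copy_from(rows_to_copy_buffer(rows), table, columns=columns)


print(repr(rows_to_copy_buffer([(937, 138, 681), (937, 139, 682)]).read()))
# '937\t138\t681\n937\t139\t682\n'

# With a real connection (table and column names are assumptions):
# with conn, conn.cursor() as cur:
#     copy_rows(cur, "dbtable", ("document_id", "item_index", "item_code"), my_data)
```

Compared with building one long INSERT string, this sends the rows as a single data stream and never needs `mogrify` at all, so the str/bytes mismatch does not come up.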


Tags: python, postgresql, psycopg2, xml-parsing, python-3.x