peewee.OperationalError: too many SQL variables on upsert of only 150 rows * 8 columns
With the example below, on my machine, setting range(150) triggers the error while range(100) does not:
from peewee import *

database = SqliteDatabase(None)

class Base(Model):
    class Meta:
        database = database

colnames = ["A", "B", "C", "D", "E", "F", "G", "H"]
cols = {x: TextField() for x in colnames}
table = type('mytable', (Base,), cols)

database.init('test.db')
database.create_tables([table])

data = []
for x in range(150):
    data.append({x: 1 for x in colnames})

with database.atomic() as txn:
    table.insert_many(data).upsert().execute()
results in:
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/cluster/home/ifiddes/python2.7/lib/python2.7/site-packages/peewee.py", line 3213, in execute
    cursor = self._execute()
  File "/cluster/home/ifiddes/python2.7/lib/python2.7/site-packages/peewee.py", line 2628, in _execute
    return self.database.execute_sql(sql, params, self.require_commit)
  File "/cluster/home/ifiddes/python2.7/lib/python2.7/site-packages/peewee.py", line 3461, in execute_sql
    self.commit()
  File "/cluster/home/ifiddes/python2.7/lib/python2.7/site-packages/peewee.py", line 3285, in __exit__
    reraise(new_type, new_type(*exc_args), traceback)
  File "/cluster/home/ifiddes/python2.7/lib/python2.7/site-packages/peewee.py", line 3454, in execute_sql
    cursor.execute(sql, params or ())
peewee.OperationalError: too many SQL variables
This seems very low to me. I am trying to use peewee to replace an existing pandas-based SQL setup, because pandas does not support primary keys. Being able to insert only ~100 records per loop iteration is very low, and it is brittle if the number of columns ever grows.
How can I make this work better? Is it even possible?
Looking here, https://www.sqlite.org/limits.html#max_column, it seems the limit should be 2000:
The SQLITE_MAX_COLUMN compile-time parameter is used to set an upper
bound on:
- ... snip ...
- The number of values in an INSERT statement
My guess is that you are hitting that limit? In any case, just chunk your input, or recompile SQLite with a higher limit.
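A minimal sketch of the chunking approach, reusing the table, data and database objects from the question and assuming SQLite's default limit of 999 bound variables (100 rows * 8 columns = 800 variables per statement, safely under it):

BATCH = 100  # keep rows-per-statement * columns below the variable limit

with database.atomic():
    for i in range(0, len(data), BATCH):
        table.insert_many(data[i:i + BATCH]).upsert().execute()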
After some investigation, the problem appears to be related to the maximum number of parameters a SQL query may have: SQLITE_MAX_VARIABLE_NUMBER.
To be able to do large bulk inserts, I first estimate SQLITE_MAX_VARIABLE_NUMBER and then use it to split the list of dictionaries I want to insert into chunks.
To estimate that value I use this function, inspired by this answer:
def max_sql_variables():
    """Get the maximum number of arguments allowed in a query by the current
    sqlite3 implementation. Based on `this question
    `_

    Returns
    -------
    int
        inferred SQLITE_MAX_VARIABLE_NUMBER
    """
    import sqlite3
    db = sqlite3.connect(':memory:')
    cur = db.cursor()
    cur.execute('CREATE TABLE t (test)')
    low, high = 0, 100000
    while (high - 1) > low:
        guess = (high + low) // 2
        query = 'INSERT INTO t VALUES ' + ','.join(['(?)' for _ in
                                                    range(guess)])
        args = [str(i) for i in range(guess)]
        try:
            cur.execute(query, args)
        except sqlite3.OperationalError as e:
            if "too many SQL variables" in str(e):
                high = guess
            else:
                raise
        else:
            low = guess
    cur.close()
    db.close()
    return low

SQLITE_MAX_VARIABLE_NUMBER = max_sql_variables()
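As a side note not in the original answer: on Python 3.11+ the sqlite3 module can report this limit directly, so the probing above can be skipped there; a minimal sketch, assuming that newer Python is available:

import sqlite3

# Connection.getlimit() (Python 3.11+) returns the compiled-in SQLite limit.
db = sqlite3.connect(':memory:')
SQLITE_MAX_VARIABLE_NUMBER = db.getlimit(sqlite3.SQLITE_LIMIT_VARIABLE_NUMBER)
db.close()
print(SQLITE_MAX_VARIABLE_NUMBER)  # e.g. 999 or 32766, depending on the SQLite build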
Then I use that variable to slice data:

with database.atomic() as txn:
    size = (SQLITE_MAX_VARIABLE_NUMBER // len(data[0])) - 1
    # remove one to avoid issues in case peewee adds some extra variable
    for i in range(0, len(data), size):
        table.insert_many(data[i:i + size]).upsert().execute()
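A side note not from the original answer: in peewee 3.x the upsert() method was removed and the library ships a chunked() helper, so a roughly equivalent pattern there (a sketch, assuming peewee 3 and the default 999-variable limit) would be:

from peewee import chunked

# on_conflict_replace() is the peewee 3 counterpart of the old upsert();
# 100 rows * 8 columns = 800 bound variables, below the default limit of 999.
with database.atomic():
    for batch in chunked(data, 100):
        table.insert_many(batch).on_conflict_replace().execute()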
Update about the execution speed of max_sql_variables
On a 3-year-old Intel machine with 4 cores and 4 GB of memory, running openSUSE Tumbleweed, where SQLITE_MAX_VARIABLE_NUMBER is set to 999, the function runs in less than 100 ms. If I set high = 1000000, the execution time becomes on the order of 300 ms.
On a newer Intel machine with 8 cores and 8 GB of memory, running Kubuntu, where SQLITE_MAX_VARIABLE_NUMBER is set to 250000, the function runs for about 2.6 seconds and returns 99999. If I set high = 1000000, the execution time grows to around 4.5 seconds.