peewee vs sqlalchemy performance
I have 2 simple scripts:
from sqlalchemy import create_engine, ForeignKey, Table
from sqlalchemy import Column, Date, Integer, String, DateTime, BigInteger, event
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.engine import Engine
from sqlalchemy.orm import relationship, backref, sessionmaker, scoped_session, Session

class Test(declarative_base()):
    __tablename__ = "Test"

    def __init__(self, *args, **kwargs):
        args = args[0]
        for key in args:
            setattr(self, key, args[key])

    key = Column(String, primary_key=True)

data = []
for a in range(0, 10000):
    data.append({"key": "key%s" % a})

engine = create_engine("sqlite:///testn", echo=False)

with engine.connect() as connection:
    Test.metadata.create_all(engine)

session = Session(engine)
list(map(lambda x: session.merge(Test(x)), data))
session.commit()
Result:
real 0m15.300s
user 0m14.920s
sys 0m0.351s
The second script:
from peewee import *

class Test(Model):
    key = TextField(primary_key=True, null=False)

dbname = "test"
db = SqliteDatabase(dbname)
Test._meta.database = db

data = []
for a in range(0, 10000):
    data.append({"key": "key%s" % a})

if not Test.table_exists():
    db.create_tables([Test])

with db.atomic() as tr:
    Test.insert_many(data).upsert().execute()
Result:
real 0m3.253s
user 0m2.620s
sys 0m0.571s
Why?
This comparison is not entirely valid, because issuing an upsert-style query is very different from what SQLAlchemy's Session.merge() does:
Session.merge() examines the primary key attributes of the source instance, and attempts to reconcile it with an instance of the same primary key in the session. If not found locally, it attempts to load the object from the database based on primary key, and if none can be located, creates a new instance.
In this test case that results in 10,000 load attempts against the database, which is expensive.
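One way to see this for yourself, assuming the Test model and data list from the first script above, is to enable statement echoing; each merge() then logs its own SELECT by primary key before anything is inserted. A minimal sketch, not part of the original benchmark:

# Minimal sketch, assuming the Test model and data list defined above.
from sqlalchemy import create_engine
from sqlalchemy.orm import Session

engine = create_engine("sqlite:///testn", echo=True)  # echo=True logs every statement
session = Session(engine)

# Each merge() attempts to load the row by primary key first,
# so the log shows one SELECT per object before the flush.
for row in data[:5]:  # a handful of rows is enough to see the pattern
    session.merge(Test(row))
session.commit()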
On the other hand, when using peewee with SQLite, the combination of insert_many(data) and upsert() can produce a single query:
INSERT OR REPLACE INTO Test (key) VALUES ('key0'), ('key1'), ...
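If you want to confirm that peewee really builds one statement, the query object can be inspected before executing it. A sketch based on the peewee 2.x API used in the question (newer peewee versions dropped upsert(), so adjust accordingly):

# Sketch: inspect the generated statement instead of executing it (peewee 2.x style API).
query = Test.insert_many(data).upsert()
sql, params = query.sql()   # the compiled SQL string and its parameter list
print(sql[:60])             # INSERT OR REPLACE INTO ... VALUES ...
print(len(params))          # many parameters, but still a single statement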
There is no session state to reconcile, since peewee is a very different kind of ORM from SQLAlchemy; at a quick glance it looks closer to working with Core and Tables in SQLAlchemy than to its ORM.
Instead of list(map(lambda x: session.merge(Test(x)), data)) you could revert to using Core:
session.execute(Test.__table__.insert(prefixes=['OR REPLACE']).values(data))
A major downside of this is that you have to write the database-vendor-specific prefix to INSERT manually. It will also subvert the Session, since it will have no information or knowledge of the newly added rows.
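Since the Session is subverted either way, another option is to skip it for this one statement and execute the Core insert on a plain connection. A minimal sketch, assuming the same engine and Test model as in the first script (on newer SQLAlchemy the prefix_with('OR REPLACE') method form expresses the same prefix):

# Minimal sketch: run the OR REPLACE insert on a connection, bypassing the Session entirely.
from sqlalchemy import create_engine

engine = create_engine("sqlite:///testn", echo=False)
stmt = Test.__table__.insert(prefixes=['OR REPLACE']).values(data)
with engine.begin() as conn:  # begin() opens a transaction and commits on success
    conn.execute(stmt)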
Bulk inserts using model objects are a little more involved with SQLAlchemy. Put very simply, using an ORM is a trade-off between convenience and speed:
ORMs are basically not intended for high-performance bulk inserts - this is the whole reason SQLAlchemy offers the Core in addition to the ORM as a first-class component.
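If you want to stay closer to the ORM, SQLAlchemy also ships bulk helpers on the Session such as bulk_insert_mappings(). Note that these perform plain INSERTs, so unlike the peewee upsert they will fail on already-existing keys. A sketch, assuming the engine and Test model from above:

# Sketch: bulk_insert_mappings() inserts plain dicts without constructing full ORM state.
# This is a plain INSERT, not INSERT OR REPLACE, so existing keys would raise an error.
from sqlalchemy.orm import Session

session = Session(engine)
session.bulk_insert_mappings(Test, data)  # data is the list of {"key": ...} dicts from above
session.commit()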