Can list comprehension assist in iterating through SQLAlchemy query returns?

Question

This is an incredibly slow loop (~1.5 it/s, as measured with tqdm):

For context, the objects are models in a local Postgres database managed by Flask-SQLAlchemy, so network transfer speed is not the cause of the slowness.

for author in tqdm(authors):
    new_score = 0
    for book in author.maintitles:
        new_score = new_score + book.score
        author.score = new_score

To clarify further: there are roughly 500K books and 50K authors, and each book can be written by multiple authors.

I'm not returning a list, but I'm sure this can be improved. Can a list comprehension actually help here?

Something like this (pseudocode; assignments are not valid Python inside a comprehension):

[[(new_score = new_score + book.score,
            author.score = new_score) for book in author.maintitles] for author in tqdm(authors)]

Answer 1

No, don't use a list comprehension for side effects. Even if you were going to use the list, comprehensions are only slightly faster than for-loops anyway.
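To illustrate the point, here is a minimal sketch with hypothetical stand-in `Book` and `Author` dataclasses (not the poster's SQLAlchemy models). A comprehension used only for its side effect still builds a list; here it is a throwaway list of the `None` values returned by `setattr`.

```python
from dataclasses import dataclass, field


@dataclass
class Book:
    """Hypothetical stand-in for the ORM model."""
    score: int


@dataclass
class Author:
    """Hypothetical stand-in for the ORM model."""
    maintitles: list = field(default_factory=list)
    score: int = 0


authors = [Author([Book(1), Book(2)]), Author([Book(5)])]

# Anti-pattern: the comprehension exists only for its side effect,
# so it allocates a list of None values that nobody uses.
junk = [setattr(a, "score", sum(b.score for b in a.maintitles)) for a in authors]
print(junk)  # [None, None]

# Preferred: a plain for-loop states the intent and builds nothing extra.
for a in authors:
    a.score = sum(b.score for b in a.maintitles)
```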

However, you can improve the code with a similar construct: a generator expression.

Step 1: Assign to author.score once at the end instead of on every iteration, and use augmented assignment (+=).

for author in tqdm(authors):
    new_score = 0
    for book in author.maintitles:
        new_score += book.score
    author.score = new_score
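Why Step 1 matters beyond style: on a mapped SQLAlchemy model, every assignment to an attribute goes through the ORM's change-tracking instrumentation, so writing `author.score` once per author instead of once per book avoids repeated work. The sketch below isolates that difference with a hypothetical plain-Python `Author` that merely counts writes to `score`; it is an illustration, not the real model.

```python
from collections import namedtuple

# Hypothetical stand-in for the Book model.
Book = namedtuple("Book", "score")


class Author:
    """Hypothetical stand-in that counts how often `score` is assigned."""

    def __init__(self, maintitles):
        self.maintitles = maintitles
        self.score_writes = 0
        self._score = None  # None means "never assigned"

    @property
    def score(self):
        return self._score

    @score.setter
    def score(self, value):
        self.score_writes += 1
        self._score = value


def original_loop(author):
    new_score = 0
    for book in author.maintitles:
        new_score = new_score + book.score
        author.score = new_score  # assigned once per book


def step1_loop(author):
    new_score = 0
    for book in author.maintitles:
        new_score += book.score
    author.score = new_score  # assigned once per author


a = Author([Book(4), Book(1), Book(6)])
b = Author([Book(4), Book(1), Book(6)])
original_loop(a)
step1_loop(b)
print(a.score, a.score_writes)  # 11 3
print(b.score, b.score_writes)  # 11 1
```

One behavioral difference worth knowing: for an author with no books, the original loop never assigns `score` (any stale value stays), while Step 1 sets it to 0.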

Step 2: Now it's clear that new_score is a simple sum, so use sum() instead.

for author in tqdm(authors):
    author.score = sum(book.score for book in author.maintitles)

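Side note: you could also write this with a list comprehension, but that builds the whole list first and then sums it, whereas a generator expression feeds sum() one item at a time and never holds the full list in memory.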

sum([book.score for book in author.maintitles])
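A quick check, using a hypothetical `Book` stand-in rather than the real model, that the Step 1 accumulator, the Step 2 generator expression, and the list-comprehension variant all compute the same total; only the list version materializes an intermediate list.

```python
from dataclasses import dataclass


@dataclass
class Book:
    """Hypothetical stand-in for the ORM model; only `score` matters here."""
    score: int


maintitles = [Book(s) for s in range(1, 101)]

# Step 1: explicit accumulator with augmented assignment.
total_loop = 0
for book in maintitles:
    total_loop += book.score

# Step 2: generator expression; sum() pulls one score at a time.
total_gen = sum(book.score for book in maintitles)

# Side note: list comprehension; all 100 scores exist in a list before sum() runs.
scores = [book.score for book in maintitles]
total_list = sum(scores)

print(total_loop, total_gen, total_list)  # 5050 5050 5050
```

With 500K books spread over 50K authors, each author's `maintitles` averages at least ten entries, so the per-author list is small and the choice between the two forms mostly affects peak memory, not the overall runtime.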

Answer 2

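Since the refactoring above only shows that a list comprehension isn't the solution, and I have since found the root cause of the problem, I'm adding the following as an answer.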

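The snippet above was part of an operation on a list returned by a query. As mentioned, iterating over the de-duplicated authors list (~50K authors) ran at 1.5 it/s in the final step of a 15-hour process: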

    # Make the popular books query
    popular_books = \
        db.session.query(Book).filter(Book.score > 0).all()
    
    # Make a list of all authors for each book returned in the query
    authors = []
    for book in popular_books:
        authors = authors + book.mainauthors
    
    # Remove duplicates using set()
    authors = list(set(authors))
    

    for author in tqdm(authors):
        author.score = sum(book.score for book in author.maintitles)
    db.session.commit()
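Incidentally, the list-building step in this snippet is one place where a comprehension does fit, because a collection is actually wanted. `authors = authors + book.mainauthors` creates a brand-new list on every iteration, so building it takes time quadratic in the number of collected entries; a set comprehension collects unique authors in a single pass. Within one session, SQLAlchemy's identity map hands back one Python object per row, so default identity-based hashing is enough for de-duplication (assuming the models do not override `__eq__`/`__hash__`). A sketch with hypothetical stand-in classes:

```python
from dataclasses import dataclass, field


# eq=False keeps object's identity-based __eq__/__hash__, mirroring
# one-object-per-row ORM instances (hypothetical stand-ins, not the real models).
@dataclass(eq=False)
class Author:
    name: str


@dataclass(eq=False)
class Book:
    score: int
    mainauthors: list = field(default_factory=list)


alice, bob = Author("alice"), Author("bob")
popular_books = [Book(3, [alice]), Book(5, [alice, bob]), Book(2, [bob])]

# One pass, no repeated list copies; duplicate authors collapse automatically.
authors = {author for book in popular_books for author in book.mainauthors}

print(sorted(a.name for a in authors))  # ['alice', 'bob']
```

This only removes the quadratic list building; it would not change the per-author loop that tqdm was timing.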

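Simply by changing the query to return authors directly, using joinedload to load their books and .distinct() to remove duplicates, we not only reduced everything above to a few lines, but the loop also finished within seconds of the query returning.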

    for popular_author in (db.session.query(Author).join(Book, Author.maintitles)
                           .options(db.joinedload(Author.maintitles))  # eager-load each author's books
                           .filter(Book.popularity > 0).distinct().all()):
        popular_author.score = sum(book.score for book in popular_author.maintitles)

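However, I'm still not sure why this approach is orders of magnitude faster than the old version. Both iterate over a list of authors the same way and perform the same simple sum.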

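For reference, committing the session after this process now takes about 2 hours, whereas the commit was much faster with the previous implementation. Overall this is still a significant (7.5x) improvement. My guess is that with a better-optimized query from the start, all the returned ORM objects are held in RAM and can be operated on faster, and that applying Python list operations to the query results somehow breaks this or fragments the ORM objects in memory.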
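One likely explanation, offered as an assumption rather than something verified in this thread: relationships such as `maintitles` are lazy-loaded by default, so the first access to `author.maintitles` inside the old loop issues a separate SELECT for each author, roughly 50K extra round trips (the "N+1 queries" pattern). `joinedload` fetches the books in the same query as the authors, so the Python loop only touches objects already in memory. The sketch below reproduces the two access patterns with the standard-library `sqlite3` module, a hypothetical three-table schema, and a hypothetical `CountingDB` wrapper that counts queries:

```python
import sqlite3
from collections import defaultdict


class CountingDB:
    """Hypothetical wrapper that counts SELECT round trips."""

    def __init__(self, conn):
        self.conn = conn
        self.queries = 0

    def fetch(self, sql, params=()):
        self.queries += 1
        return self.conn.execute(sql, params).fetchall()


def build_db(n_authors, books_per_author):
    """Hypothetical schema: author, book, and an author_book link table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE author (id INTEGER PRIMARY KEY);
        CREATE TABLE book (id INTEGER PRIMARY KEY, score INTEGER NOT NULL);
        CREATE TABLE author_book (author_id INTEGER, book_id INTEGER);
        """
    )
    book_id = 0
    for author_id in range(1, n_authors + 1):
        conn.execute("INSERT INTO author (id) VALUES (?)", (author_id,))
        for _ in range(books_per_author):
            book_id += 1
            conn.execute("INSERT INTO book (id, score) VALUES (?, ?)", (book_id, book_id % 7))
            conn.execute("INSERT INTO author_book VALUES (?, ?)", (author_id, book_id))
    return CountingDB(conn)


def lazy_scores(db):
    """Lazy-loading pattern: one query for authors, then one per author (N+1)."""
    scores = {}
    for (author_id,) in db.fetch("SELECT id FROM author"):
        rows = db.fetch(
            "SELECT b.score FROM book b JOIN author_book ab ON ab.book_id = b.id "
            "WHERE ab.author_id = ?",
            (author_id,),
        )
        scores[author_id] = sum(score for (score,) in rows)
    return scores


def joined_scores(db):
    """Eager pattern: a single JOIN query returns every (author, score) pair."""
    scores = defaultdict(int)
    rows = db.fetch(
        "SELECT a.id, b.score FROM author a "
        "LEFT JOIN author_book ab ON ab.author_id = a.id "
        "LEFT JOIN book b ON b.id = ab.book_id"
    )
    for author_id, score in rows:
        scores[author_id] += score or 0  # authors without books sum to 0
    return dict(scores)


db = build_db(n_authors=200, books_per_author=5)
lazy = lazy_scores(db)
lazy_queries = db.queries  # 1 + 200
db.queries = 0
eager = joined_scores(db)
eager_queries = db.queries  # 1
print(lazy_queries, eager_queries, lazy == eager)  # 201 1 True
```

In SQLAlchemy terms, `lazy_scores` corresponds to the default `lazy="select"` loading strategy and `joined_scores` to `joinedload`; `selectinload` is another eager option that issues one extra `IN` query for the relationship instead of a join.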

Tags: python, sqlalchemy, list-comprehension, nested-loops, flask-sqlalchemy