Caching an intermediate table with psycopg2
Take, for example, a psycopg2 call that involves two SELECTs:
import psycopg2

with psycopg2.connect("dbname=test user=postgres") as conn:
    with conn.cursor() as cur:
        # Both queries repeat the same a > 5 filter.
        # (my_table is used because TABLE is a reserved word in PostgreSQL.)
        cur.execute("SELECT a, b, c FROM my_table WHERE a > 5 AND d < 10;")
        r1 = cur.fetchall()
        cur.execute("SELECT a, b, c FROM my_table WHERE a > 5 AND d > 20;")
        r2 = cur.fetchall()
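This is somewhat inefficient: the potentially O(N) check WHERE a > 5 runs twice, when it seems it could run once, with both follow-up queries executed against that intermediate result.
What is the canonical way to do this through the psycopg2 API?
Something like: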
with psycopg2.connect("dbname=test user=postgres") as conn:
    with conn.cursor() as cur:
        cur.execute("SELECT a, b, c FROM my_table WHERE a > 5")
        # ...
        cur.execute("SELECT a, b, c FROM temp_table WHERE d < 10;")
        r1 = cur.fetchall()
        cur.execute("SELECT a, b, c FROM temp_table WHERE d > 20;")
        r2 = cur.fetchall()
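Is using a literal "CREATE TEMP TABLE..." the best solution?
I am approaching this from a Django ORM perspective, where subsequent evaluations of a QuerySet reuse the cached results. Does the psycopg2 API offer anything similar?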
You can execute a single query and split the results into two lists:
import psycopg2

with psycopg2.connect("dbname=test user=postgres") as conn:
    with conn.cursor() as cur:
        # Fetch d as well so the rows can be split on the client.
        cur.execute("SELECT a, b, c, d FROM my_table WHERE a > 5 AND (d < 10 OR d > 20);")
        rows = cur.fetchall()
        r1 = [(i[0], i[1], i[2]) for i in rows if i[3] < 10]
        r2 = [(i[0], i[1], i[2]) for i in rows if i[3] > 20]
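The two list comprehensions walk the fetched rows twice. As a minimal sketch of the same split done in one pass, assuming rows shaped like (a, b, c, d) as returned by the query above; the helper name split_rows and its low/high parameters are hypothetical, not part of psycopg2:

```python
def split_rows(rows, low=10, high=20):
    """Partition (a, b, c, d) rows by d in a single pass.

    Hypothetical helper; the thresholds mirror the d < 10 and d > 20
    conditions used in the query.
    """
    r1, r2 = [], []
    for a, b, c, d in rows:
        if d < low:
            r1.append((a, b, c))
        elif d > high:
            r2.append((a, b, c))
        # Rows with low <= d <= high are skipped; the query should not return them.
    return r1, r2


# Example with in-memory rows standing in for cur.fetchall():
rows = [(6, "x", 1.0, 3), (7, "y", 2.0, 25), (8, "z", 3.0, 9)]
r1, r2 = split_rows(rows)
print(r1)  # [(6, 'x', 1.0), (8, 'z', 3.0)]
print(r2)  # [(7, 'y', 2.0)]
```

Because the helper only needs an iterable of tuples, it can be exercised without a database connection.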
When the result set is not very large, the solution above should be the most efficient. Alternatively, you can create a temporary table:
import psycopg2

with psycopg2.connect("dbname=test user=postgres") as conn:
    with conn.cursor() as cur:
        # Materialize the filtered rows once on the server.
        cur.execute("""
            CREATE TEMP TABLE t AS
            SELECT a, b, c, d
            FROM my_table
            WHERE a > 5 AND (d < 10 OR d > 20);""")
        # Both follow-up queries read from the much smaller temp table.
        cur.execute("SELECT a, b, c FROM t WHERE d < 10;")
        r1 = cur.fetchall()
        cur.execute("SELECT a, b, c FROM t WHERE d > 20;")
        r2 = cur.fetchall()
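The temporary table is dropped automatically when the connection is closed. Note that in psycopg2 leaving the with conn block only commits or rolls back the transaction; it does not close the connection, so call conn.close() once you are done with it.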
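If you would rather not keep the table around until the connection closes, PostgreSQL's ON COMMIT DROP clause ties it to the current transaction instead. A minimal sketch, assuming the cursor is used inside a transaction (psycopg2's default, non-autocommit mode); the helper name fetch_both_ranges is hypothetical:

```python
def fetch_both_ranges(cur):
    """Materialize the a > 5 rows once, then query the temp table twice.

    `cur` is a DB-API cursor opened inside a transaction. The helper
    name is hypothetical, not part of psycopg2.
    """
    # ON COMMIT DROP removes the table when the current transaction ends.
    cur.execute("""
        CREATE TEMP TABLE t ON COMMIT DROP AS
        SELECT a, b, c, d
        FROM my_table
        WHERE a > 5 AND (d < 10 OR d > 20);""")
    cur.execute("SELECT a, b, c FROM t WHERE d < 10;")
    r1 = cur.fetchall()
    cur.execute("SELECT a, b, c FROM t WHERE d > 20;")
    r2 = cur.fetchall()
    return r1, r2
```

Call it from inside `with conn:` and `with conn.cursor() as cur:`. In autocommit mode each statement is its own transaction, so the table would be gone before the first SELECT runs.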
If the result set is too large to be processed practically on the client side, use a server-side cursor. When you fetch single rows in a loop, the rows are actually retrieved from the server in buckets. You can define the size of the buckets by setting itersize.
import psycopg2

r1 = []
r2 = []
with psycopg2.connect("dbname=test user=postgres") as conn:
    # Passing a name creates a server-side (named) cursor.
    with conn.cursor('my_cursor') as cur:
        cur.itersize = 1000  # rows fetched per network round trip
        cur.execute("SELECT a, b, c, d FROM my_table WHERE a > 5 AND (d < 10 OR d > 20);")
        for row in cur:
            if row[3] < 10:
                r1.append((row[0], row[1], row[2]))
            else:
                # The WHERE clause guarantees d > 20 here.
                r2.append((row[0], row[1], row[2]))
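Appending every row to r1 and r2 still ends up holding the whole result in client memory. When that is the concern, each row can be handed to a handler as it arrives instead. A minimal sketch, assuming rows shaped like (a, b, c, d) and a query that only returns d < 10 or d > 20; dispatch_rows and the two callbacks are hypothetical names:

```python
def dispatch_rows(rows, on_low, on_high, low=10):
    """Send each (a, b, c, d) row to a handler instead of storing it.

    `rows` can be any iterable, for example a psycopg2 named cursor
    after execute(). Returns the number of rows seen.
    """
    seen = 0
    for a, b, c, d in rows:
        if d < low:
            on_low((a, b, c))
        else:
            on_high((a, b, c))
        seen += 1
    return seen


# Example: keep only running counts, not the rows themselves.
counts = {"low": 0, "high": 0}

def count_low(row):
    counts["low"] += 1

def count_high(row):
    counts["high"] += 1

rows = iter([(6, "x", 1.0, 3), (7, "y", 2.0, 25), (8, "z", 3.0, 30)])
total = dispatch_rows(rows, count_low, count_high)
print(total, counts)  # 3 {'low': 1, 'high': 2}
```

With a named cursor passed as rows, only about itersize rows are buffered on the client at any time.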