postgresql

Question

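I have a table that I created locally to try out some of PG's window functions on a dataset of about 4 million rows (originally a text file). Each row corresponds to a customer order.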

CREATE TABLE orders
(
  orderid integer,
  customerid integer,
  orderdate date,
  status text,
  amount money,
  tax money,
  customername text,
  customerstate text
);

The database runs locally on a Windows 8 machine with an i7 and 8 GB of RAM. I have btree indexes (indices?) on orderid, customerid and orderdate.

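When I run the query below, it takes approximately 300 seconds. I was hoping some basic tuning could bring that down to about a minute, but I am not a DBA. Does anyone have tips?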

select orderid, customername, orderdate, 
rank() OVER (PARTITION BY customername ORDER BY orderdate ASC) as cust_ord_nbr
from orders

Answer 1

Covering index

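Partition by customerid instead of customername. An integer is smaller and cheaper to sort than text. If you don't need customername in the result, replace it with customerid completely.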

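A multicolumn index can help. If the table is (mostly) read-only, a "covering" index that allows index-only scans would be faster still, especially if the included columns are small: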

CREATE INDEX orders_nbr_idx ON orders (customerid, orderdate, orderid);

Adding orderid to the index only makes sense if you get index-only scans out of it. If you need customername, add that to the index as well.
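To check whether an index-only scan is actually used, something along these lines might help. This is only a sketch: the index name orders_nbr_name_idx is made up for illustration, and it assumes PostgreSQL 9.2 or later (where index-only scans exist) and that customername values are short enough to be indexed.

```sql
-- Hypothetical variant of the index above that also carries customername,
-- so the query can be answered from the index alone.
CREATE INDEX orders_nbr_name_idx
    ON orders (customerid, orderdate, orderid, customername);

-- Index-only scans rely on the visibility map, which VACUUM maintains.
VACUUM ANALYZE orders;

-- Inspect the plan: look for "Index Only Scan" and a low "Heap Fetches" count.
EXPLAIN (ANALYZE, BUFFERS)
SELECT orderid, customername, orderdate
     , rank() OVER (PARTITION BY customerid
                    ORDER BY orderdate) AS cust_ord_nbr
FROM   orders;
```

Whether the planner picks the index over a sequential scan plus sort depends on your data and settings, so treat the plan output as the deciding evidence.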

If it's a (mostly) read-only table, run the expensive query once and save the snapshot as a MATERIALIZED VIEW for reuse ...
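A minimal sketch of that idea, assuming PostgreSQL 9.3 or later (where materialized views were introduced). The names orders_ranked and orders_ranked_cust_idx are placeholders, not anything from the question:

```sql
-- Run the expensive window query once and keep the result as a snapshot.
CREATE MATERIALIZED VIEW orders_ranked AS
SELECT orderid, customerid, orderdate
     , rank() OVER (PARTITION BY customerid
                    ORDER BY orderdate) AS cust_ord_nbr
FROM   orders;

-- Optional: index the snapshot for fast lookups per customer.
CREATE INDEX orders_ranked_cust_idx
    ON orders_ranked (customerid, cust_ord_nbr);

-- Re-run the underlying query only when the base table has changed.
REFRESH MATERIALIZED VIEW orders_ranked;
```

Reads against orders_ranked then cost a plain table or index scan; the price is that the snapshot is stale until the next refresh.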

Peanuts

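You can do a few small things to reduce the memory footprint. After playing "column tetris", the following layout saves the 0-7 bytes per row currently lost to padding: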

CREATE TABLE orders (
  orderid integer,
  customerid integer,
  amount money,
  tax money,
  orderdate date,
  status text,
  customername text,
  customerstate text
  );
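To see whether the reordering pays off on your data, you could run a rough measurement like the one below before and after rebuilding the table. It is a sketch only: pg_column_size() on a whole-row value reports the size of the composite value, which includes its own header and is not identical to the on-disk tuple size, so use it for relative comparison rather than as an exact figure.

```sql
-- Approximate average row size and total table size for the current layout.
-- Note: this scans the whole table.
SELECT count(*)                                AS row_count
     , round(avg(pg_column_size(o)))           AS avg_row_bytes
     , pg_size_pretty(pg_table_size('orders')) AS table_size
FROM   orders o;
```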

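If you write the result to another table (or MATERIALIZED VIEW), it saves a bit to optimize the query in a similar fashion. rank() produces a bigint; by casting to int, you save 8 bytes per row (4 + 4 padding):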

SELECT orderid, customername, orderdate
    -- orderid, customerid, orderdate  -- good enough?
     , rank() OVER (PARTITION BY customerid
                    ORDER BY orderdate)::int AS cust_ord_nbr
FROM   orders;
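The type claim is easy to verify with pg_typeof(). The sketch below uses a throwaway one-row VALUES list (the alias t and column x are made up for illustration) and should report bigint for the plain call and integer for the cast:

```sql
-- Compare the result type of rank() with and without the cast.
SELECT pg_typeof(rank() OVER (ORDER BY x))      AS rank_type
     , pg_typeof(rank() OVER (ORDER BY x)::int) AS cast_type
FROM  (VALUES (1)) AS t(x);
```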

postgresql - are my indexes or column types slowing down my query?

postgresql-performance
