Pivot table with multiple value columns

I have a Postgres table with product data from different manufacturers. Here is the simplified table structure:

CREATE TABLE test_table (
  sku               text,
  manufacturer_name text,
  price             double precision,
  stock             int
);

INSERT INTO test_table
VALUES ('sku1', 'Manufacturer1', 110.00, 22),
       ('sku1', 'Manufacturer2', 120.00, 15),
       ('sku1', 'Manufacturer3', 130.00, 1),
       ('sku1', 'Manufacturer3', 30.00, 11),
       ('sku2', 'Manufacturer1', 10.00, 2),
       ('sku2', 'Manufacturer2', 9.00,  3),
       ('sku3', 'Manufacturer2', 21.00, 3),
       ('sku3', 'Manufacturer2', 1.00, 7),
       ('sku3', 'Manufacturer3', 19.00, 5);

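I need to output every manufacturer for each sku, but if the same sku has several rows for the same manufacturer, I need to select the row with the lowest price (note that I also need to include the 'stock' column). Here is the desired result: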

| sku  | man1_price | man1_stock | man2_price | man2_stock | man3_price | man3_stock |
|------|------------|------------|------------|------------|------------|------------|
| sku1 | 110.0      | 22         | 120.0      | 15         | 30.0       | 11         |
| sku2 | 10.0       | 2          | 9.0        | 3          |            |            |
| sku3 |            |            | 1.0        | 7          | 19.0       | 5          |

I tried to use the Postgres crosstab():

SELECT *
FROM crosstab('SELECT sku, manufacturer_name, price
              FROM test_table
              ORDER BY 1,2',
              $$ SELECT DISTINCT manufacturer_name FROM test_table ORDER BY 1 $$
       )
       AS ct (sku text, "man1_price" double precision,
              "man2_price" double precision,
              "man3_price" double precision
    );

But this produces a table with only the price columns, and I found no way to include the stock column.

I also tried conditional aggregation:

SELECT sku,
   MIN(CASE WHEN manufacturer_name = 'Manufacturer1' THEN price END) as man1_price,
   MIN(CASE WHEN manufacturer_name = 'Manufacturer1' THEN stock END) as man1_stock,
   MIN(CASE WHEN manufacturer_name = 'Manufacturer2' THEN price END) as man2_price,
   MIN(CASE WHEN manufacturer_name = 'Manufacturer2' THEN stock END) as man2_stock,
   MIN(CASE WHEN manufacturer_name = 'Manufacturer3' THEN price END) as man3_price,
   MIN(CASE WHEN manufacturer_name = 'Manufacturer3' THEN stock END) as man3_stock
FROM test_table
GROUP BY sku
ORDER BY sku

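This query does not work in my case either. It simply selects the minimum of each column, so if the same sku has several rows for the same manufacturer with different prices and stocks, the query returns the minimum price from one row and the minimum stock from another.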
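
To make the mismatch concrete, here is a minimal sketch against the sample data above, restricted to sku1 and Manufacturer3 (one of the pairs with two rows). The two MIN() calls are evaluated independently of each other, so their results can come from different rows:

```sql
-- Sample rows for (sku1, Manufacturer3): (130.00, 1) and (30.00, 11).
-- Each MIN() aggregates its own column without regard to the other.
SELECT sku,
       MIN(price) AS man3_price,  -- 30, taken from the row (30.00, 11)
       MIN(stock) AS man3_stock   -- 1,  taken from the row (130.00, 1)
FROM   test_table
WHERE  sku = 'sku1'
AND    manufacturer_name = 'Manufacturer3'
GROUP  BY sku;
```

The result pairs price 30 with stock 1, a combination that exists in no row of the table.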

How can I output each manufacturer's price and the corresponding stock from the table?

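P.S. Thank you all for such helpful answers. My Postgres table is rather small, with no more than 15k products (I don't know whether such numbers are useful for a proper comparison), but since Erwin Brandstetter asked for a performance comparison of the different queries, I ran them with EXPLAIN ANALYZE. Here are their execution times: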

Erwin Brandstetter query:        400 - 450 ms 
Kjetil S query:                  250 - 300 ms
Gordon Linoff query:             200 - 250 ms
a_horse_with_no_name query:      250 - 300 ms

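Again, I am not sure whether these numbers can serve as a reference. For my case I chose a combined version of the Kjetil S and Gordon Linoff queries, but the Erwin Brandstetter and a_horse_with_no_name variants are also very useful and interesting. It is worth noting that if my table ends up with more manufacturers in the future, adjusting the query and typing in their names every time would be annoying, so the query from the a_horse_with_no_name answer would be the most convenient to use.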

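Your last select almost works. But you should add a where condition that removes every row whose price is not the lowest for its manufacturer and sku. This produces your expected result: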

select
  sku,
  min( case when manufacturer_name='Manufacturer1' then price end ) man1_price,
  min( case when manufacturer_name='Manufacturer1' then stock end ) man1_stock,
  min( case when manufacturer_name='Manufacturer2' then price end ) man2_price,
  min( case when manufacturer_name='Manufacturer2' then stock end ) man2_stock,
  min( case when manufacturer_name='Manufacturer3' then price end ) man3_price,
  min( case when manufacturer_name='Manufacturer3' then stock end ) man3_stock
from test_table t
where not exists (
    select 1 from test_table
    where sku=t.sku
    and manufacturer_name=t.manufacturer_name
    and price<t.price
)
group by sku
order by 1;
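
To see what the where clause does in isolation, the following sketch runs the same not exists filter without the pivot. With the sample data it leaves one row per sku and manufacturer. One caveat, stated as an assumption about data that is not in the sample: if two rows tie on the lowest price, both survive the filter, and min(stock) in the pivot then reports the smaller of their stock values.

```sql
-- Rows that survive the filter: no other row for the same sku and
-- manufacturer has a strictly lower price.
select sku, manufacturer_name, price, stock
from test_table t
where not exists (
    select 1 from test_table
    where sku = t.sku
    and manufacturer_name = t.manufacturer_name
    and price < t.price
)
order by sku, manufacturer_name;
```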

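I find that using a JSON result is much easier nowadays than using a complex pivot. Producing a single aggregated JSON value does not break SQL's inherent restriction that the number of columns must be known before the query is executed (and must be the same for all rows).

You can use something like this: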

select sku, 
       jsonb_object_agg(manufacturer_name, 
                          jsonb_build_object('price', price, 'stock', stock, 'isMinPrice', price = min_price)) as price_info
from (
  select sku, 
         manufacturer_name,
         price, 
         min(price) over (partition by sku) as min_price,
         stock
  from test_table
) t
group by sku;

The above returns the following result with your sample data:

sku  | price_info                                                                                                                                                                                             
-----+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
sku1 | {"Manufacturer1": {"price": 110, "stock": 22, "isMinPrice": false}, "Manufacturer2": {"price": 120, "stock": 15, "isMinPrice": false}, "Manufacturer3": {"price": 30, "stock": 11, "isMinPrice": true}}
sku2 | {"Manufacturer1": {"price": 10, "stock": 2, "isMinPrice": false}, "Manufacturer2": {"price": 9, "stock": 3, "isMinPrice": true}}                                                                       
sku3 | {"Manufacturer2": {"price": 1, "stock": 7, "isMinPrice": true}, "Manufacturer3": {"price": 19, "stock": 5, "isMinPrice": false}}                                                                       
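
The sample data has two rows each for (sku1, Manufacturer3) and (sku3, Manufacturer2), while a jsonb object keeps only one value per key. As far as I understand, which row ends up in the object then depends on the order in which the aggregate receives its input. The following is a hedged sketch, not part of the query above, that keeps only the cheapest row per sku and manufacturer before aggregating, and then reads single values back with the -> and ->> operators. The CTE names cheapest and info are made up for illustration.

```sql
-- Keep only the cheapest row per (sku, manufacturer_name) before aggregating,
-- so every JSON key is fed exactly one value.
with cheapest as (
  select distinct on (sku, manufacturer_name)
         sku, manufacturer_name, price, stock,
         min(price) over (partition by sku) as min_price
  from test_table
  order by sku, manufacturer_name, price
), info as (
  select sku,
         jsonb_object_agg(manufacturer_name,
                          jsonb_build_object('price', price, 'stock', stock,
                                             'isMinPrice', price = min_price)) as price_info
  from cheapest
  group by sku
)
-- Pull individual values back out of the JSON document where needed.
-- A manufacturer that is missing for a sku simply yields NULL.
select sku,
       (price_info -> 'Manufacturer1' ->> 'price')::float8 as man1_price,
       (price_info -> 'Manufacturer1' ->> 'stock')::int    as man1_stock
from info
order by sku;
```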

I would use distinct on to limit the data to one price per manufacturer. And I like the filter functionality in Postgres. So:

select sku,
       max(price) filter (where manufacturer_name = 'Manufacturer1') as man1_price,
       max(stock) filter (where manufacturer_name = 'Manufacturer1') as man1_stock,
       max(price) filter (where manufacturer_name = 'Manufacturer2') as man2_price,
       max(stock) filter (where manufacturer_name = 'Manufacturer2') as man2_stock,
       max(price) filter (where manufacturer_name = 'Manufacturer3') as man3_price,
       max(stock) filter (where manufacturer_name = 'Manufacturer3') as man3_stock
from (select distinct on (manufacturer_name, sku) t.*
      from test_table t
      order by manufacturer_name, sku, price
     ) t
group by sku
order by sku;
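
The subquery is what does the deduplication, so it can help to run it on its own. The sketch below does that; the trailing stock desc is an assumed tie-breaker that is not in the query above, added only to make the choice deterministic when two rows share the lowest price.

```sql
-- distinct on keeps the first row of each (manufacturer_name, sku) group
-- in "order by" order, i.e. the row with the lowest price.
-- "stock desc" is an assumed tie-breaker: among equally cheap rows it
-- prefers the one with the larger stock.
select distinct on (manufacturer_name, sku)
       sku, manufacturer_name, price, stock
from test_table
order by manufacturer_name, sku, price, stock desc;
```

With the sample data this leaves seven rows, one per sku and manufacturer, so each max() in the outer query sees at most one value and merely passes it through.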

crosstab() must be given a static column definition list. Your second parameter:

$$ SELECT DISTINCT manufacturer_name FROM test_table ORDER BY 1 $$

... provides a dynamic list of values, which would require a dynamic column definition list. That is not going to work, except by coincidence.

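The core problem of your task is that crosstab() expects a single value column from the query in its first parameter. But you want to process two value columns per row (price and stock).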

One way around this is to pack the multiple values into a composite type and extract the values in the outer SELECT.

Create a composite type once:

CREATE TYPE price_stock AS (price float8, stock int);

A temporary table or a view would serve the purpose as well.
Then:

SELECT sku
     , (man1).price, (man1).stock
     , (man2).price, (man2).stock
     , (man3).price, (man3).stock
FROM   crosstab(
   'SELECT sku, manufacturer_name, (price, stock)::price_stock
    FROM   test_table
    ORDER  BY 1,2'
  , $$VALUES ('Manufacturer1'),('Manufacturer2'),('Manufacturer3')$$
    )
       AS ct (sku text
            , man1 price_stock
            , man2 price_stock
            , man3 price_stock
    );
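
Two practical points, sketched under assumptions. First, crosstab() comes from the additional module tablefunc, which has to be installed once per database. Second, the sample data contains more than one row for some (sku, manufacturer_name) pairs, so the sketch below reduces the source query to the cheapest row per pair before it reaches crosstab(). That extra subquery, its alias sub, and the column aliases are my additions, not part of the query above, and the price_stock type created earlier is assumed to exist.

```sql
-- crosstab() is provided by the additional module tablefunc.
CREATE EXTENSION IF NOT EXISTS tablefunc;

SELECT sku
     , (man1).price AS man1_price, (man1).stock AS man1_stock
     , (man2).price AS man2_price, (man2).stock AS man2_stock
     , (man3).price AS man3_price, (man3).stock AS man3_stock
FROM   crosstab(
   -- Assumed addition: keep only the lowest-priced row per pair.
   'SELECT sku, manufacturer_name, (price, stock)::price_stock
    FROM  (
       SELECT DISTINCT ON (sku, manufacturer_name) *
       FROM   test_table
       ORDER  BY sku, manufacturer_name, price
       ) sub
    ORDER  BY 1, 2'
  , $$VALUES ('Manufacturer1'),('Manufacturer2'),('Manufacturer3')$$
    )
       AS ct (sku text
            , man1 price_stock
            , man2 price_stock
            , man3 price_stock
    );
```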

For a quick test, or if the row of the underlying table is not too wide, you can also just use its row type, without creating a custom type:

SELECT sku
     , (man1).price, (man1).stock
     , (man2).price, (man2).stock
     , (man3).price, (man3).stock
FROM   crosstab(
   'SELECT sku, manufacturer_name, t
    FROM   test_table t
    ORDER  BY 1,2'
  , $$VALUES ('Manufacturer1'),('Manufacturer2'),('Manufacturer3')$$
    )
       AS ct (sku text
            , man1 test_table
            , man2 test_table
            , man3 test_table
    );

db<>fiddle here

Related:

  • PostgreSQL Crosstab Query