Postgres crosstab (text, text) cascading within group

Table schema

DROP TABLE IF EXISTS bla;
CREATE TABLE bla (id INTEGER, city INTEGER, year_ INTEGER, month_ INTEGER, val INTEGER);

Data

INSERT INTO bla VALUES(1, 1, 2017, 1, 10);
INSERT INTO bla VALUES(2, 1, 2017, 2, 20);
INSERT INTO bla VALUES(3, 1, 2017, 1, 15);
INSERT INTO bla VALUES(4, 1, 2017, 2, 5);
INSERT INTO bla VALUES(5, 2, 2017, 1, 10);
INSERT INTO bla VALUES(6, 2, 2017, 2, 15);
INSERT INTO bla VALUES(7, 1, 2018, 1, 10);
INSERT INTO bla VALUES(8, 1, 2018, 1, 10);

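I am trying to aggregate these rows and put them into a pivot table format, so that for each (city, year_) combination I get the corresponding total val per month. Here is what I came up with from online resources and the official documentation.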

SELECT * FROM crosstab (
  'SELECT city, year_, month_, SUM(val) FROM bla GROUP BY 1, 2, 3 ORDER BY 1',
  'SELECT DISTINCT month_ FROM bla ORDER BY 1'
) AS final_table (
  city INTEGER,
  year_ INTEGER,
  january INTEGER,
  February INTEGER
);

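This is the output I currently get.

Note how the entry for the group (city 1, year_ 2018) is missing. I have not found any solution, and I suspect that crosstab may not support this kind of cascading structure.

I know I can create a temporary variable (city_year_) to work around this.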

SELECT * FROM crosstab (
  'SELECT CONCAT(city, year_)::text AS tag, month_, SUM(val) FROM bla GROUP BY 1, 2 ORDER BY 1',
  'SELECT DISTINCT month_ FROM bla ORDER BY 1'
) AS final_table (
  tag text,
  january INTEGER,
  February INTEGER
);

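Here is the output.

However, having city and year_ in their own columns is my preferred format (it is visually richer and preserves the raw data; splitting the tag variable back into city and year_ requires knowing how tag was defined).

Any work-around or help is much appreciated. Regards.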

Postgres's crosstab() expects the source query to have a specific format. From the documentation:

This statement [source sql] must return one row_name column, one category column, and one value column. It may also have one or more "extra" columns. The row_name column must be first. The category and value columns must be the last two columns, in that order. Any columns between row_name and category are treated as "extra". The "extra" columns are expected to be the same for all rows with the same row_name value.
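As a quick check of that last requirement against the data in the question, here is a minimal sketch. It assumes the first query from the question, where city is the first column (so crosstab() takes it as the row_name) and year_ sits between it and the category column (so it is treated as "extra"). The alias distinct_years is only illustrative.

```sql
-- List every row_name candidate (city) that has more than one value
-- in the "extra" column (year_). Any row returned here violates the
-- expectation that "extra" columns are constant per row_name.
SELECT city,
       COUNT(DISTINCT year_) AS distinct_years
FROM bla
GROUP BY city
HAVING COUNT(DISTINCT year_) > 1
ORDER BY city;
-- With the sample data this returns a single row: city = 1, distinct_years = 2
```

City 1 appears with both 2017 and 2018, so its rows cannot be told apart by city alone.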

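The problem here is that city and year_ together identify a row, but crosstab() allows only one row_name column. In the query from the question, city is taken as the row_name and year_ is treated as an "extra" column, so the rows for city 1 are merged into a single output row and the (city 1, year_ 2018) group disappears. We therefore need something else as the row_name column. Let's use the dense_rank() window function to number each (year_, city) combination.

Try this.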

SELECT year_, city, january, february FROM crosstab (
  'SELECT dense_rank() OVER (ORDER BY year_, city)::int AS row_name,
          year_, city, month_, SUM(val)
   FROM bla
   GROUP BY city, year_, month_
   ORDER BY 1',
  'SELECT DISTINCT month_ FROM bla ORDER BY 1'
) AS final_table (
  row_name INTEGER,
  year_ INTEGER,
  city INTEGER,
  january INTEGER,
  february INTEGER
);

This produces the desired output:

-------------------------------------
| year_ | city | january | february |
-------------------------------------
| 2017  | 1    | 25      | 25       |
-------------------------------------
| 2017  | 2    | 10      | 15       |
-------------------------------------
| 2018  | 1    | 20      |          |
-------------------------------------
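
To see why this works, it can help to run the source query on its own, outside crosstab(). The sketch below is the same inner query with two assumptions of mine added for readability: a column alias total_val, and a secondary sort on month_ so the listing is deterministic.

```sql
-- The source query passed to crosstab(), run standalone.
-- dense_rank() assigns one integer per distinct (year_, city) pair,
-- which gives crosstab() the single row_name column it requires;
-- year_ and city then travel along as "extra" columns.
SELECT dense_rank() OVER (ORDER BY year_, city)::int AS row_name,
       year_,
       city,
       month_,
       SUM(val) AS total_val
FROM bla
GROUP BY city, year_, month_
ORDER BY 1, 4;
-- With the sample data:
--  row_name | year_ | city | month_ | total_val
--         1 |  2017 |    1 |      1 |        25
--         1 |  2017 |    1 |      2 |        25
--         2 |  2017 |    2 |      1 |        10
--         2 |  2017 |    2 |      2 |        15
--         3 |  2018 |    1 |      1 |        20
```

Each row_name value now maps to exactly one (year_, city) pair, so the "extra" columns are constant within a row_name and no group is lost. The february cell for (2018, 1) stays empty because no month_ = 2 row exists for that pair.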