Postgres crosstab(text, text) cascading within group
Table schema
DROP TABLE IF EXISTS bla;
CREATE TABLE bla (id INTEGER, city INTEGER, year_ INTEGER, month_ INTEGER, val INTEGER);
Data
INSERT INTO bla VALUES(1, 1, 2017, 1, 10);
INSERT INTO bla VALUES(2, 1, 2017, 2, 20);
INSERT INTO bla VALUES(3, 1, 2017, 1, 15);
INSERT INTO bla VALUES(4, 1, 2017, 2, 5);
INSERT INTO bla VALUES(5, 2, 2017, 1, 10);
INSERT INTO bla VALUES(6, 2, 2017, 2, 15);
INSERT INTO bla VALUES(7, 1, 2018, 1, 10);
INSERT INTO bla VALUES(8, 1, 2018, 1, 10);
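I am trying to aggregate these rows and put them into a pivot table format, so that for each (city, year_) combination I get the corresponding total of val per month. Here is what I came up with from online resources and the official documentation.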
SELECT * FROM crosstab (
'SELECT city, year_, month_, SUM(val) FROM bla GROUP BY 1, 2, 3 ORDER BY 1',
'SELECT DISTINCT month_ FROM bla ORDER BY 1'
) AS final_table (
city INTEGER,
year_ INTEGER,
january INTEGER,
February INTEGER
);
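This is the output I currently get.
Note how the entry corresponding to the group (city 1, year_ 2018) is missing. I have not found any solution, and I suspect that crosstab may not support this kind of cascading structure.
I know I can create a temporary variable (city_year_) to work around this.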
SELECT * FROM crosstab (
'SELECT CONCAT(city, year_)::text AS tag, month_, SUM(val) FROM bla GROUP BY 1, 2 ORDER BY 1',
'SELECT DISTINCT month_ FROM bla ORDER BY 1'
) AS final_table (
tag text,
january INTEGER,
February INTEGER
);
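Here is the output.
But having city and year_ in their own columns is my preferred format. It is visually more informative and it preserves the original data; splitting the tag variable back into city and year_ requires knowing how tag was defined.
Any workaround or help is much appreciated. Regards.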
Postgres's crosstab() expects the source query to have a specific format. From the documentation:
This statement [source sql] must return one row_name column, one category column, and one value column. It may also have one or more "extra" columns. The row_name column must be first. The category and value columns must be the last two columns, in that order. Any columns between row_name and category are treated as "extra". The "extra" columns are expected to be the same for all rows with the same row_name value.
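To see how this rule applies to the question, it can help to run the source query from the first attempt on its own and read its columns the way crosstab() does, by position. This is only a sketch against the bla table defined in the question; the extra ORDER BY keys are added here to make the listing deterministic.

```sql
-- Source query from the question's first attempt, run standalone.
-- crosstab(text, text) interprets the columns by position:
--   city   -> row_name (first column)
--   year_  -> "extra" column (between row_name and category)
--   month_ -> category (second to last)
--   sum    -> value (last)
SELECT city, year_, month_, SUM(val)
FROM bla
GROUP BY 1, 2, 3
ORDER BY 1, 2, 3;

--  city | year_ | month_ | sum
-- ------+-------+--------+-----
--     1 |  2017 |      1 |  25
--     1 |  2017 |      2 |  25
--     1 |  2018 |      1 |  20
--     2 |  2017 |      1 |  10
--     2 |  2017 |      2 |  15
```

The first three rows share row_name 1 but carry two different year_ values, which conflicts with the requirement that "extra" columns be the same for all rows with the same row_name. Those rows end up in a single output row, so (city 1, year_ 2018) does not get a row of its own.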
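The problem here is that city and year_ together identify a row (month_ is the category), whereas crosstab() allows only one row_name column. So we need a single column that stands in for the (year_, city) pair as the row_name, with year_ and city passed along as "extra" columns. Let's try the dense_rank() window function for this.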
SELECT year_, city, january, february FROM crosstab (
'SELECT dense_rank() OVER (ORDER BY year_, city)::int AS row_name,
year_, city, month_, SUM(val) FROM bla GROUP BY city, year_, month_
ORDER BY 1',
'SELECT DISTINCT month_ FROM bla ORDER BY 1'
) AS final_table (
row_name INTEGER,
year_ INTEGER,
city INTEGER,
january INTEGER,
february INTEGER
);
This produces the desired output:
-------------------------------------
| year_ | city | january | february |
-------------------------------------
| 2017 | 1 | 25 | 25 |
-------------------------------------
| 2017 | 2 | 10 | 15 |
-------------------------------------
| 2018 | 1 | 20 | |
-------------------------------------
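As a possible alternative when the set of months is known in advance, the same pivot can be sketched with plain conditional aggregation instead of crosstab(). This is not the approach used above, just a different technique under the assumption that only months 1 and 2 need columns; it relies on the aggregate FILTER clause (PostgreSQL 9.4 or later) and does not need the tablefunc extension that provides crosstab().

```sql
-- Conditional aggregation: one output row per (year_, city),
-- one FILTERed SUM per month column. Months with no rows yield NULL.
SELECT year_,
       city,
       SUM(val) FILTER (WHERE month_ = 1) AS january,
       SUM(val) FILTER (WHERE month_ = 2) AS february
FROM bla
GROUP BY year_, city
ORDER BY year_, city;
```

Because the grouping is done by an ordinary GROUP BY, city and year_ stay in their own columns without any surrogate row_name. The trade-off is that every month column has to be written out by hand.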