Columns in rows in big datasets (PostgreSQL) -- Transpose?
I am trying to restructure my big dataset so that I can work with my data more easily. I have about 20 tables with the same data structure as the input table shown, one for each year from 1996 to 2015.
This is one of my input tables (mytable2015):
cell day1 day2 day3 day4 ...... day365
1 3,7167 0 0 0,1487 ...... 0,3256
2 0 0 0,2331 0,1461 ...... 1,8765
3 1,431 0,4121 0 1,4321 ...... 0
...
...
...
64800
I would like to put all of the data into one big dataset and, if possible, replace day1, day2, ... with actual date values (e.g. 01.01.2015 or 20150101).
So my result should look like this:
cell date value
1 20150101 3,7167
1 20150102 0
1 20150103 0
1 20150104 0,1487
... ........ ......
... ........ ......
... ........ ......
2 20150101 0
2 20150102 0,4321
... ........ ......
... ........ ......
... ........ ......
64800 20150101 0,1035
The cells carry geographic information: they form a grid covering the whole world, with each cell exactly one degree high and one degree wide.
I have two main questions:
Is it possible to convert day1, day2, ... into a date format?
How can I transform my tables into this new structure?
Any help is much appreciated, thanks in advance!
The query
Sample data:
create table example2015 (cell int, day1 real, day2 real, day3 real, day4 real);
insert into example2015 values
(1, 3.7167, 0, 0, 0.1487),
(2, 0, 0, 0.2331, 0.1461),
(3, 1.431, 0.4121, 0, 1.4321);
How to build the query step by step.
Step 1. Use json_each_text(row_to_json(t)) to aggregate and unnest the columns:
select cell, json_each_text(row_to_json(t)) val
from example2015 t
cell | val
------+---------------
1 | (cell,1)
1 | (day1,3.7167)
1 | (day2,0)
1 | (day3,0)
1 | (day4,0.1487)
2 | (cell,2)
2 | (day1,0)
2 | (day2,0)
2 | (day3,0.2331)
2 | (day4,0.1461)
3 | (cell,3)
3 | (day1,1.431)
3 | (day2,0.4121)
3 | (day3,0)
3 | (day4,1.4321)
(15 rows)
Step 2. Skip the cell pairs, convert dayn to the integer n and add it to a base date (here 2014-12-31):
select cell, '2014-12-31'::date+ ltrim((val).key, 'day')::int "date", (val).value::real
from (
select cell, json_each_text(row_to_json(t)) val
from example2015 t
) sub
where (val).key <> 'cell'
cell | date | value
------+------------+--------
1 | 2015-01-01 | 3.7167
1 | 2015-01-02 | 0
1 | 2015-01-03 | 0
1 | 2015-01-04 | 0.1487
2 | 2015-01-01 | 0
2 | 2015-01-02 | 0
2 | 2015-01-03 | 0.2331
2 | 2015-01-04 | 0.1461
3 | 2015-01-01 | 1.431
3 | 2015-01-02 | 0.4121
3 | 2015-01-03 | 0
3 | 2015-01-04 | 1.4321
(12 rows)
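Step 2 yields a proper date column. If you would rather see the 20150101-style text mentioned in the question, the date could be formatted on output with to_char. This is only a minimal sketch based on the step 2 query; keeping a real date column and formatting it when querying is usually the better choice:
select cell,
       to_char('2014-12-31'::date + ltrim((val).key, 'day')::int, 'YYYYMMDD') as "date",
       (val).value::real as value
from (
    select cell, json_each_text(row_to_json(t)) val
    from example2015 t
) sub
where (val).key <> 'cell'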
The conversion
You can use the query from step 2 to insert the values from mytable2015 into result_table:
create table result_table (
"cell" integer,
"date" date,
"value" real
);
You will generate a table with 23,652,000 rows.
A one-shot conversion is quite likely to exhaust memory resources and may take longer than you can accept.
I suggest splitting the operation into stages, say at most 10,000 source rows (3,650,000 new rows) at a time.
insert into result_table
select cell, '2014-12-31'::date+ ltrim((val).key, 'day')::int "date", (val).value::real
from (
select cell, json_each_text(row_to_json(t)) val
from mytable2015 t
) sub
where (val).key <> 'cell'
and cell > 0 and cell <= 10000
Repeat the insert for cell > 10000 and cell <= 20000, and so on.
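If you do not want to run each batch by hand, the repetition could be scripted with a PL/pgSQL DO block. This is only a sketch: the batch size of 10,000 and the upper bound of 64,800 cells are taken from the figures above, and note that all batches still run inside one transaction here, so issuing the statements separately remains an option.
do $$
declare
    lo    int := 0;      -- lower bound of the current batch (exclusive)
    batch int := 10000;  -- source rows per batch, as suggested above
begin
    while lo < 64800 loop  -- 64800 cells per table, per the question
        insert into result_table
        select cell, '2014-12-31'::date + ltrim((val).key, 'day')::int, (val).value::real
        from (
            select cell, json_each_text(row_to_json(t)) val
            from mytable2015 t
        ) sub
        where (val).key <> 'cell'
          and cell > lo and cell <= lo + batch;
        lo := lo + batch;
    end loop;
end $$;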
If the table and column names are consistent, you should be able to determine the date of each resulting row through date arithmetic, needing only a date literal per table, e.g. '2011-01-01' for table MyTable2011.
Most of the "unpivot" work is done with JSON: each source row is first converted to JSON, and rows are then created from that. This is shown in stages below.
PostgreSQL 9.3 Schema Setup:
CREATE TABLE MyTable2011
("cell" int, "day1" numeric, "day2" numeric, "day3" int, "day4" numeric, "day365" int);
-- note: "day3" and "day365" are int here, so 0.2331, 0.3256 and 1.8765 are rounded
-- on insert, which is why they appear as 0, 0 and 2 in the query results below

INSERT INTO MyTable2011
("cell", "day1", "day2", "day3", "day4", "day365")
VALUES
(1, 3.7167, 0.00, 0.00, 0.1487, 0.3256),
(2, 0, 0, 0.2331, 0.1461, 1.8765),
(3, 1.431, 0.4121, 0, 1.4321, 0.00);
Query 1:
SELECT row_to_json(MyTable2011) as jstring FROM MyTable2011
| jstring |
|-------------------------------------------------------------------------|
| {"cell":1,"day1":3.7167,"day2":0.00,"day3":0,"day4":0.1487,"day365":0} |
| {"cell":2,"day1":0,"day2":0,"day3":0,"day4":0.1461,"day365":2} |
| {"cell":3,"day1":1.431,"day2":0.4121,"day3":0,"day4":1.4321,"day365":0} |
Query 2:
SELECT
jstring->>'cell' as cell
, json_each_text(jstring) as pairs
FROM (
SELECT
row_to_json(MyTable2011) as jstring
FROM MyTable2011
) as jrows
| cell | pairs |
|------|---------------|
| 1 | (cell,1) |
| 1 | (day1,3.7167) |
| 1 | (day2,0.00) |
| 1 | (day3,0) |
| 1 | (day4,0.1487) |
| 1 | (day365,0) |
| 2 | (cell,2) |
| 2 | (day1,0) |
| 2 | (day2,0) |
| 2 | (day3,0) |
| 2 | (day4,0.1461) |
| 2 | (day365,2) |
| 3 | (cell,3) |
| 3 | (day1,1.431) |
| 3 | (day2,0.4121) |
| 3 | (day3,0) |
| 3 | (day4,1.4321) |
| 3 | (day365,0) |
Query 3:
SELECT
date '2011-01-01' + CAST(REPLACE((pairs).key,'day','') as integer) -1 as thedate
, CAST(REPLACE((pairs).key,'day','') as integer) as daynum
, cell
, (pairs).value as thevalue
FROM (
SELECT
jstring->>'cell' as cell
, json_each_text(jstring) as pairs
FROM (
SELECT
row_to_json(MyTable2011) as jstring
FROM MyTable2011
) as jrows
) as unpiv
WHERE (pairs).key <> 'cell'
| thedate | daynum | cell | thevalue |
|----------------------------|--------|------|----------|
| January, 01 2011 00:00:00 | 1 | 1 | 3.7167 |
| January, 02 2011 00:00:00 | 2 | 1 | 0.00 |
| January, 03 2011 00:00:00 | 3 | 1 | 0 |
| January, 04 2011 00:00:00 | 4 | 1 | 0.1487 |
| December, 31 2011 00:00:00 | 365 | 1 | 0 |
| January, 01 2011 00:00:00 | 1 | 2 | 0 |
| January, 02 2011 00:00:00 | 2 | 2 | 0 |
| January, 03 2011 00:00:00 | 3 | 2 | 0 |
| January, 04 2011 00:00:00 | 4 | 2 | 0.1461 |
| December, 31 2011 00:00:00 | 365 | 2 | 2 |
| January, 01 2011 00:00:00 | 1 | 3 | 1.431 |
| January, 02 2011 00:00:00 | 2 | 3 | 0.4121 |
| January, 03 2011 00:00:00 | 3 | 3 | 0 |
| January, 04 2011 00:00:00 | 4 | 3 | 1.4321 |
| December, 31 2011 00:00:00 | 365 | 3 | 0 |
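Since the table and column names differ only by year, the per-table date literal mentioned above could also be supplied by a loop instead of by hand. This is only a sketch, assuming the yearly tables are named mytable1996 through mytable2015 (matching mytable2015 from the question) and that the rows go into the result_table defined in the other answer:
do $$
begin
    for yr in 1996..2015 loop
        -- build and run the unpivot insert for this year's table and base date
        execute format($q$
            insert into result_table
            select cell,
                   %L::date + ltrim((val).key, 'day')::int - 1,
                   (val).value::real
            from (
                select cell, json_each_text(row_to_json(t)) val
                from %I t
            ) sub
            where (val).key <> 'cell'
        $q$, yr || '-01-01', 'mytable' || yr);
    end loop;
end $$;
Each year then runs as its own INSERT, and the cell-range batching from the other answer can be added to the generated query in the same way if the tables are large.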