Columns in rows in big datasets (PostgreSQL) -- Transpose?
I am trying to restructure my big dataset so that I can work with my data more easily. I have about 20 tables with the same data structure as the input table shown, one for each year from 1996 to 2015.
This is one of my input tables (mytable2015):
cell day1 day2 day3 day4 ...... day365
1 3,7167 0 0 0,1487 ...... 0,3256
2 0 0 0,2331 0,1461 ...... 1,8765
3 1,431 0,4121 0 1,4321 ...... 0
...
...
...
64800
I would like to put all of the data into one big dataset and, if possible, replace day1, day2, ... with actual date values (e.g. 01.01.2015 or 20150101).
So my result should look like this:
cell date value
1 20150101 3,7167
1 20150102 0
1 20150103 0
1 20150104 0,1487
... ........ ......
... ........ ......
... ........ ......
2 20150101 0
2 20150102 0,4321
... ........ ......
... ........ ......
... ........ ......
64800 20150101 0,1035
The cells carry geographic information: they form a grid covering the whole world, with each cell exactly one degree high and one degree wide.
I have two main questions:
Is it possible to convert day1, day2, ... into a date format?
How can I transform my tables into this new structure?
Any help is much appreciated, thanks in advance!
The query
Sample data:
create table example2015 (cell int, day1 real, day2 real, day3 real, day4 real);
insert into example2015 values
(1, 3.7167, 0, 0, 0.1487),
(2, 0, 0, 0.2331, 0.1461),
(3, 1.431, 0.4121, 0, 1.4321);
How to build the query step by step.
Step 1. Use json_each_text(row_to_json(t)) to aggregate and unnest the columns:
select cell, json_each_text(row_to_json(t)) val
from example2015 t
cell | val
------+---------------
1 | (cell,1)
1 | (day1,3.7167)
1 | (day2,0)
1 | (day3,0)
1 | (day4,0.1487)
2 | (cell,2)
2 | (day1,0)
2 | (day2,0)
2 | (day3,0.2331)
2 | (day4,0.1461)
3 | (cell,3)
3 | (day1,1.431)
3 | (day2,0.4121)
3 | (day3,0)
3 | (day4,1.4321)
(15 rows)
Step 2. Skip the cell pairs, convert dayn to the integer n and add it to a base date (here 2014-12-31):
select cell, '2014-12-31'::date+ ltrim((val).key, 'day')::int "date", (val).value::real
from (
select cell, json_each_text(row_to_json(t)) val
from example2015 t
) sub
where (val).key <> 'cell'
cell | date | value
------+------------+--------
1 | 2015-01-01 | 3.7167
1 | 2015-01-02 | 0
1 | 2015-01-03 | 0
1 | 2015-01-04 | 0.1487
2 | 2015-01-01 | 0
2 | 2015-01-02 | 0
2 | 2015-01-03 | 0.2331
2 | 2015-01-04 | 0.1461
3 | 2015-01-01 | 1.431
3 | 2015-01-02 | 0.4121
3 | 2015-01-03 | 0
3 | 2015-01-04 | 1.4321
(12 rows)
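Step 2 yields a proper date column. If you would rather see the 20150101-style text mentioned in the question, the date could be formatted on output with to_char. This is only a minimal sketch based on the step 2 query; keeping a real date column and formatting it when querying is usually the better choice:
select cell,
       to_char('2014-12-31'::date + ltrim((val).key, 'day')::int, 'YYYYMMDD') as "date",
       (val).value::real as value
from (
    select cell, json_each_text(row_to_json(t)) val
    from example2015 t
) sub
where (val).key <> 'cell'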
The conversion
You can use the query from step 2 to insert the values from mytable2015 into result_table:
create table result_table (
"cell" integer,
"date" date,
"value" real
);
You will generate a table with 23,652,000 rows.
A one-shot conversion is quite likely to exhaust memory resources and may take longer than you can accept.
I suggest splitting the operation into stages, say at most 10,000 source rows (3,650,000 new rows) at a time.
insert into result_table
select cell, '2014-12-31'::date+ ltrim((val).key, 'day')::int "date", (val).value::real
from (
select cell, json_each_text(row_to_json(t)) val
from mytable2015 t
) sub
where (val).key <> 'cell'
and cell > 0 and cell <= 10000
Repeat the insert for cell > 10000 and cell <= 20000, and so on.
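If you do not want to run each batch by hand, the repetition could be scripted with a PL/pgSQL DO block. This is only a sketch: the batch size of 10,000 and the upper bound of 64,800 cells are taken from the figures above, and note that all batches still run inside one transaction here, so issuing the statements separately remains an option.
do $$
declare
    lo    int := 0;      -- lower bound of the current batch (exclusive)
    batch int := 10000;  -- source rows per batch, as suggested above
begin
    while lo < 64800 loop  -- 64800 cells per table, per the question
        insert into result_table
        select cell, '2014-12-31'::date + ltrim((val).key, 'day')::int, (val).value::real
        from (
            select cell, json_each_text(row_to_json(t)) val
            from mytable2015 t
        ) sub
        where (val).key <> 'cell'
          and cell > lo and cell <= lo + batch;
        lo := lo + batch;
    end loop;
end $$;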
If the table and column names are consistent, you should be able to determine the date of each resulting row through date arithmetic, needing only a date literal per table, e.g. '2011-01-01' for table MyTable2011.
Most of the "unpivot" work is done with JSON: each source row is first converted to JSON, and rows are then created from that. This is shown in stages below.
PostgreSQL 9.3 Schema Setup:
CREATE TABLE MyTable2011
("cell" int, "day1" numeric, "day2" numeric, "day3" int, "day4" numeric, "day365" int);
-- note: "day3" and "day365" are int here, so 0.2331, 0.3256 and 1.8765 are rounded
-- on insert, which is why they appear as 0, 0 and 2 in the query results below

INSERT INTO MyTable2011
("cell", "day1", "day2", "day3", "day4", "day365")
VALUES
(1, 3.7167, 0.00, 0.00, 0.1487, 0.3256),
(2, 0, 0, 0.2331, 0.1461, 1.8765),
(3, 1.431, 0.4121, 0, 1.4321, 0.00);
Query 1:
SELECT row_to_json(MyTable2011) as jstring FROM MyTable2011
| jstring |
|-------------------------------------------------------------------------|
| {"cell":1,"day1":3.7167,"day2":0.00,"day3":0,"day4":0.1487,"day365":0} |
| {"cell":2,"day1":0,"day2":0,"day3":0,"day4":0.1461,"day365":2} |
| {"cell":3,"day1":1.431,"day2":0.4121,"day3":0,"day4":1.4321,"day365":0} |
Query 2:
SELECT
jstring->>'cell' as cell
, json_each_text(jstring) as pairs
FROM (
SELECT
row_to_json(MyTable2011) as jstring
FROM MyTable2011
) as jrows
| cell | pairs |
|------|---------------|
| 1 | (cell,1) |
| 1 | (day1,3.7167) |
| 1 | (day2,0.00) |
| 1 | (day3,0) |
| 1 | (day4,0.1487) |
| 1 | (day365,0) |
| 2 | (cell,2) |
| 2 | (day1,0) |
| 2 | (day2,0) |
| 2 | (day3,0) |
| 2 | (day4,0.1461) |
| 2 | (day365,2) |
| 3 | (cell,3) |
| 3 | (day1,1.431) |
| 3 | (day2,0.4121) |
| 3 | (day3,0) |
| 3 | (day4,1.4321) |
| 3 | (day365,0) |
Query 3:
SELECT
date '2011-01-01' + CAST(REPLACE((pairs).key,'day','') as integer) -1 as thedate
, CAST(REPLACE((pairs).key,'day','') as integer) as daynum
, cell
, (pairs).value as thevalue
FROM (
SELECT
jstring->>'cell' as cell
, json_each_text(jstring) as pairs
FROM (
SELECT
row_to_json(MyTable2011) as jstring
FROM MyTable2011
) as jrows
) as unpiv
WHERE (pairs).key <> 'cell'
| thedate | daynum | cell | thevalue |
|----------------------------|--------|------|----------|
| January, 01 2011 00:00:00 | 1 | 1 | 3.7167 |
| January, 02 2011 00:00:00 | 2 | 1 | 0.00 |
| January, 03 2011 00:00:00 | 3 | 1 | 0 |
| January, 04 2011 00:00:00 | 4 | 1 | 0.1487 |
| December, 31 2011 00:00:00 | 365 | 1 | 0 |
| January, 01 2011 00:00:00 | 1 | 2 | 0 |
| January, 02 2011 00:00:00 | 2 | 2 | 0 |
| January, 03 2011 00:00:00 | 3 | 2 | 0 |
| January, 04 2011 00:00:00 | 4 | 2 | 0.1461 |
| December, 31 2011 00:00:00 | 365 | 2 | 2 |
| January, 01 2011 00:00:00 | 1 | 3 | 1.431 |
| January, 02 2011 00:00:00 | 2 | 3 | 0.4121 |
| January, 03 2011 00:00:00 | 3 | 3 | 0 |
| January, 04 2011 00:00:00 | 4 | 3 | 1.4321 |
| December, 31 2011 00:00:00 | 365 | 3 | 0 |
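Since the table and column names differ only by year, the per-table date literal mentioned above could also be supplied by a loop instead of by hand. This is only a sketch, assuming the yearly tables are named mytable1996 through mytable2015 (matching mytable2015 from the question) and that the rows go into the result_table defined in the other answer:
do $$
begin
    for yr in 1996..2015 loop
        -- build and run the unpivot insert for this year's table and base date
        execute format($q$
            insert into result_table
            select cell,
                   %L::date + ltrim((val).key, 'day')::int - 1,
                   (val).value::real
            from (
                select cell, json_each_text(row_to_json(t)) val
                from %I t
            ) sub
            where (val).key <> 'cell'
        $q$, yr || '-01-01', 'mytable' || yr);
    end loop;
end $$;
Each year then runs as its own INSERT, and the cell-range batching from the other answer can be added to the generated query in the same way if the tables are large.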