How to make projections with future dates using Redshift
I currently have a table called quantities with the following data:
+------+----------+----------+
| item | end_date | quantity |
+------+----------+----------+
| 1 | 26/11/17 | 100 |
+------+----------+----------+
| 2 | 28/11/17 | 300 |
+------+----------+----------+
| 3 | 30/11/17 | 500 |
+------+----------+----------+
I want to query it so that I get this result:
+--------+-------+
| date | total |
+--------+-------+
| 26-Nov | 900 |
+--------+-------+
| 27-Nov | 800 |
+--------+-------+
| 28-Nov | 800 |
+--------+-------+
| 29-Nov | 500 |
+--------+-------+
| 30-Nov | 500 |
+--------+-------+
What would this query look like? Many thanks!
Please note:
The above output is for when today's date =
+--------+-----+-----+-----+-------+
| date | 1 | 2 | 3 | total |
+--------+-----+-----+-----+-------+
| 26-Nov | 100 | 300 | 500 | 900 |
+--------+-----+-----+-----+-------+
| 27-Nov | - | 300 | 500 | 800 |
+--------+-----+-----+-----+-------+
| 28-Nov | - | 300 | 500 | 800 |
+--------+-----+-----+-----+-------+
| 29-Nov | - | - | 500 | 500 |
+--------+-----+-----+-----+-------+
| 30-Nov | - | - | 500 | 500 |
+--------+-----+-----+-----+-------+
I need the query to always produce output with the following date range:
- start_date = current_date
- end_date = the latest end_date in the list of items
I am using PostgreSQL 8.0.2 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.4.2 20041017 (Red Hat 3.4.2-6.fc3), Redshift 1.0.1499
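The rule behind the breakdown table seems to be that an item counts toward a day's total for as long as its end_date has not passed. Here is a minimal Python sketch of that rule, only to pin down the expected numbers before writing any SQL. The function name `project_totals` is hypothetical, and "today" is assumed to be the first date shown in the expected output.

```python
from datetime import date, timedelta

# Sample rows from the question: (item, end_date, quantity).
QUANTITIES = [
    (1, date(2017, 11, 26), 100),
    (2, date(2017, 11, 28), 300),
    (3, date(2017, 11, 30), 500),
]


def project_totals(rows, today):
    """Return [(day, total)] for every day from `today` to the latest end_date.

    An item contributes its quantity to a day while end_date >= day.
    """
    if not rows:
        return []
    last_day = max(end_date for _, end_date, _ in rows)
    result = []
    day = today
    while day <= last_day:
        total = sum(qty for _, end_date, qty in rows if end_date >= day)
        result.append((day, total))
        day += timedelta(days=1)
    return result


if __name__ == "__main__":
    # Assumed "today": the first date in the expected output.
    for day, total in project_totals(QUANTITIES, date(2017, 11, 26)):
        print(day.isoformat(), total)
```

Any query that answers the question should reproduce these totals for the sample data.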
t=# create table so (item serial, end_date date, quantity int);
CREATE TABLE
t=# set datestyle TO DMY;
SET
t=# insert into so(end_date,quantity) values('26-11-2017',100),('28-11-2017', 300), ('30-11-2017', 500);
INSERT 0 3
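Then, using a table with a sequential id column, we create a view containing a series of dates running from today into the future:
CREATE VIEW future_dates AS
-- id 1 maps to today, id 2 to tomorrow, and so on (assumes ids start at 1)
SELECT (getdate()::date - 1 + id)::date AS future_dates
FROM app_data.table_with_sequential_id
ORDER BY future_dates;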
N.b. We need to create the view above because generate_series is not fully supported in Redshift.
Finally, the select itself:
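The view's date arithmetic only lines up with the calendar if the id column starts at 1 and has no gaps, which is not stated and is assumed here. A small Python sketch of the same arithmetic (the helper name `future_dates_from_ids` is hypothetical) shows what happens in both cases:

```python
from datetime import date, timedelta


def future_dates_from_ids(today, ids):
    """Mimic (getdate()::date - 1 + id)::date for each id, sorted ascending."""
    return sorted(today - timedelta(days=1) + timedelta(days=i) for i in ids)


if __name__ == "__main__":
    today = date(2017, 11, 26)  # assumed "today" for illustration
    # Contiguous ids starting at 1 give today, tomorrow, the day after.
    print(future_dates_from_ids(today, [1, 2, 3]))
    # A gap in the ids silently drops a calendar day (here the 27th).
    print(future_dates_from_ids(today, [1, 3]))
```

If the sequential table has gaps or starts at 0, the resulting series would skip days or begin yesterday, so it may be worth checking that table first.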
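t=# select gs.future_dates as gs, sum(quantity) over (order by gs.future_dates desc rows unbounded preceding)
from future_dates gs
left outer join so on so.end_date = gs.future_dates
where gs.future_dates <= (select max(end_date) from so)
order by gs.future_dates;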
gs | sum
------------+-----
2017-11-26 | 900
2017-11-27 | 800
2017-11-28 | 800
2017-11-29 | 500
2017-11-30 | 500
(5 rows)
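The window clause works because ordering the dates descending and summing over all preceding rows gives, for each date, the sum of every quantity whose end_date falls on or after it. A rough Python sketch of that idea follows. The name `reverse_running_sum` is hypothetical, and the mapping is assumed to hold one summed quantity per end_date.

```python
from datetime import date, timedelta
from itertools import accumulate


def reverse_running_sum(dates, quantity_by_end_date):
    """Total per date = sum of quantities whose end_date is on or after it."""
    ordered = sorted(dates, reverse=True)  # order by date desc
    # Left join: dates with no matching end_date contribute nothing.
    # (SQL would carry NULL here; 0 is used to keep the sketch simple.)
    per_day = [quantity_by_end_date.get(d, 0) for d in ordered]
    running = accumulate(per_day)  # rows unbounded preceding
    return sorted(zip(ordered, running))  # final order by date asc


if __name__ == "__main__":
    start = date(2017, 11, 26)
    dates = [start + timedelta(days=n) for n in range(5)]
    quantities = {
        date(2017, 11, 26): 100,
        date(2017, 11, 28): 300,
        date(2017, 11, 30): 500,
    }
    for day, total in reverse_running_sum(dates, quantities):
        print(day.isoformat(), total)
```

One difference from the SQL: dates later than the latest end_date would come out as 0 in this sketch, whereas the window sum would return NULL for them.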