Get latest forecast data with multi-column group identifier
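I have many weather locations with wind forecast data. I need the most recent as_of that falls before 10:00 on the previous day. I need this for every hour, every day, and every location.
A location is defined as a unique lat and lon pair.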
Full table schema with relevant sample data:
CREATE SCHEMA weather;
CREATE TABLE weather.forecast
(
    foretime timestamp without time zone NOT NULL,
    as_of timestamp without time zone NOT NULL, -- in UTC
    summary text,
    precipintensity numeric(8,4),
    precipprob numeric(2,2),
    temperature numeric(5,2),
    apptemp numeric(5,2),
    dewpoint numeric(5,2),
    humidity numeric(2,2),
    windspeed numeric(5,2),
    windbearing numeric(4,1),
    visibility numeric(5,2),
    cloudcover numeric(4,2),
    pressure numeric(6,2),
    ozone numeric(5,2),
    preciptype text,
    lat numeric(8,6) NOT NULL,
    lon numeric(9,6) NOT NULL,
    CONSTRAINT forecast_pkey PRIMARY KEY (foretime, as_of, lat, lon)
);
INSERT INTO weather.forecast
(windspeed, foretime, as_of, lat, lon)
VALUES
(11.19, '2/1/2016 8:00', '1/30/2016 23:00', 34.556, 28.345),
(10.98, '2/1/2016 8:00', '1/31/2016 5:00', 34.556, 28.345),
(10.64, '2/1/2016 8:00', '1/31/2016 11:00', 34.556, 28.345),
(10.95, '2/1/2016 8:00', '1/31/2016 8:00', 29.114, 16.277),
(10.39, '2/1/2016 8:00', '1/31/2016 23:00', 29.114, 16.277),
(9.22, '2/1/2016 8:00', '1/31/2016 5:00', 29.114, 16.277),
(10, '2/1/2016 9:00', '1/30/2016 04:00', 34.556, 28.345),
(9.88, '2/1/2016 9:00', '1/31/2016 09:00', 34.556, 28.345),
(10.79, '2/1/2016 9:00', '1/30/2016 23:00', 34.556, 28.345),
(10.8, '2/1/2016 9:00', '1/31/2016 5:00', 29.114, 16.277),
(10.35, '2/1/2016 9:00', '1/31/2016 11:00', 29.114, 16.277),
(10.07, '2/1/2016 9:00', '1/31/2016 17:00', 29.114, 16.277)
;
Desired result format:
lat     lon     Foredate  foreHE  windspeed  as_of
34.556  28.345  2/1/2016  8       10.98      1/31/2016 5:00
34.556  28.345  2/1/2016  9       9.88       1/31/2016 9:00
29.114  16.277  2/1/2016  8       10.95      1/31/2016 8:00
29.114  16.277  2/1/2016  9       10.80      1/31/2016 5:00
Here is my code that gets the correct as_of. Things go wrong when I try to add windspeed.
SELECT
    date_trunc('day', (a.foretime)) :: DATE AS Foredate,
    extract(HOUR FROM (a.foretime)) AS foreHE,
    a.lat,
    a.lon,
    max(a.as_of) - interval '5 hours' as latest_as_of
FROM weather.forecast a
WHERE date_trunc('day', foretime) :: DATE - as_of >= INTERVAL '14 hours'
GROUP BY Foredate, foreHE, a.lat, a.lon
The error you get when you add windspeed back in is this:
[42803] ERROR: column "a.windspeed" must appear in the GROUP BY clause or be used in an aggregate function
Position: 184
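I can't really improve on PostgreSQL's error message, except perhaps by going a little deeper into the theory. Basically, when you do a GROUP BY, you give yourself the luxury of operating on subsets of a larger set, that larger set being the table represented by the rest of the query. But SQL doesn't let you iterate over those subsets; you have to tell the database what to compute for each one and let it hand you back another flat list.
Of the two options Postgres offers, one is usually the obvious choice. For example, if you had left out a.lon, it would be clear that you were grouping only by latitude and not by longitude, and you would agree to add it to the GROUP BY clause. But in this case, if you group by the actual measurement, each subset will have only one row, which isn't useful either. So at first glance it looks like you need an aggregate. The second problem is that there is no aggregate that fits this problem. Sigh!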
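To make the "no aggregate fits" point concrete, here is a small sketch against the sample data from the question. It uses max(windspeed) only as a stand-in for "some aggregate"; the alias max_windspeed is just illustrative.

```sql
-- Sketch: an ordinary aggregate is computed independently of max(as_of),
-- so it does not return the windspeed of the row holding the latest as_of.
select
    foretime,
    lat, lon,
    max(as_of)     as as_of,          -- latest forecast timestamp in the group
    max(windspeed) as max_windspeed   -- largest windspeed in the group, from any row
from weather.forecast
group by foretime, lat, lon;
```

With the sample rows, the group for foretime 2016-02-01 08:00 at (34.556, 28.345) has its latest as_of on the row with windspeed 10.64, while max(windspeed) for that group is 11.19, which comes from the oldest forecast. The two columns describe different rows, which is why the aggregate alone does not solve the problem.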
So here is my idea. The primary key you need to look up is (foretime, as_of, lat, lon), which you can get directly with this query:
select
    foretime,
    max(as_of) as as_of,
    lat, lon
from weather.forecast
group by foretime, lat, lon;
Now you can join that back to the same table, forecast, to get the latest forecast:
select
    date_trunc('day', a.foretime)::date as forecast_day,
    extract(hour from a.foretime) as forecast_hour,
    a.lat, a.lon,
    f.windspeed,
    a.as_of - interval '5 hours' as latest_as_of
from weather.forecast f
join (select
          foretime,
          max(as_of) as as_of,
          lat, lon
      from weather.forecast
      group by foretime, lat, lon) a using (foretime, as_of, lat, lon);
This produces the following report:
 forecast_day | forecast_hour |    lat    |    lon    | windspeed |    latest_as_of
--------------+---------------+-----------+-----------+-----------+---------------------
 2016-02-01   |             8 | 34.556000 | 28.345000 |     10.64 | 2016-01-31 06:00:00
 2016-02-01   |             8 | 29.114000 | 16.277000 |     10.39 | 2016-01-31 18:00:00
 2016-02-01   |             9 | 34.556000 | 28.345000 |      9.88 | 2016-01-31 04:00:00
 2016-02-01   |             9 | 29.114000 | 16.277000 |     10.07 | 2016-01-31 12:00:00
(4 rows)
There may be a more efficient way to do this with a correlated subquery, but I'm not sure how to pull it off.
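For reference, a correlated-subquery version of the same lookup might look like the sketch below. It assumes the same weather.forecast table; the inner alias g is just illustrative. Whether it is actually faster than the join depends on the data and the planner, so compare the two with EXPLAIN ANALYZE rather than taking it on faith.

```sql
-- Sketch: keep only the rows whose as_of equals the latest as_of
-- among rows sharing the same foretime, lat and lon.
select
    date_trunc('day', f.foretime)::date as forecast_day,
    extract(hour from f.foretime)       as forecast_hour,
    f.lat, f.lon,
    f.windspeed,
    f.as_of - interval '5 hours'        as latest_as_of
from weather.forecast f
where f.as_of = (
    select max(g.as_of)
    from weather.forecast g
    where g.foretime = f.foretime
      and g.lat = f.lat
      and g.lon = f.lon
);
```

Because (foretime, as_of, lat, lon) is the primary key, at most one row per group can match the subquery's result, so this returns one row per foretime and location, just like the join.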
Edit: to match your output format:
select
    a.lat, a.lon,
    date_trunc('day', a.foretime)::date as forecast_day,
    extract(hour from a.foretime) as forecast_hour,
    f.windspeed,
    a.as_of - interval '5 hours' as latest_as_of
from weather.forecast f
join (select
          foretime,
          max(as_of) as as_of,
          lat, lon
      from weather.forecast
      where date_trunc('day', foretime)::date - as_of >= interval '14 hours'
      group by foretime, lat, lon) a using (foretime, as_of, lat, lon)
order by lat desc, lon;
Result:
    lat    |    lon    | forecast_day | forecast_hour | windspeed |    latest_as_of
-----------+-----------+--------------+---------------+-----------+---------------------
 34.556000 | 28.345000 | 2016-02-01   |             8 |     10.98 | 2016-01-31 00:00:00
 34.556000 | 28.345000 | 2016-02-01   |             9 |      9.88 | 2016-01-31 04:00:00
 29.114000 | 16.277000 | 2016-02-01   |             8 |     10.95 | 2016-01-31 03:00:00
 29.114000 | 16.277000 | 2016-02-01   |             9 |     10.80 | 2016-01-31 00:00:00
(4 rows)
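As an aside, PostgreSQL also has a DISTINCT ON clause that can express "first row per group" without the self-join. The following is only a sketch of that alternative, not part of the approach above; it reuses the same 14-hour filter and 5-hour offset from the question.

```sql
-- Sketch: DISTINCT ON keeps the first row of each (lat, lon, foretime) group
-- in ORDER BY order; sorting as_of descending makes that the latest forecast
-- that still passes the cutoff filter.
select distinct on (f.lat, f.lon, f.foretime)
    f.lat, f.lon,
    date_trunc('day', f.foretime)::date as forecast_day,
    extract(hour from f.foretime)       as forecast_hour,
    f.windspeed,
    f.as_of - interval '5 hours'        as latest_as_of
from weather.forecast f
where date_trunc('day', f.foretime)::date - f.as_of >= interval '14 hours'
order by f.lat desc, f.lon, f.foretime, f.as_of desc;
```

The DISTINCT ON expressions have to match the leading ORDER BY expressions, which is why lat, lon and foretime come first in the sort. On the sample data this should pick the same four rows as the join. DISTINCT ON is a PostgreSQL extension, so the join form is the one to keep if portability matters.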