Run a Postgres query that groups by one field and sorts by another
I have a PostgreSQL table with the following relevant fields:
url
title
created_at
There can be many rows that contain the same URL but different titles. Here are some sample rows:
www.nytimes.com | The New York Times | 2016-01-01 00:00:00
www.wsj.com | The Wall Street Journal | 2016-01-03 15:32:13
www.nytimes.com | The New York Times Online | 2016-01-06 07:19:08
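I'm trying to get output that lists the following fields:
1) url
2) the title that corresponds to the highest value of created_at
3) a count of all titles for that unique url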
So the output rows for the sample above would look like this:
www.nytimes.com | The New York Times Online | 2
www.wsj.com | The Wall Street Journal | 1
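Based on the many SO posts I've read about similar questions, it looks like my best option for getting the first two fields (url and the latest title) is to use DISTINCT ON: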
select distinct on (url) url, title from headlines order by url, created_at desc
Similarly, to get the first and third fields (url and the count of all titles), I can simply use GROUP BY:
select url, count(title) from headlines group by url
What I can't figure out is how to combine the two approaches so that a single query returns all three values described above.
(Edited for clarity.)
Try this:
select t1.url, t2.title, t1.cnt
from (
    select url, count(title) cnt
    from headlines
    group by url
) t1
join (
    select distinct on (url) url, title
    from headlines
    order by url, created_at desc
) t2 on t1.url = t2.url
order by t1.url
This joins the two queries on url.
This can be done in a single SELECT with a single scan of the table, by combining a window function with DISTINCT ON:
SELECT DISTINCT ON (url)
url, title, count(*) OVER (PARTITION BY url) AS ct
FROM headlines
ORDER BY url, created_at DESC NULLS LAST;
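To try the single-SELECT query above against the question's sample rows, something like the following sketch should work. The temporary table definition and column types are assumptions (the question does not show the real schema), and count(title) is used here instead of count(*) only to match the question's "count of all titles", since count(title) skips rows where title is NULL.

```sql
-- Assumed schema: the question lists only the column names, not their types.
CREATE TEMP TABLE headlines (
    url        text,
    title      text,
    created_at timestamp
);

-- The three sample rows from the question.
INSERT INTO headlines (url, title, created_at) VALUES
    ('www.nytimes.com', 'The New York Times',        '2016-01-01 00:00:00'),
    ('www.wsj.com',     'The Wall Street Journal',   '2016-01-03 15:32:13'),
    ('www.nytimes.com', 'The New York Times Online', '2016-01-06 07:19:08');

-- The window count is computed per url over all rows first;
-- DISTINCT ON then keeps only the newest row of each url.
SELECT DISTINCT ON (url)
       url, title, count(title) OVER (PARTITION BY url) AS ct
FROM   headlines
ORDER  BY url, created_at DESC NULLS LAST;

-- Result for the sample rows:
-- www.nytimes.com | The New York Times Online | 2
-- www.wsj.com     | The Wall Street Journal   | 1
```

This works because window functions are evaluated before DISTINCT ON removes rows, so ct still reflects every row of the partition.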
Related (with detailed explanations):
- Best way to get result count before LIMIT was applied
- Select first row in each GROUP BY group?
- PostgreSQL: running count of rows for a query 'by minute'
Try this:
select t1.url,t1.title,t2.count from headlines t1
inner join(
select url,count(*) as count,max(created_at) as created_at
from headlines group by url ) t2 on t1.url=t2.url and t1.created_at=t2.created_at;
SQL Fiddle: http://sqlfiddle.com/#!15/f7665f/11
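One caveat with joining on max(created_at): if two rows for the same url share the latest created_at, the join matches both of them, so that url appears twice in the output. A small sketch of the effect, using a hypothetical table headlines_tie and made-up rows with identical timestamps (neither is from the question):

```sql
-- Hypothetical data: both rows for the url carry the same, latest created_at.
CREATE TEMP TABLE headlines_tie (
    url        text,
    title      text,
    created_at timestamp
);

INSERT INTO headlines_tie (url, title, created_at) VALUES
    ('www.nytimes.com', 'The New York Times',        '2016-01-06 07:19:08'),
    ('www.nytimes.com', 'The New York Times Online', '2016-01-06 07:19:08');

-- Same shape as the query above, pointed at the tie table.
SELECT t1.url, t1.title, t2.count
FROM   headlines_tie t1
JOIN  (
    SELECT url, count(*) AS count, max(created_at) AS created_at
    FROM   headlines_tie
    GROUP  BY url
) t2 ON t1.url = t2.url AND t1.created_at = t2.created_at;

-- Returns two rows for www.nytimes.com (one per title), each with count 2.
```

If timestamps are guaranteed unique per url this does not arise; otherwise a DISTINCT ON variant returns exactly one row per url, picking one of the tied rows arbitrarily unless more ORDER BY columns are added.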