Run Postgres query that groups by one field and sorts by another

I have a PostgreSQL table with the following relevant fields:

url
title
created_at

There can be many rows with the same URL but different titles. Here are some sample rows:

www.nytimes.com | The New York Times         | 2016-01-01 00:00:00
www.wsj.com     | The Wall Street Journal    | 2016-01-03 15:32:13
www.nytimes.com | The New York Times Online  | 2016-01-06 07:19:08
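
To try the queries below locally, here is a minimal setup script with the sample rows. The column types (text and timestamp) are my assumption; the question only names the fields.

```sql
-- Assumed column types: the question only lists the field names.
CREATE TABLE headlines (
    url        text,
    title      text,
    created_at timestamp
);

-- The three sample rows from the question.
INSERT INTO headlines (url, title, created_at) VALUES
    ('www.nytimes.com', 'The New York Times',        '2016-01-01 00:00:00'),
    ('www.wsj.com',     'The Wall Street Journal',   '2016-01-03 15:32:13'),
    ('www.nytimes.com', 'The New York Times Online', '2016-01-06 07:19:08');
```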

I'm trying to get output that lists the following fields:

1) url
2) the title corresponding to the highest created_at value
3) the count of all titles for that unique url

So the output rows for the sample above would look like this:

www.nytimes.com | The New York Times Online | 2
www.wsj.com     | The Wall Street Journal   | 1

Based on the many SO posts I've read about similar problems, it looks like my best option for getting the first two fields (url and the latest title) is DISTINCT ON:

select distinct on (url) url, title from headlines order by url, created_at desc 

Likewise, to get the first and third fields (url and the count of all titles), I can simply use GROUP BY:

select url, count(title) from headlines group by url

What I can't figure out is how to combine these two approaches to get all three values.

(Edited for clarity.)

Try this:

select t1.url, t2.title, t1.cnt
from (
  select url, count(title) cnt 
  from headlines 
  group by url
) t1
join (
  select distinct on (url) url, title 
  from headlines 
  order by url, created_at desc
) t2 on t1.url = t2.url
order by t1.url

This joins the two queries on url.

This can be done in a single SELECT with a single scan of the table, by combining a window function with DISTINCT ON:

SELECT DISTINCT ON (url)
       url, title, count(*) OVER (PARTITION BY url) AS ct 
FROM   headlines 
ORDER  BY url, created_at DESC NULLS LAST;
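
To see why one pass is enough, run the same query without DISTINCT ON. The window function attaches the per-url count to every row before any rows are dropped; DISTINCT ON (url) then keeps only the first row of each url in ORDER BY order, which is the one with the latest created_at. This sketch inlines the question's sample rows as a CTE named headlines (an illustrative stand-in for the real table) so it runs on its own:

```sql
-- Illustrative stand-in for the real table: the question's sample rows.
WITH headlines (url, title, created_at) AS (
    VALUES
        ('www.nytimes.com', 'The New York Times',        timestamp '2016-01-01 00:00:00'),
        ('www.wsj.com',     'The Wall Street Journal',   timestamp '2016-01-03 15:32:13'),
        ('www.nytimes.com', 'The New York Times Online', timestamp '2016-01-06 07:19:08')
)
-- No DISTINCT ON here: every row is returned, each carrying the count for its url.
SELECT url, title, created_at,
       count(*) OVER (PARTITION BY url) AS ct
FROM   headlines
ORDER  BY url, created_at DESC NULLS LAST;
```

This returns all three sample rows: both nytimes rows with ct = 2 (newest first) and the wsj row with ct = 1. Putting DISTINCT ON (url) back leaves exactly the two output rows the question asks for. NULLS LAST only matters if created_at can be NULL: Postgres sorts NULLs first under DESC by default, so without it a row with no timestamp would be picked as the latest.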

Related (with detailed explanations):

  • Best way to get result count before LIMIT was applied
  • Select first row in each GROUP BY group?
  • PostgreSQL: running count of rows for a query 'by minute'

Try this:

select t1.url, t1.title, t2.count
from headlines t1
inner join (
  select url, count(*) as count, max(created_at) as created_at
  from headlines
  group by url
) t2 on t1.url = t2.url and t1.created_at = t2.created_at;

SQL Fiddle: http://sqlfiddle.com/#!15/f7665f/11
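
One behavioral difference from the DISTINCT ON answers is worth knowing: this join keeps every row whose created_at equals the per-url maximum, so if two rows for the same url share the latest timestamp, that url shows up twice. A sketch with the sample rows inlined as a CTE plus one hypothetical extra row (the 'NYTimes.com' title is made up for illustration) that ties with the latest nytimes timestamp:

```sql
WITH headlines (url, title, created_at) AS (
    VALUES
        ('www.nytimes.com', 'The New York Times',        timestamp '2016-01-01 00:00:00'),
        ('www.wsj.com',     'The Wall Street Journal',   timestamp '2016-01-03 15:32:13'),
        ('www.nytimes.com', 'The New York Times Online', timestamp '2016-01-06 07:19:08'),
        -- Hypothetical row that ties with the latest nytimes timestamp.
        ('www.nytimes.com', 'NYTimes.com',               timestamp '2016-01-06 07:19:08')
)
SELECT t1.url, t1.title, t2.count
FROM   headlines t1
JOIN  (
    SELECT url, count(*) AS count, max(created_at) AS created_at
    FROM   headlines
    GROUP  BY url
) t2 ON t1.url = t2.url AND t1.created_at = t2.created_at
-- ORDER BY added only to make the output order stable.
ORDER  BY t1.url, t1.title;
```

This returns two www.nytimes.com rows (each with count 3) plus the wsj row. DISTINCT ON always returns one row per url; which of the tied rows it picks is unpredictable unless another ORDER BY column breaks the tie.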