How to get a count of distinct values after GROUP BY in SQL?

Apologies, this is surely a duplicate, but I don't know the right terms to search for.

I have a table of purchase decisions that looks like this:

org_id    item_id    spend
--------------------------
123        AAB         2
123        AAC         4
124        AAB        10
124        AAD         5

I want to find all items that were bought by three or fewer organizations, and then order them by total spend.

How can I do this in SQL? Note that I'm using BigQuery SQL.

So far I have:

SELECT * 
FROM 
  (SELECT ??(org_id) as org_count, -- How do I get the count of different org_ids? 
         item_id, 
         SUM(spend) AS total_spend
  FROM mytable 
  GROUP BY item_id) t
WHERE org_count < 4
ORDER BY total_spend DESC

Your SQL dialect may differ slightly, but this is how you would do it in SQL Server:

select item_id, sum(spend) as total_spend, count(distinct org_id) as num_orgs
from myTable
group by item_id
-- SQL Server does not allow a SELECT alias in HAVING, so repeat the aggregate
having count(distinct org_id) <= 3
order by total_spend desc

SELECT 
  item_id, 
  EXACT_COUNT_DISTINCT(org_id) AS org_count, 
  SUM(spend) AS total_spend
FROM mytable
GROUP BY item_id
HAVING org_count < 4
ORDER BY total_spend DESC

Note that in BigQuery:

If you use the COUNT with DISTINCT keyword, the function returns the number of distinct values for the specified field. Note that the returned value for DISTINCT is a statistical approximation and is not guaranteed to be exact.

To compute the exact number of distinct values, use EXACT_COUNT_DISTINCT. Or, for a more scalable approach, consider using GROUP EACH BY on the relevant field(s) and then applying COUNT(*). The GROUP EACH BY approach is more scalable but might incur a slight up-front performance penalty.

See the syntax section of the query reference for more on COUNT and DISTINCT:

https://cloud.google.com/bigquery/query-reference#aggfunctions
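
To make the two approaches above concrete, here is a minimal sketch that runs both against the sample rows from the question. It uses Python's built-in sqlite3 module purely as a local stand-in for BigQuery, which is an assumption for illustration: SQLite's COUNT(DISTINCT ...) is exact, and it has no EXACT_COUNT_DISTINCT or GROUP EACH BY. The second query mimics the "group on the field first, then COUNT(*)" idea with an ordinary subquery. The names run_queries, per_org and org_spend are made up for this example.

```python
import sqlite3

# Sample rows from the question: (org_id, item_id, spend).
ROWS = [
    (123, "AAB", 2),
    (123, "AAC", 4),
    (124, "AAB", 10),
    (124, "AAD", 5),
]

# Approach 1: count distinct organizations directly.
COUNT_DISTINCT_SQL = """
SELECT item_id,
       COUNT(DISTINCT org_id) AS org_count,
       SUM(spend) AS total_spend
FROM mytable
GROUP BY item_id
HAVING COUNT(DISTINCT org_id) <= 3
ORDER BY total_spend DESC, item_id
"""

# Approach 2: collapse to one row per (item_id, org_id) first,
# so that COUNT(*) in the outer query counts each organization once.
TWO_STEP_SQL = """
SELECT item_id,
       COUNT(*) AS org_count,
       SUM(org_spend) AS total_spend
FROM (SELECT item_id, org_id, SUM(spend) AS org_spend
      FROM mytable
      GROUP BY item_id, org_id) AS per_org
GROUP BY item_id
HAVING COUNT(*) <= 3
ORDER BY total_spend DESC, item_id
"""


def run_queries(rows=ROWS):
    """Load rows into an in-memory SQLite table and run both queries."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE mytable (org_id INTEGER, item_id TEXT, spend INTEGER)"
    )
    conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", rows)
    direct = conn.execute(COUNT_DISTINCT_SQL).fetchall()
    two_step = conn.execute(TWO_STEP_SQL).fetchall()
    conn.close()
    return direct, two_step


if __name__ == "__main__":
    direct, two_step = run_queries()
    print(direct)    # rows of (item_id, org_count, total_spend)
    print(two_step)  # should match the direct result
```

On the sample data both queries should return the same rows, with AAB first because it has the highest total spend. Whether the two-step form is actually faster depends on the engine and the data, so treat it as an alternative to measure rather than a guaranteed improvement.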