How to get count of distinct following GROUP BY in SQL?

Question

Apologies, this must be a duplicate, but I don't know the right terms to google.

I have a table of purchase decisions that looks like this:

org_id    item_id    spend
--------------------------
123        AAB         2
123        AAC         4
124        AAB        10
124        AAD         5

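I'd like to find all the items that were bought by three or fewer organisations, and then sort them by total spend.

How can I do this in SQL? Note that I'm using BigQuery SQL.

So far I have: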

SELECT * 
FROM 
  (SELECT ??(org_id) as org_count, -- How do I get the count of different org_ids? 
         item_id, 
         SUM(spend) AS total_spend
  FROM mytable 
  GROUP BY item_id) t
WHERE org_count < 4
ORDER BY total_spend DESC

Answer 1

Your brand of SQL may be slightly different, but this is how you'd do it in SQL Server:

Select item_id, sum(spend) as total_spend, count(distinct org_id) as num_orgs
from myTable
group by item_id
-- SQL Server does not allow a SELECT alias in HAVING, so repeat the aggregate
having count(distinct org_id) <= 3
order by total_spend desc

Answer 2

SELECT 
  item_id, 
  EXACT_COUNT_DISTINCT(org_id) AS org_count, 
  SUM(spend) AS total_spend
FROM mytable
GROUP BY item_id
HAVING org_count < 4
ORDER BY total_spend DESC

Note that in BigQuery:

If you use the COUNT with DISTINCT keyword, the function returns the number of distinct values for the specified field. Note that the returned value for DISTINCT is a statistical approximation and is not guaranteed to be exact.

To compute the exact number of distinct values, use EXACT_COUNT_DISTINCT. Or, for a more scalable approach, consider using GROUP EACH BY on the relevant field(s) and then applying COUNT(*). The GROUP EACH BY approach is more scalable but might incur a slight up-front performance penalty.

See more on COUNT and DISTINCT in the syntax section of https://cloud.google.com/bigquery/query-reference#aggfunctions
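
The quoted documentation suggests grouping on the relevant fields first and then applying COUNT(*). A minimal sketch of that idea in portable SQL, using the question's mytable and columns, might look like the following. The inner alias pair_spend and the subquery name pairs are hypothetical names introduced here for illustration, and the sketch uses a plain GROUP BY rather than GROUP EACH BY:

```sql
-- Step 1: collapse the table to one row per (item_id, org_id) pair.
-- Step 2: count those rows per item, which equals the number of distinct orgs.
SELECT
  item_id,
  COUNT(*) AS org_count,          -- one row per distinct org_id after step 1
  SUM(pair_spend) AS total_spend  -- pair_spend is a hypothetical alias
FROM (
  SELECT item_id, org_id, SUM(spend) AS pair_spend
  FROM mytable
  GROUP BY item_id, org_id
) AS pairs
GROUP BY item_id
HAVING COUNT(*) <= 3              -- "three or fewer organisations"
ORDER BY total_spend DESC
```

Because each (item_id, org_id) pair appears exactly once in the subquery, COUNT(*) in the outer query gives an exact distinct count without relying on any DISTINCT aggregate.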

sql

google-bigquery