Use sample keyword within a nested query

Question

I have a query of the following form:

with t1 as (
  select id, col2
  from atable) 
select
  distinct id
from t1
sample 100
inner join t1 as t2 on t1.id = t2.id

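It returns error 3706, "expected something between an integer and the inner keyword".

When I comment out the sample 100 line, the query runs fine.

My end goal is to get a sample from t1. However, since an ID can occur multiple times in t1, I do not want sample to break them apart. I want to avoid a sampled dataset in which the event history of an id is split or has missing entries because of the sample keyword. In other words, I want to get a sample of IDs and then use it to filter my table t1.

That way, the event history in t1 is complete for each ID.

How can I do this?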

Answer 1

SAMPLE is executed after GROUP BY/HAVING/QUALIFY, but before the DISTINCT operation and ORDER BY. You need to move the sample into the CTE:

with t1 as (
  select id, col2
  from atable
  sample 100
) 
select
  distinct id
from t1
inner join t1 as t2 on t1.id = t2.id

Based on your comment, you want to apply the sample to the distinct values:

with t1 as (
  select id
  from atable
  group by id -- Distinct is calculated after Sample
  sample 100
) 
select t.*
from atable as t
join t1 
  on t1.id = t.id -- atable is aliased as t; there is no t2 in this query

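If atable is large, the distinct operation might use a lot of resources (the result is spooled first, before the Sample is applied), and a nested Sample should improve performance: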

with t1 as (
  select id
  from 
   ( select id 
     from atable
                  -- reduce the number of rows for the following Group By
     sample 10000 -- sample must be large enough to have 100 distinct IDs
   ) as t
  group by id -- Distinct is calculated after Sample
  sample 100
) 
select t.*
from atable as t
join t1 
  on t1.id = t.id -- atable is aliased as t; there is no t2 in this query
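
Since the inner sample has to contain at least 100 distinct IDs, it may help to check how many distinct IDs a given inner sample size yields before relying on it. The following is only a minimal sketch and not part of the query above; it reuses atable and id from the question, while the aliases d and distinct_ids are illustrative assumptions. Because SAMPLE is random, the count will vary between runs.

```sql
-- Sketch: count the distinct IDs contained in a 10000-row inner sample.
-- If the result is often close to or below 100, increase the inner sample size.
select count(*) as distinct_ids
from
 ( select id
   from
    ( select id
      from atable
      sample 10000
    ) as t
   group by id
 ) as d
```

If the count is comfortably above 100, the outer sample 100 has enough distinct IDs to draw from.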

sql

teradata