Remove duplicates based on only 1 column

Question

My data is in the following format:

rep_id  user_id  other non-duplicated data
1       1        ...
1       2        ...
2       3        ...
3       4        ...
3       5        ...

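I'm trying to build a 0/1 column named deduped_rep so that, among the users associated with a rep, only the first row for each rep_id gets 1 and the rest get 0.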

Expected result:

rep_id  user_id  deduped_rep
1       1        1
1       2        0
2       3        1
3       4        1
3       5        0

For reference, in Excel I would use the following formula: IF(SUMPRODUCT(($A:$A2=A2)*($A:$A2=A2))>1,0,1)

I know there is a FIXED() LoD calculation (http://kb.tableau.com/articles/howto/removing-duplicate-data-with-lod-calculations), but I have only seen it used to deduplicate based on another column. My case is different.

Answer 1

Try this query:

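select
    rep_id,
    user_id,
    -- 1 for the first user_id within each rep_id, 0 for the rest
    case
        when row_number() over (partition by rep_id order by user_id) = 1 then 1
        else 0
    end as deduped_rep
from
    table  -- placeholder: replace with your table name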
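
To check the intended result outside the database (1 for the first user per rep_id, 0 otherwise), the logic can be sketched in plain Python on the sample rows from the question. This is only an illustration: the function name flag_first_per_rep is hypothetical, and "first" is assumed to mean the lowest user_id within each rep_id.

```python
def flag_first_per_rep(rows):
    """Return (rep_id, user_id, deduped_rep) tuples.

    The first row seen for each rep_id (after sorting by rep_id, user_id)
    gets deduped_rep = 1; every later row for that rep_id gets 0.
    """
    seen = set()
    flagged = []
    for rep_id, user_id in sorted(rows):
        is_first = rep_id not in seen
        seen.add(rep_id)
        flagged.append((rep_id, user_id, 1 if is_first else 0))
    return flagged


# Sample rows from the question: (rep_id, user_id)
rows = [(1, 1), (1, 2), (2, 3), (3, 4), (3, 5)]
for row in flag_first_per_rep(rows):
    print(row)
```

The output has one 1 per distinct rep_id, so summing the deduped_rep column gives the number of distinct reps.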

Answer 2

Define a field first_reg_date_per_rep_id as

{ fixed rep_id : min(registration_date) }

Define a field is_first_reg_date? as

registration_date = first_reg_date_per_rep_id

You can use this last boolean field to distinguish the first record for each rep_id from the later ones.
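
The two calculated fields can be mimicked in plain Python to see what they produce. This is a rough sketch, not Tableau code: the function name mark_first_registration is hypothetical, and the registration_date values are invented for illustration because the question's sample data does not include that column.

```python
from datetime import date


def mark_first_registration(records):
    """Return (rep_id, user_id, is_first_reg_date) tuples."""
    # Step 1: analogue of { fixed rep_id : min(registration_date) }
    first_date = {}
    for rep_id, _user_id, reg_date in records:
        if rep_id not in first_date or reg_date < first_date[rep_id]:
            first_date[rep_id] = reg_date
    # Step 2: analogue of registration_date = first_reg_date_per_rep_id
    return [
        (rep_id, user_id, reg_date == first_date[rep_id])
        for rep_id, user_id, reg_date in records
    ]


# (rep_id, user_id, registration_date); the dates are made-up sample values
records = [
    (1, 1, date(2020, 1, 5)),
    (1, 2, date(2020, 2, 1)),
    (2, 3, date(2020, 1, 9)),
    (3, 4, date(2020, 3, 2)),
    (3, 5, date(2020, 3, 7)),
]
for record in mark_first_registration(records):
    print(record)
```

One consequence of comparing against the minimum: if two rows for the same rep_id share the earliest registration_date, both are flagged as first, so this approach relies on the date being unique within each rep_id.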
