Overlap within a column for a common key in SQL

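Any tips on how to convert a dataset in which each user can have multiple values into a dataset that shows how many times two values overlap (that is, are found for the same user)?

Original hypothetical dataset: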

User_ID Toured_State
A       NY
A       CA
A       FL
B       NY
B       TX
C       NY
C       CA
D       TX

Desired dataset:

State_1     State_2     Count of users that toured both states
NY          CA          2
NY          TX          1
NY          FL          1
NY          NY          0

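This would show how often users who toured one state also toured a second state.

My first thought was to self-join the original dataset on the user ID and then count the matching rows (accounting for reversed duplicates). Is this the most efficient approach? Note that users are free to tour one or more states (not limited to two), including the same state twice. I have changed my example, so I realize this particular case may not seem useful. Thanks in advance for any tips.

I would do this: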

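-- Self-join on the user; "<" keeps each pair of states once
-- (no reversed duplicates and no state paired with itself).
select t1.Toured_State as state1, t2.Toured_State as state2, count(*)
from t t1 join
     t t2
     on t1.User_ID = t2.User_ID and t1.Toured_State < t2.Toured_State
group by t1.Toured_State, t2.Toured_State
order by count(*) desc;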

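If you would rather count users (for example, because a user may appear with the same state twice), use count(distinct t1.User_ID) instead of count(*).

You can try this.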
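To make the difference between the two counts concrete, here is a minimal sketch that runs the self-join against an in-memory SQLite database from Python. The table name `t`, the extra duplicate row for user A, and the column aliases `pair_rows` and `users` are assumptions added for illustration; they are not part of the original answer.

```python
import sqlite3

# Sample data from the question, plus one assumed duplicate:
# user A appears with NY twice, as the question says can happen.
rows = [
    ("A", "NY"), ("A", "CA"), ("A", "FL"), ("A", "NY"),
    ("B", "NY"), ("B", "TX"),
    ("C", "NY"), ("C", "CA"),
    ("D", "TX"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (User_ID TEXT, Toured_State TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)", rows)

query = """
SELECT t1.Toured_State AS state_1,
       t2.Toured_State AS state_2,
       COUNT(*)                   AS pair_rows,  -- inflated by duplicate rows
       COUNT(DISTINCT t1.User_ID) AS users       -- each user counted once
FROM t AS t1
JOIN t AS t2
  ON t1.User_ID = t2.User_ID
 AND t1.Toured_State < t2.Toured_State           -- each pair once, alphabetical
GROUP BY t1.Toured_State, t2.Toured_State
ORDER BY users DESC, state_1, state_2
"""

result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Because of the `<` condition, each pair is reported in alphabetical order (CA, NY rather than NY, CA). With the duplicate row, the CA/NY pair has three joined rows but only two distinct users, which is why `count(distinct ...)` is the safer choice when repeats are possible.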

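-- SQL Server syntax; @T is a table variable holding the original dataset.
-- Note: ordering by User_ID inside a User_ID partition leaves the row order
-- undefined, so which state gets RN = 1 for each user is arbitrary.
;WITH CTE AS (
    SELECT *, RN = ROW_NUMBER() OVER(PARTITION BY User_ID ORDER BY User_ID) FROM @T
)
SELECT
    T1.Toured_State State_1,
    T2.Toured_State State_2,
    -- A state paired with itself contributes NULL, so it is counted as 0
    COUNT(CASE WHEN T1.Toured_State = T2.Toured_State THEN NULL ELSE 1 END) [Count of users]
FROM CTE T1
    LEFT JOIN CTE T2 ON T1.User_ID = T2.User_ID AND T1.RN <= T2.RN
WHERE T1.RN = 1  -- only each user's first row is used as State_1
GROUP BY T1.Toured_State, T2.Toured_State
ORDER BY [Count of users] DESC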
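Since the query above depends on which row happens to be numbered first for each user, one possible alternative is to name the anchor state explicitly. The following is only a sketch under that assumption, run on SQLite from Python; the table name `tours`, the function `overlap_with`, and its `anchor` parameter are hypothetical names introduced here, not part of the answer.

```python
import sqlite3

def overlap_with(conn, anchor):
    """For one anchor state, count distinct users who also toured each state."""
    query = """
    SELECT t1.Toured_State AS state_1,
           t2.Toured_State AS state_2,
           -- the anchor paired with itself yields NULL, so it counts as 0
           COUNT(DISTINCT CASE WHEN t2.Toured_State <> t1.Toured_State
                               THEN t2.User_ID END) AS users
    FROM tours AS t1
    JOIN tours AS t2 ON t1.User_ID = t2.User_ID
    WHERE t1.Toured_State = ?
    GROUP BY t1.Toured_State, t2.Toured_State
    ORDER BY users DESC, state_2
    """
    return conn.execute(query, (anchor,)).fetchall()

# Sample data exactly as given in the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tours (User_ID TEXT, Toured_State TEXT)")
conn.executemany(
    "INSERT INTO tours VALUES (?, ?)",
    [("A", "NY"), ("A", "CA"), ("A", "FL"),
     ("B", "NY"), ("B", "TX"),
     ("C", "NY"), ("C", "CA"),
     ("D", "TX")],
)

ny_overlap = overlap_with(conn, "NY")
for row in ny_overlap:
    print(row)
```

On the question's sample data this returns the same rows as the desired dataset (NY with CA, FL, TX, and the NY/NY row at 0), with ties ordered by state name.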