MySQL function: rank table by most similar attributes

I have a table of product IDs and keywords that looks like this:

+------------+------------------+------+-----+---------+----------------+
| Field      | Type             | Null | Key | Default | Extra          |
+------------+------------------+------+-----+---------+----------------+
| id         | int(10) unsigned | NO   | PRI | NULL    | auto_increment |
| product_id | int(10) unsigned | YES  | MUL | NULL    |                |
| keyword    | varchar(255)     | YES  |     | NULL    |                |
+------------+------------------+------+-----+---------+----------------+

This table stores only product IDs and the keywords associated with those products. For example, it might contain:

+----+------------+---------+
| id | product_id | keyword |
+----+------------+---------+
|  1 |         1  | soft    |
|  2 |         1  | red     |
|  3 |         1  | leather |
|  4 |         2  | cloth   |
|  5 |         2  | red     |
|  6 |         2  | new     |
|  7 |         3  | soft    |
|  8 |         3  | red     |
|  9 |         4  | blue    |
+----+------------+---------+

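In other words, I need a way to pass in a product ID and get back a list of the other product IDs, sorted by the number of keywords each one has in common with it.

For example, if I pass in product_id 1, I would like to get back: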

+------------+------------+
| product_id | matches    |
+------------+------------+
|     3      | 2          | (product 3 has two common keywords with product 1)
|     2      | 1          | (product 2 has one common keyword with product 1)
|     4      | 0          | (product 4 has no common keywords with product 1)
+------------+------------+
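To try the answers locally, the sample data above can be loaded with something like the following. This is a sketch: `yourTable` is the placeholder table name the answers use, the index name `idx_product_id` is made up, and the columns follow the `DESCRIBE` output (including the `keyword` column name from the schema).

```sql
-- Placeholder table name used by the answers; columns mirror the DESCRIBE output.
CREATE TABLE yourTable (
    id         INT UNSIGNED NOT NULL AUTO_INCREMENT,
    product_id INT UNSIGNED NULL,
    keyword    VARCHAR(255) NULL,
    PRIMARY KEY (id),
    KEY idx_product_id (product_id)   -- hypothetical name for the MUL index on product_id
);

-- Sample rows from the question; AUTO_INCREMENT assigns ids 1..9 in this order.
INSERT INTO yourTable (product_id, keyword) VALUES
    (1, 'soft'), (1, 'red'), (1, 'leather'),
    (2, 'cloth'), (2, 'red'), (2, 'new'),
    (3, 'soft'), (3, 'red'),
    (4, 'blue');
```

On platforms where MySQL table names are case-sensitive, adjust the spelling of `yourTable` to match whichever query you run.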

One option is a right outer self-join with conditional aggregation, counting how many keywords each other product shares with product ID 1:

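SELECT t2.product_id,
       -- t1 only matches product 1's rows with the same keyword,
       -- so t1.keyword is NULL when t2's keyword is not one of product 1's
       SUM(CASE WHEN t1.keyword IS NOT NULL THEN 1 ELSE 0 END) AS matches
FROM yourTable t1
RIGHT JOIN yourTable t2
    ON t1.keyword = t2.keyword AND
       t1.product_id = 1
WHERE t2.product_id <> 1
GROUP BY t2.product_id
ORDER BY matches DESC;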

Follow the link below for a running demo:

SQLFiddle
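The question asks for a MySQL function that takes a product ID, but a MySQL stored function can only return a single value, not a result set; a stored procedure can return a result set. One way to package the query is sketched below. The procedure name `similar_products` and parameter `p_product_id` are hypothetical. The query is the self-join above written as a `LEFT JOIN` with `COUNT(t1.keyword)`, which counts the same non-NULL matches as the `SUM(CASE ...)` expression. `DELIMITER` is a `mysql` command-line client command, needed so the semicolons inside the body do not end the `CREATE PROCEDURE` statement early.

```sql
-- Hypothetical procedure name and parameter; yourTable is the placeholder table.
DELIMITER //

CREATE PROCEDURE similar_products(IN p_product_id INT UNSIGNED)
BEGIN
    -- For every other product, count how many of its keywords also belong
    -- to p_product_id; the LEFT JOIN keeps products with no overlap (count 0).
    SELECT t2.product_id,
           COUNT(t1.keyword) AS matches
    FROM yourTable t2
    LEFT JOIN yourTable t1
        ON t1.keyword = t2.keyword
       AND t1.product_id = p_product_id
    WHERE t2.product_id <> p_product_id
    GROUP BY t2.product_id
    ORDER BY matches DESC, t2.product_id;   -- ties broken by product_id
END //

DELIMITER ;
```

With the sample data, `CALL similar_products(1);` should list product 3 (2 matches), then product 2 (1), then product 4 (0), matching the expected output in the question.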

You need an outer join against the keywords of product_id 1:

select y.product_id, count(y2.keyword)
from yourtable y
  -- y2 holds product 1's keywords; the left join keeps every row of y,
  -- so count(y2.keyword) is 0 for products that share no keyword
  left join (
    select keyword from yourtable where product_id = 1
    ) y2 on y.keyword = y2.keyword
where y.product_id <> 1
group by y.product_id
order by 2 desc

Result:

| product_id | count(y2.keyword) |
|------------|-------------------|
|          3 |                 2 |
|          2 |                 1 |
|          4 |                 0 |
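Both queries count one match per pair of joined rows, so if the same keyword were stored more than once for a product (nothing in the question rules this out), the counts would be inflated. A possible variant, assuming each shared keyword should be counted only once, deduplicates with `DISTINCT`; product ID `1` is hard-coded as in the answers, and the alias `p1` is an arbitrary choice.

```sql
SELECT y.product_id,
       -- COUNT(DISTINCT ...) ignores NULLs (no match) and collapses
       -- repeated keywords on the other product's side
       COUNT(DISTINCT p1.keyword) AS matches
FROM yourTable y
LEFT JOIN (
    -- DISTINCT so a keyword repeated for product 1 does not multiply rows
    SELECT DISTINCT keyword
    FROM yourTable
    WHERE product_id = 1
) p1 ON y.keyword = p1.keyword
WHERE y.product_id <> 1
GROUP BY y.product_id
ORDER BY matches DESC, y.product_id;
```

On the sample data, which has no repeated keywords, this returns the same rows as the answers.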