invalid reference to FROM-clause entry for table "d"

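As part of a k-means algorithm, I am trying to update the cluster that each item belongs to, as shown in the query below. The problem is that I cannot seem to reference table d inside the nested query.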

UPDATE algorithms.km_crimes d SET cluster_id = c.id 
FROM (SELECT id FROM algorithms.km_cluster_centres c 
ORDER BY |/ (POW(d.latitude-c.latitude,2)+POW(d.longitude-c.longitude,2))      
ASC LIMIT 1) AS c
WHERE d.cluster_id IS DISTINCT FROM c.id;

Can anyone suggest how to restructure the query? I have tried too many variations to count.

Have you tried it without the alias?

UPDATE algorithms.km_crimes SET cluster_id = c.id 
FROM (SELECT id FROM algorithms.km_cluster_centres c 
    ORDER BY |/ (POW(algorithms.km_crimes.latitude-c.latitude,2)+POW(algorithms.km_crimes.longitude-c.longitude,2)) ASC LIMIT 1) AS c
    WHERE algorithms.km_crimes.cluster_id IS DISTINCT FROM c.id;

Judging by the MySQL example you are converting, the first query does not need to check for changes at all.

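The algorithm does not care how many cluster_id values are reassigned in each iteration; it only needs to stop when none of the cluster centres move. Fortunately, the second query is much easier to fix.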
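
For the tables in the question, the assignment step could look roughly like the sketch below. It assumes the table and column names shown in the question and simply drops the IS DISTINCT FROM check. Moving the lookup out of FROM and into a correlated subquery in SET is what lets it see the row being updated, since a subquery in the FROM list of an UPDATE cannot reference the target table.

```sql
-- Sketch only: table and column names are taken from the question.
-- The subquery lives in SET rather than FROM, so it may reference d.
UPDATE algorithms.km_crimes d
SET cluster_id = (
  SELECT c.id
  FROM algorithms.km_cluster_centres c
  -- Nearest centre by straight-line distance on latitude/longitude.
  ORDER BY |/ (POW(d.latitude - c.latitude, 2) + POW(d.longitude - c.longitude, 2))
  LIMIT 1
);
```

Every row gets rewritten on each pass, which costs more writes than the filtered version would, but keeps the statement simple.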

This seems to work:

CREATE TABLE km_data (id serial, cluster_id int, lat double precision, lng double precision);
CREATE TABLE km_clusters (id serial, lat double precision, lng double precision);

CREATE OR REPLACE FUNCTION kmeans(k int) RETURNS VOID LANGUAGE plpgsql AS $$
BEGIN
  TRUNCATE km_clusters;

  INSERT INTO km_clusters (lat, lng)
  SELECT lat, lng FROM km_data
  ORDER BY random() LIMIT k;

  LOOP
    UPDATE km_data d SET cluster_id = (
      SELECT id FROM km_clusters c 
      ORDER BY |/(POW(d.lat-c.lat,2)+POW(d.lng-c.lng,2)) LIMIT 1
    );

    UPDATE km_clusters c
    SET lat=d.lat, lng=d.lng
    FROM (
      SELECT
        cluster_id, 
        AVG(lat) AS lat,
        AVG(lng) AS lng
      FROM km_data
      GROUP BY cluster_id
    ) d 
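    WHERE
      c.id=d.cluster_id AND
      -- Only touch centres that moved by more than the tolerance, so that
      -- FOUND becomes false (and the loop exits) once every centre is stable.
      (ABS(c.lat-d.lat) > 0.001 OR
       ABS(c.lng-d.lng) > 0.001);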

    EXIT WHEN NOT FOUND;
  END LOOP;
END $$;

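If you want more precision, you can adjust the numbers in the final WHERE clause of the function, although this looks like a fairly imprecise algorithm anyway.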
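
As a rough usage sketch, assuming km_data has already been filled with points and picking k = 3 arbitrarily, the function can be called and its result inspected like this:

```sql
-- Run the clustering; 3 is an arbitrary choice of k for illustration.
SELECT kmeans(3);

-- Summarise the outcome: how many points landed in each cluster,
-- and where each cluster's mean position ended up.
SELECT cluster_id,
       COUNT(*) AS points,
       AVG(lat) AS mean_lat,
       AVG(lng) AS mean_lng
FROM km_data
GROUP BY cluster_id
ORDER BY cluster_id;
```

Because the initial centres come from ORDER BY random(), repeated runs can produce different clusters.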