How to check if a GPS coordinate is close to thousands of others efficiently?

Question

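I have 11 million GPS coordinates to analyze, and efficiency is my main concern. The problem is as follows: I want to keep only one GPS coordinate (call it a node) within any 50-meter radius. The code is quite simple: I have a set G, and for each node I want to add, I check whether it is too close to any node already in G. If it is too close (< 50 meters), I do not add it; otherwise I add it.

The problem is that the set G grows very quickly, so in the end, deciding whether to add a single node means running a for loop over millions of elements...

Here is simplified code for the Node class: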

from geopy import distance

class Node: #a point on the map
    def __init__(self, lat, long): #lat and long in degree
        self.lat = lat
        self.long = long

    def distanceTo(self, otherNode):
        return distance.distance((self.lat, self.long), (otherNode.lat, otherNode.long)).km

    def equivalent(self, otherNode):
        return self.distanceTo(otherNode) < 0.05 #50 meters away

Here is the 'add' procedure:

currentNode = Node(lat, long)

# G is the set of Nodes kept so far
alreadyIn = False
for n in G:
    if n.equivalent(currentNode):
        alreadyIn = True
        break

if not alreadyIn:
    G.add(currentNode)

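This is not a node clustering problem, since I am not trying to detect any pattern in the dataset. I just want to group nodes within a 50-meter radius.

I think the best data structure would be one that, given a coordinate, returns True or False (depending on whether a similar node is already in the set). But I cannot figure out which one to use, since I am partitioning the space into circles rather than squares. (Yes, node A can be equivalent to both B and C while B and C are not equivalent to each other, but I really don't mind...)

Thanks for your help!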

Answer 1

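For calculations like this, an object-oriented approach is usually slower (though more readable).

You could convert your latitude and longitude to Cartesian x, y, z, build NumPy arrays from your nodes, and use SciPy's very fast cKDTree. It provides several methods for operations like this; in your case query_ball_point is probably the right one.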
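A minimal sketch of that idea follows. It is not a drop-in replacement for the geopy code. It assumes a spherical Earth with a mean radius of 6,371 km (geopy's `distance.distance` uses an ellipsoid, so results can differ slightly near the 50 m boundary), and the helper names `to_cartesian` and `thin_points` are made up for illustration. The tree is built once over all points. The loop then walks the points in input order, keeps each point that has not been suppressed yet, and suppresses everything within the radius of it. That should reproduce the greedy "add only if no kept node is too close" rule from the question, except that `query_ball_point` treats the radius as inclusive (<=) rather than strict (<).

```python
import numpy as np
from scipy.spatial import cKDTree

# Assumption: spherical Earth with the mean radius, in meters.
EARTH_RADIUS_M = 6_371_000.0


def to_cartesian(lat_deg, lon_deg):
    """Convert latitude/longitude in degrees to x, y, z in meters on a sphere."""
    lat = np.radians(np.atleast_1d(np.asarray(lat_deg, dtype=float)))
    lon = np.radians(np.atleast_1d(np.asarray(lon_deg, dtype=float)))
    cos_lat = np.cos(lat)
    return EARTH_RADIUS_M * np.column_stack(
        (cos_lat * np.cos(lon), cos_lat * np.sin(lon), np.sin(lat))
    )


def thin_points(lat_deg, lon_deg, radius_m=50.0):
    """Return indices of the points kept by the greedy 'one per radius' rule."""
    xyz = to_cartesian(lat_deg, lon_deg)
    tree = cKDTree(xyz)

    # The tree measures straight-line (chord) distance through the sphere,
    # so convert the surface distance to the matching chord length.
    chord = 2.0 * EARTH_RADIUS_M * np.sin(radius_m / (2.0 * EARTH_RADIUS_M))

    suppressed = np.zeros(len(xyz), dtype=bool)
    kept = []
    for i in range(len(xyz)):
        if suppressed[i]:
            continue  # an earlier kept point is within the radius
        kept.append(i)
        # Mark every point within the radius of this kept point (itself included).
        suppressed[tree.query_ball_point(xyz[i], r=chord)] = True
    return kept
```

Calling `thin_points(lats, lons)` with two parallel sequences of degrees returns the indices of the surviving nodes. The Python loop still visits every point, but each neighbor lookup goes through the tree instead of scanning the whole set.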
