Spatial nearest neighbor assignment in R
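I am working on a study that assigns particulate matter (PM) exposure to specific individuals based on their addresses. I have two datasets with longitude and latitude coordinates: one for the individuals and one for the PM exposure blocks. I want to assign each subject the PM exposure block that is nearest to them.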
library(sp)
library(raster)
library(tidyverse)
#subject level data
subjectID<-c("A1","A2","A3","A4")
subjects<-data.frame(tribble(
~lon,~lat,
-70.9821391, 42.3769511,
-61.8668537, 45.5267133,
-70.9344039, 41.6220337,
-70.7283830, 41.7123494
))
row.names(subjects)<-subjectID
#PM Block Locations
blockID<-c("B1","B2","B3","B4","B5")
blocks<-data.frame(tribble(
~lon,~lat,
-70.9824591, 42.3769451,
-61.8664537, 45.5267453,
-70.9344539, 41.6220457,
-70.7284530, 41.7123454,
-70.7284430, 41.7193454
))
row.names(blocks)<-blockID
#Creating distance matrix
dis_matrix<-pointDistance(blocks,subjects,lonlat = TRUE)
###The above code doesn't preserve the row names. Is there a way to do that?
###I'm unsure about the below code
colnames(dis_matrix)<-row.names(subjects)
row.names(dis_matrix)<-row.names(blocks)
dis_data<-data.frame(dis_matrix)
###Finding nearest neighbor and coercing to usable format
getname <-function(x) {
row.names(dis_data[which.min(x),])
}
nn<-data.frame(lapply(dis_data,getname)) %>%
gather(key=subject,value=neighbor)
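This code gives me sensible output, but I'm unsure about its validity and efficiency. Any suggestions on how to improve and fix this code would be appreciated. I also get the following warning: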
Warning message:
attributes are not identical across measure variables;
they will be dropped
I can't work out where it comes from.
Thanks for looking!
Here is some example data showing how you can use pointDistance:
library(raster)
#subject level data
subjectID <- c("A1","A2","A3","A4")
subxy <- matrix(c(-65, 42, -60, 4.5, -70, 20, -75, 41 ), ncol=2, byrow=TRUE)
#PM Block Locations
blockID <- c("B1","B2","B3","B4","B5")
blockxy <- matrix(c(-68, 22, -61, 25, -70, 31, -65, 11,-63, 21), ncol=2, byrow=TRUE)
# distance of all subxy to all blockxy points
d <- pointDistance(subxy, blockxy, lonlat=TRUE)
# get the blockxy record nearest to each subxy record
r <- apply(d, 1, which.min)
r
#[1] 3 4 1 3
So the pairs are:
p <- data.frame(subject=subjectID, block=blockID[r])
p
# subject block
#1 A1 B3
#2 A2 B4
#3 A3 B1
#4 A4 B3
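On the row-names part of the question: one way to keep the IDs attached is to set the dimnames of the distance matrix once and read the labels back from them. The sketch below uses the coordinates from the question and, so that it runs without extra packages, a hand-written haversine distance on a sphere of radius 6371 km. haversine_matrix is a hypothetical helper, not part of raster, and its distances will differ slightly from what pointDistance returns.

```r
# Hypothetical helper (an assumption, not from raster): haversine distance in
# metres on a sphere. Rows of the result are points in `from`, columns are
# points in `to`; both inputs need lon in column 1 and lat in column 2.
haversine_matrix <- function(from, to, r = 6371000) {
  rad <- pi / 180
  lon1 <- from[, 1] * rad
  lat1 <- from[, 2] * rad
  lon2 <- to[, 1] * rad
  lat2 <- to[, 2] * rad
  dlat <- outer(lat1, lat2, function(a, b) b - a)
  dlon <- outer(lon1, lon2, function(a, b) b - a)
  h <- sin(dlat / 2)^2 + outer(cos(lat1), cos(lat2)) * sin(dlon / 2)^2
  h[h > 1] <- 1                       # guard against rounding just above 1
  d <- 2 * r * asin(sqrt(h))
  dimnames(d) <- list(rownames(from), rownames(to))  # keep the IDs
  d
}

subjects <- data.frame(
  lon = c(-70.9821391, -61.8668537, -70.9344039, -70.7283830),
  lat = c(42.3769511, 45.5267133, 41.6220337, 41.7123494),
  row.names = c("A1", "A2", "A3", "A4")
)
blocks <- data.frame(
  lon = c(-70.9824591, -61.8664537, -70.9344539, -70.7284530, -70.7284430),
  lat = c(42.3769451, 45.5267453, 41.6220457, 41.7123454, 41.7193454),
  row.names = c("B1", "B2", "B3", "B4", "B5")
)

d <- haversine_matrix(subjects, blocks)   # 4 x 5, rows = subjects
idx <- unname(apply(d, 1, which.min))     # column index of the nearest block

nn <- data.frame(
  subject = rownames(d),
  block   = colnames(d)[idx],
  dist_m  = d[cbind(seq_len(nrow(d)), idx)],  # distance to the chosen block
  stringsAsFactors = FALSE
)
nn
```

Keeping the distance next to the assignment makes it easy to spot subjects whose nearest block is still implausibly far away.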
To illustrate that it works:
plot(rbind(blockxy, subxy), ylim=c(0,45), xlab='longitude', ylab='latitude')
points(blockxy, col="red", pch=20, cex=2)
points(subxy, col="blue", pch=20, cex=2)
text(subxy, subjectID, pos=1)
text(blockxy, blockID, pos=1)
for (i in 1:nrow(subxy)) {
arrows(subxy[i,1], subxy[i,2], blockxy[r[i],1], blockxy[r[i],2])
}
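As for the warning in the question, a likely cause (assuming an R version before 4.0, where data.frame() turns strings into factors by default) is that data.frame(lapply(dis_data, getname)) creates one factor column per subject, each with a different single level. gather() then has to stack factors whose levels don't match, drops the attributes, and says so. The sketch below uses made-up distances to show the mismatched levels and one way to avoid the reshaping step entirely.

```r
# Toy distance table (made-up values): rows are blocks, columns are subjects
dis_data <- data.frame(
  A1 = c(10, 500, 90),
  A2 = c(400, 20, 300),
  row.names = c("B1", "B2", "B3")
)

nearest_name <- function(x) rownames(dis_data)[which.min(x)]

# Mimic the pre-4.0 default explicitly: every column becomes a factor
as_factors <- data.frame(lapply(dis_data, nearest_name), stringsAsFactors = TRUE)
lapply(as_factors, levels)   # each column carries its own single level

# Alternative: vapply() returns a named character vector, so the long
# table can be built directly and nothing needs to be gathered
nearest <- vapply(dis_data, nearest_name, character(1))
nn <- data.frame(
  subject  = names(nearest),
  neighbor = unname(nearest),
  stringsAsFactors = FALSE
)
nn
```

If this is indeed the cause, the warning is harmless here: the values are still correct, they just come back as plain character strings.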
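If you have a large dataset, you may want to use the very efficient nabor package, as explained by @user3507085 in an answer to another question. Since that question was closed as off-topic, I have copy-pasted the answer below so that it "stays alive" in this thread. I don't know whether this is considered bad practice, and I'm happy to delete/edit if needed (note that the distances given by knn are Euclidean distances, not geographic distances, but I guess they can be converted to spherical distances by a simple transformation involving arcsin):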
lonlat2xyz=function (lon, lat, r)
{
lon = lon * pi/180
lat = lat * pi/180
if (missing(r))
r <- 6378.1
x <- r * cos(lat) * cos(lon)
y <- r * cos(lat) * sin(lon)
z <- r * sin(lat)
return(cbind(x, y, z))
}
# 100 random data points and 100 random query points
lon1=runif(100,-180,180)
lat1=runif(100,-90,90)
lon2=runif(100,-180,180)
lat2=runif(100,-90,90)
xyz1=lonlat2xyz(lon1,lat1)
xyz2=lonlat2xyz(lon2,lat2)
library(nabor)
out=knn(data=xyz1,query = xyz2,k=20)
library(maps)
map()
points(lon1,lat1,pch=16,col="black")
points(lon2[1],lat2[1],pch=16,col="red")
points(lon1[out$nn.idx[1,]],lat1[out$nn.idx[1,]],pch=16,col="blue")
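The arcsin transformation mentioned above can be sketched as follows. The Euclidean distance between two points on a sphere is the length of the chord joining them, and a chord of length c on a sphere of radius r spans an arc of length 2 * r * asin(c / (2 * r)). chord_to_arc is a hypothetical helper name, and the radius must be the same one used in lonlat2xyz (6378.1 km here).

```r
# Convert lon/lat in degrees to 3D Cartesian coordinates on a sphere of
# radius r (km); same formulas as lonlat2xyz above
lonlat2xyz <- function(lon, lat, r = 6378.1) {
  lon <- lon * pi / 180
  lat <- lat * pi / 180
  cbind(x = r * cos(lat) * cos(lon),
        y = r * cos(lat) * sin(lon),
        z = r * sin(lat))
}

# Hypothetical helper: chord (straight-line) length -> arc length along the
# surface. pmin() guards against rounding pushing the ratio just above 1.
chord_to_arc <- function(chord, r = 6378.1) {
  2 * r * asin(pmin(chord / (2 * r), 1))
}

# Check on a case with a known answer: two points on the equator, 90 degrees
# of longitude apart, are a quarter of the circumference from each other
r <- 6378.1
a <- lonlat2xyz(0, 0, r)
b <- lonlat2xyz(90, 0, r)
chord <- sqrt(sum((a - b)^2))   # Euclidean distance, what knn works with
arc <- chord_to_arc(chord, r)
all.equal(arc, r * pi / 2)
```

With the nabor result above, the same conversion would be chord_to_arc(out$nn.dists). Because the arc length grows with the chord length, the nearest neighbours by Euclidean distance are also the nearest along the surface, so only the reported distances need converting, not the ranking.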