Applying a function returning list/matrix row-wise in data.table
I am trying to follow the steps described at http://flowingdata.com/2011/05/11/how-to-map-connections-with-great-circles/, but using data.table, and in particular step 8 listed there. Below are my steps and the problem I ran into:
library(data.table)
library(maps)
library(geosphere)
airports <- as.data.table(read.csv("http://datasets.flowingdata.com/tuts/maparcs/airports.csv", header=TRUE))
flights <- as.data.table(read.csv("http://datasets.flowingdata.com/tuts/maparcs/flights.csv", header=TRUE, as.is=TRUE))
setnames(airports,c("airport1",names(airports)[2:7]))
setkey(flights,airport1)
setkey(airports,airport1)
ap <- merge(flights,airports)
setkey(ap,airport2)
setnames(airports,c("airport2",names(airports)[2:7]))
setkey(airports,airport2)
setkey(ap,airport2)
ap2 <- merge(ap,airports)
ap3 <- ap2[,.(airport1,airport2,airline,cnt,lat.x,long.x,lat.y,long.y)]
## ap3[,inter:=gcIntermediate(c(long.x,lat.x),c(long.y,lat.y),n=100,addStartEnd=TRUE),] ## Error in .pointsToMatrix(p1) : Wrong length for a vector, should be 2
##
## Tried some more stuff but no luck!
## fn <- function(lonx,latx,lony,laty) gcIntermediate(c(lonx,latx),c(lony,laty),n=100,addStartEnd=TRUE)
## ap3[,do.call(fn,.SD),.SDcols=5:8] ## Error in (function (lonx, latx, lony, laty) : unused arguments (lat.x = c(35.21401111, 35.2140 ... snip ...
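So I searched Stack Overflow and tried the steps listed in [1] and [2], but could not get them to work. I remember reading somewhere (though I can't find it now) that a data.table can store lists, but I don't know how. Also, is there any way to debug functions in j other than what is listed in section 2.9 of the FAQ?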
[1] efficient row-wise operations on a data.table
[2] Applying a function to each row of a data.table
Suppose you have a function that returns a matrix of unknown size. You can store the results in a data.table with a list column:
# example data
set.seed(42)
DT <- data.table(id=1:3)[,.(v=sample(letters,sample(5,1))),by=id]
# example function
myfun = function(x) matrix(x, ncol= if(length(x)%%2) 1 else 2 )
# usage
res <- DT[,.(vlist = list(myfun(v))),by=id]
# id vlist
# 1: 1 y,h,t,o,l
# 2: 2 d,q,y,k
# 3: 3 y,g,l,v
This may not look like a column of matrices, but you can see that it is:
str(res$vlist)
# List of 3
# $ : chr [1:5, 1] "y" "h" "t" "o" ...
# $ : chr [1:2, 1:2] "d" "q" "y" "k"
# $ : chr [1:2, 1:2] "y" "g" "l" "v"
res$vlist[[2]]
# [,1] [,2]
# [1,] "d" "y"
# [2,] "q" "k"
(I'm not sure whether this is what you are after, since I did not go through the linked blog post.)
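Applied to the shape of the data in the question, the same list-column idea might look like the sketch below. To keep it runnable without geosphere, `path_between` is a hypothetical stand-in for `gcIntermediate` (plain linear interpolation, not a great circle), and `routes` and `id` are made-up toy names; only the column names `long.x`, `lat.x`, `long.y`, `lat.y` come from the question.

```r
library(data.table)

# Hypothetical stand-in for gcIntermediate(): straight-line interpolation
# between two (lon, lat) points, returned as an (n + 2) x 2 matrix.
path_between <- function(p1, p2, n = 3) {
  t <- seq(0, 1, length.out = n + 2)
  cbind(lon = p1[1] + t * (p2[1] - p1[1]),
        lat = p1[2] + t * (p2[2] - p1[2]))
}

# Toy routes; coordinates are rounded, illustrative values
routes <- data.table(long.x = c(-80.9, -112.0), lat.x = c(35.2, 33.4),
                     long.y = c(-75.4, -114.6), lat.y = c(40.7, 32.7))
routes[, id := .I]  # one id per row, so every group is a single row

# Within a one-row group, c(long.x, lat.x) has length 2, as the function expects.
# Wrapping the matrix in list() stores it as a single cell of a list column.
res <- routes[, .(inter = list(path_between(c(long.x, lat.x), c(long.y, lat.y)))),
              by = id]

dim(res$inter[[1]])  # dimensions of the matrix stored for the first route
```

Grouping by a per-row id is what keeps each call down to one pair of points; without `by`, the whole columns are passed in at once.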
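This should really be a comment, but it doesn't fit there:
For each p1 and p2, defined by c(long.x,lat.x) and c(long.y,lat.y) respectively, you get a matrix (or a list; from here on I focus only on the matrix), and the dimensions of that matrix depend on the values of n and addStartEnd. For example, with n=1 and addStartEnd=FALSE it returns a 1 x 2 matrix, and with n=1 and addStartEnd=TRUE it returns a 3 x 2 matrix. With a data.table operation like yours, you cannot simply append those values as a column. I am not a data.table expert, but I think the right approach is to operate row by row and then combine the pieces with rbindlist. For example,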
apt <- setDT(ap3)
tt <- rbindlist(lapply(1:nrow(apt), function(i)
  cbind(apt[i, ], gcIntermediate(apt[i, c("long.x","lat.x")], apt[i, c("long.y","lat.y")],
                                 n=100, addStartEnd=TRUE))))
> tt
airport1 airport2 airline cnt lat.x long.x lat.y long.y lon lat
1: CLT ABE all 56 35.21401 -80.94313 40.65236 -75.4404 -80.94313 35.21401
2: CLT ABE all 56 35.21401 -80.94313 40.65236 -75.4404 -80.89245 35.26904
3: CLT ABE all 56 35.21401 -80.94313 40.65236 -75.4404 -80.84171 35.32405
4: CLT ABE all 56 35.21401 -80.94313 40.65236 -75.4404 -80.79090 35.37904
5: CLT ABE all 56 35.21401 -80.94313 40.65236 -75.4404 -80.74002 35.43401
---
510710: PHX YUM YV 328 33.43417 -112.00806 32.65658 -114.6060 -114.50396 32.68840
510711: PHX YUM YV 328 33.43417 -112.00806 32.65658 -114.6060 -114.52947 32.68045
510712: PHX YUM YV 328 33.43417 -112.00806 32.65658 -114.6060 -114.55498 32.67250
510713: PHX YUM YV 328 33.43417 -112.00806 32.65658 -114.6060 -114.58048 32.66454
510714: PHX YUM YV 328 33.43417 -112.00806 32.65658 -114.6060 -114.60597 32.65658
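The claim about dimensions can be checked directly with a single pair of points. This is a minimal sketch that reuses the CLT and ABE coordinates from the output above; the names `p1`, `p2`, `d_inner`, `d_ends` and `d_full` are made up for illustration. Going by the description above, the three results should be 1 x 2, 3 x 2 and 102 x 2.

```r
library(geosphere)

p1 <- c(-80.94313, 35.21401)  # CLT as (lon, lat)
p2 <- c(-75.4404, 40.65236)   # ABE as (lon, lat)

# One intermediate point, endpoints excluded
d_inner <- dim(gcIntermediate(p1, p2, n = 1, addStartEnd = FALSE))
# One intermediate point plus the start and end points
d_ends <- dim(gcIntermediate(p1, p2, n = 1, addStartEnd = TRUE))
# The settings used in the question: 100 points plus both endpoints
d_full <- dim(gcIntermediate(p1, p2, n = 100, addStartEnd = TRUE))

print(list(d_inner, d_ends, d_full))
```

This also suggests why the attempts in the question fail: without a by clause, c(long.x, lat.x) concatenates the two whole columns rather than one pair of coordinates, which matches the "Wrong length for a vector, should be 2" error.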
Following @Frank's suggestion, you can do this using data.table operations alone (where 102 = 100 (n) + 2 (addStartEnd=TRUE)):
ap3[,gcIntermediate(c(long.x,lat.x),c(long.y,lat.y),n=100,addStartEnd=TRUE),by=1:nrow(ap3)][,list(lon=head(V1,102),lat=tail(V1,102)),by=nrow]
nrow lon lat
1: 1 -80.94313 35.21401
2: 1 -80.89245 35.26904
3: 1 -80.84171 35.32405
4: 1 -80.79090 35.37904
5: 1 -80.74002 35.43401
---
510710: 5007 -114.50396 32.68840
510711: 5007 -114.52947 32.68045
510712: 5007 -114.55498 32.67250
510713: 5007 -114.58048 32.66454
510714: 5007 -114.60597 32.65658
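A possible variation, sketched here as an assumption rather than as part of the original answer, is to convert each matrix to a data.table inside j. data.table then stacks the groups with lon and lat already in separate columns, so the hard-coded 102 and the head/tail split are not needed. `ap_toy` and `arcs` are made-up names, and the two routes are copied from the output shown earlier.

```r
library(data.table)
library(geosphere)

# Two routes copied from the output shown earlier
ap_toy <- data.table(
  airport1 = c("CLT", "PHX"), airport2 = c("ABE", "YUM"),
  lat.x = c(35.21401, 33.43417), long.x = c(-80.94313, -112.00806),
  lat.y = c(40.65236, 32.65658), long.y = c(-75.4404, -114.6060)
)

# Each group must be exactly one row, otherwise c(long.x, lat.x) is longer
# than 2. Here every (airport1, airport2) pair is unique, so it can be the key.
arcs <- ap_toy[, as.data.table(gcIntermediate(c(long.x, lat.x), c(long.y, lat.y),
                                              n = 100, addStartEnd = TRUE)),
               by = .(airport1, airport2)]

print(arcs)
```

On the full ap3 the grouping columns would have to identify a single row (for example by also including airline, or by grouping on a row index as above), which is an assumption about the data worth checking first.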