Conditional updating coordinate column in dataframe

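I am trying to populate two new, empty columns in a data frame with data from other columns in the same data frame, depending on whether those other columns are populated.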

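I am trying to populate the values of HIGH_PRCN_LAT and HIGH_PRCN_LON (previously called F_Lat and F_Lon), which represent the final latitude and longitude for each row, based on the values of other columns in the table.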

Case 1: Lat/Lon2 is populated (as in IDs 1 and 2). A great circle algorithm should be used to calculate the midpoint between the two points, and the result placed into F_Lat and F_Lon.

Case 2: Lat/Lon2 is empty. The values of Lat/Lon1 should be placed into F_Lat and F_Lon (as in IDs 3 and 4).

My code is below, but it isn't working (see the previous version, removed in an edit).

The preliminary code I am using is as follows:

incidents <- structure(list(id = 1:9, StartDate = structure(c(1L, 3L, 2L, 
2L, 2L, 3L, 1L, 3L, 1L), .Label = c("02/02/2000 00:34", "02/09/2000 22:13", 
"20/01/2000 14:11"), class = "factor"), EndDate = structure(1:9, .Label = c("02/04/2006 20:46", 
"02/04/2006 22:38", "02/04/2006 23:21", "02/04/2006 23:59", "03/04/2006 20:12", 
"03/04/2006 23:56", "04/04/2006 00:31", "07/04/2006 06:19", "07/04/2006 07:45"
), class = "factor"), Yr.Period = structure(c(1L, 1L, 2L, 2L, 
2L, 3L, 3L, 3L, 3L), .Label = c("2000 / 1", "2000 / 2", "2000 /3"
), class = "factor"), Description = structure(c(1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L), .Label = "ENGLISH TEXT", class = "factor"), 
    Location = structure(c(2L, 2L, 1L, 2L, 2L, 2L, 2L, 1L, 1L
    ), .Label = c("Location 1", "Location 1 : Location 2"), class = "factor"), 
    Location.1 = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L), .Label = "Location 1", class = "factor"), Postcode.1 = structure(c(1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "Postcode 1", class = "factor"), 
    Location.2 = structure(c(2L, 2L, 1L, 2L, 2L, 2L, 2L, 1L, 
    1L), .Label = c("", "Location 2"), class = "factor"), Postcode.2 = structure(c(2L, 
    2L, 1L, 2L, 2L, 2L, 2L, 1L, 1L), .Label = c("", "Postcode 2"
    ), class = "factor"), Section = structure(c(2L, 2L, 3L, 1L, 
    4L, 4L, 2L, 1L, 4L), .Label = c("East", "North", "South", 
    "West"), class = "factor"), Weather.Category = structure(c(1L, 
    2L, 4L, 2L, 2L, 2L, 4L, 1L, 3L), .Label = c("Animals", "Food", 
    "Humans", "Weather"), class = "factor"), Minutes = c(13L, 
    55L, 5L, 5L, 5L, 522L, 1L, 11L, 22L), Cost = c(150L, 150L, 
    150L, 20L, 23L, 32L, 21L, 11L, 23L), Location.1.Lat = c(53.0506727, 
    53.8721035, 51.0233529, 53.8721035, 53.6988355, 53.4768766, 
    52.6874562, 51.6638245, 51.4301359), Location.1.Lon = c(-2.9991256, 
    -2.4004125, -3.0988341, -2.4004125, -1.3031529, -2.2298073, 
    -1.8023421, -0.3964916, 0.0213837), Location.2.Lat = c(52.7116187, 
    53.746791, NA, 53.746791, 53.6787167, 53.4527824, 52.5264907, 
    NA, NA), Location.2.Lon = c(-2.7493169, -2.4777984, NA, -2.4777984, 
    -1.489026, -2.1247029, -1.4645023, NA, NA)), class = "data.frame", row.names = c(NA, -9L))

#gpsColumns is used as the following line of code is used for several data frames.
gpsColumns <- c("HIGH_PRCN_LAT", "HIGH_PRCN_LON")
incidents [ , gpsColumns] <- NA

#create separate variable(?) containing a list of which rows are complete
ind <- complete.cases(incidents [,17])

#populate rows with a two Lat/Lons with great circle middle of both values
incidents [ind, c("HIGH_PRCN_LON_2","HIGH_PRCN_LAT_2")] <- 
  with(incidents [ind,,drop=FALSE],
       do.call(rbind, geosphere::midPoint(cbind.data.frame(Location.1.Lon, Location.1.Lat), cbind.data.frame(Location.2.Lon, Location.2.Lat))))

#populate rows with one Lat/Lon with those values
incidents[!ind, c("HIGH_PRCN_LAT","HIGH_PRCN_LON")] <- incidents[!ind, c("Location.1.Lat","Location.1.Lon")]

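I am using the geosphere::midPoint function, following the suggestion here: http://r.789695.n4.nabble.com/Midpoint-between-coordinates-td2299999.html

Unfortunately, this way of populating the columns does not seem to work when there are multiple cases.

The error currently thrown is: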

Error in `$<-.data.frame`(`*tmp*`, F_Lat, value = integer(0)) : 
  replacement has 0 rows, data has 178012

Edit: also posted to reddit: https://www.reddit.com/r/Rlanguage/comments/bdvavx/conditional_updating_column_in_dataframe/

Edit: added clarification on the parts of the code I don't understand.

#replaces the F_Lat2/F_Lon2 columns in rows with a both sets of input coordinates 
dataframe[ind, c("F_Lat2","F_Lon2")] <-
#I am unclear on what this means, specifically what the "with" function does and what "drop=FALSE" does and also why they were used in this case.
  with(dataframe[ind,,drop=FALSE],
#I am unclear on what do.call and rbind are doing here, but the second half (geosphere onwards) is binding the Lats and Lons to make coordinates as inputs for the gcIntermediate function.
       do.call(rbind, geosphere::gcIntermediate(cbind.data.frame(Lat1, Lon1),
                                                cbind.data.frame(Lat2, Lon2), n = 1)))

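Though your code doesn't work for me as written, and I cannot calculate the same precise values you expect, I suspect the error you're seeing can be fixed with these steps. (The data is at the bottom of this answer.)

  1. Pre-populate the empty columns.
  2. Pre-calculate the complete.cases step; it will save time.
  3. Use cbind.data.frame for the inputs inside gcIntermediate.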
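
As a minimal sketch of steps 1 and 2, the toy data frame below (a hypothetical `toy`, not the question's data) pre-fills the target column with `NA_real_` and computes the row index once. It also reproduces the same kind of error message, which appears when a zero-length vector is assigned to a column with `$<-`; that this is the exact cause in the question is an assumption on my part.

```r
# Hypothetical toy data; column names are illustrative only.
toy <- data.frame(Lat1 = c(10, 20, 30), Lat2 = c(12, NA, 34))

# Step 1: pre-populate the target column with a typed NA.
toy$F_Lat <- NA_real_

# Step 2: compute the "both values present" index once and reuse it.
ind <- complete.cases(toy[, c("Lat1", "Lat2")])

# Placeholder arithmetic mean, used only to show the indexing pattern.
toy$F_Lat[ind]  <- (toy$Lat1[ind] + toy$Lat2[ind]) / 2
toy$F_Lat[!ind] <- toy$Lat1[!ind]

# Assigning a zero-length vector with `$<-` fails with a
# "replacement has 0 rows" error; capture the message instead of stopping.
msg <- tryCatch({ toy$bad <- integer(0); "no error" },
                error = function(e) conditionMessage(e))
print(toy)
print(msg)
```

Because the target column already exists and the index is a logical vector of full length, each assignment writes exactly as many values as there are selected rows.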

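I'm inferring from

gcIntermediate([dataframe...
               ^
               this is an error in R

that you are binding those columns together, so I'll use cbind.data.frame. (Using cbind by itself produced some ignorable warnings from geosphere, so you could use that instead, perhaps wrapped in suppressWarnings, but that function is a bit heavy-handed in that it will mask other warnings as well.)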
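
To illustrate the trade-off, here is a small base-R sketch. It shows that `cbind` yields a matrix while `cbind.data.frame` yields a data frame, and it defines a hypothetical helper, `muffle_matching`, that silences only warnings whose message matches a pattern, as a narrower alternative to `suppressWarnings`. The helper name, the `noisy` stand-in, and the warning text are my own assumptions and are not part of geosphere.

```r
lon <- c(-2.9991256, -2.4004125)
lat <- c(53.0506727, 53.8721035)

m <- cbind(lon, lat)             # numeric matrix with column names
d <- cbind.data.frame(lon, lat)  # data.frame with the same columns

# Hypothetical helper: muffle only warnings whose message matches `pattern`;
# any other warning is passed on unchanged.
muffle_matching <- function(expr, pattern) {
  withCallingHandlers(
    expr,
    warning = function(w) {
      if (grepl(pattern, conditionMessage(w))) invokeRestart("muffleWarning")
    }
  )
}

# Stand-in for a noisy call; the warning text is made up for illustration.
noisy <- function() { warning("harmless note"); 42 }

quiet_value <- muffle_matching(noisy(), "harmless")
```

With this pattern a call whose warning matches the pattern runs quietly, while unrelated warnings still surface.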

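Also, since it looks like you want one intermediate value for each pair of coordinates, I added the n=1 argument: gcIntermediate(..., n=1).

do.call(rbind, ...) is used because gcIntermediate returns a list, so its elements need to be bound together into a single matrix.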
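
Since the question's edit asks what `do.call` and `rbind` are doing, here is a small sketch of the idea using a hand-made list. The list `pieces` only stands in for a list of one-row results; its numbers are arbitrary placeholders, not real gcIntermediate output.

```r
# Stand-in for a list of one-row results, one per coordinate pair.
pieces <- list(
  matrix(c(-2.87, 52.88), nrow = 1, dimnames = list(NULL, c("lon", "lat"))),
  matrix(c(-2.44, 53.81), nrow = 1, dimnames = list(NULL, c("lon", "lat")))
)

# do.call(rbind, pieces) calls rbind with each list element as an argument,
# i.e. it is equivalent to rbind(pieces[[1]], pieces[[2]]).
stacked <- do.call(rbind, pieces)
print(stacked)
```

The result is a single matrix with one row per list element, which can then be assigned into two data frame columns in one step.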

dataframe$F_Lon2 <- dataframe$F_Lat2 <- NA_real_
ind <- complete.cases(dataframe[,4])

dataframe[ind, c("F_Lat2","F_Lon2")] <- 
  with(dataframe[ind,,drop=FALSE],
       do.call(rbind, geosphere::gcIntermediate(cbind.data.frame(Lat1, Lon1),
                                                cbind.data.frame(Lat2, Lon2), n = 1)))
dataframe[!ind, c("F_Lat2","F_Lon2")] <- dataframe[!ind, c("Lat1","Lon1")]
dataframe
#   ID     Lat1      Lon1     Lat2      Lon2    F_Lat     F_Lon   F_Lat2    F_Lon2
# 1  1 19.05067 -3.999126 92.71332 -6.759169 55.88200 -5.379147 55.78466 -6.709509
# 2  2 58.87210 -1.400413 54.74679 -4.479840 56.80945 -2.940126 56.81230 -2.942029
# 3  3 33.02335 -5.098834       NA        NA 33.02335 -5.098834 33.02335 -5.098834
# 4  4 54.87210 -4.400412       NA        NA 54.87210 -4.400412 54.87210 -4.400412
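
The question's edit also asks about `with` and `drop=FALSE`. A minimal base-R sketch, using a hypothetical two-row data frame `df`, may help:

```r
df <- data.frame(Lat1 = c(10, 20), Lon1 = c(1, 2))

# with() evaluates an expression using the data frame's columns as variables,
# so this is the same as df$Lat1 + df$Lon1.
sum_a <- with(df, Lat1 + Lon1)

# drop = FALSE keeps the result a data frame even when the selection
# could be simplified to a plain vector.
one_col_vec <- df[, "Lat1"]                 # plain numeric vector
one_col_df  <- df[, "Lat1", drop = FALSE]   # still a data.frame
```

In the code above, `with` saves repeating `dataframe[ind, ]$` in front of every column name. The `drop=FALSE` there is a defensive habit: subsetting rows of a multi-column data frame returns a data frame either way, and it only changes the result when the data frame has a single column.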

Update, using the new incidents data and switching to geosphere::midPoint

Try this:

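incidents$F_Lon2 <- incidents$F_Lat2 <- NA_real_
# Rows where the second location has coordinates (selected by name, not position).
ind <- complete.cases(incidents[, c("Location.2.Lat", "Location.2.Lon")])

# geosphere expects longitude first, then latitude; midPoint returns (lon, lat).
incidents[ind, c("F_Lon2","F_Lat2")] <- 
  with(incidents[ind,,drop=FALSE],
       geosphere::midPoint(cbind.data.frame(Location.1.Lon,Location.1.Lat),
                           cbind.data.frame(Location.2.Lon,Location.2.Lat)))
# Rows with only one location keep that location's coordinates.
incidents[!ind, c("F_Lat2","F_Lon2")] <- incidents[!ind, c("Location.1.Lat","Location.1.Lon")]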

One (big) difference is that geosphere::gcIntermediate(..., n=1) returns a list of results, whereas geosphere::midPoint(...) (which has no n= argument) returns just a matrix, so no rbind-ing is needed.
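
For readers without geosphere installed, the whole conditional fill can be sketched in base R. The function `sphere_midpoint` below is a hypothetical helper that uses the standard spherical midpoint formula; it assumes a perfect sphere, so its results may differ slightly from geosphere's, and it is not a replacement for `geosphere::midPoint`. The data frame `pts` and its column names are made up for illustration.

```r
# Spherical great-circle midpoint; inputs and outputs are in degrees.
# Assumption: a perfect sphere, so results may differ slightly from geosphere.
sphere_midpoint <- function(lon1, lat1, lon2, lat2) {
  rad <- pi / 180
  l1 <- lon1 * rad; p1 <- lat1 * rad
  l2 <- lon2 * rad; p2 <- lat2 * rad
  bx <- cos(p2) * cos(l2 - l1)
  by <- cos(p2) * sin(l2 - l1)
  lat_m <- atan2(sin(p1) + sin(p2), sqrt((cos(p1) + bx)^2 + by^2))
  lon_m <- l1 + atan2(by, cos(p1) + bx)
  cbind(lon = lon_m / rad, lat = lat_m / rad)  # matrix, longitude first
}

# Hypothetical data: row 1 has two points, row 2 has only one.
pts <- data.frame(
  Lon1 = c(0, 10), Lat1 = c(0, 50),
  Lon2 = c(90, NA), Lat2 = c(0, NA)
)

pts$F_Lon <- pts$F_Lat <- NA_real_
ind <- complete.cases(pts[, c("Lon2", "Lat2")])

# Case 1: both points present, use the midpoint.
pts[ind, c("F_Lon", "F_Lat")] <-
  with(pts[ind, , drop = FALSE], sphere_midpoint(Lon1, Lat1, Lon2, Lat2))

# Case 2: only the first point present, copy it across.
pts[!ind, c("F_Lon", "F_Lat")] <- pts[!ind, c("Lon1", "Lat1")]
print(pts)
```

Like midPoint, the helper returns a matrix with longitude in the first column, so the target columns are listed in the same order on the left-hand side.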


Data:

dataframe <- read.table(header=TRUE, stringsAsFactors=FALSE, text="
ID Lat1       Lon1       Lat2      Lon2      F_Lat       F_Lon
1  19.0506727 -3.9991256 92.713318 -6.759169 55.88199535 -5.3791473
2  58.8721035 -1.4004125 54.746791 -4.47984  56.80944725 -2.94012625
3  33.0233529 -5.0988341 NA        NA        33.0233529  -5.0988341
4  54.8721035 -4.4004125 NA        NA        54.8721035  -4.4004125")