R - Extract coordinates from a string in a dataframe
I have data like this in an R dataframe, all stored in a column named SHAPE (the following is just an excerpt):
- "POINT (16.361866982751053 48.177421074512125)"
- "POINT (16.30410258091979 48.16069903617549)"
- "POINT (16.226971074542572 48.20539106235006)"
- "POINT (16.36781410799229 48.25479849185693)"
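I want to extract the coordinates so that I can place them, in numeric format, in a column "X" and a column "Y" of my dataframe. The challenge is that the numbers are not always the same length.
The result should look like this
Column X: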
- 16.361866982751053
- 16.30410258091979
- 16.226971074542572
- 16.36781410799229
Column Y:
- 48.177421074512125
- 48.16069903617549
- 48.20539106235006
- 48.25479849185693
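Using sub():
point <- "POINT (16.361866982751053 48.177421074512125)"
# Backslashes must be doubled inside R string literals.
# as.numeric() turns the captured text into a number, as the question asks.
x <- as.numeric(sub("POINT \\((\\d+\\.\\d+) \\d+\\.\\d+\\)", "\\1", point, perl = TRUE))
y <- as.numeric(sub("POINT \\(\\d+\\.\\d+ (\\d+\\.\\d+)\\)", "\\1", point, perl = TRUE))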
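Because sub() is vectorised, the same idea can be applied to the whole SHAPE column in one call. The following is only a minimal sketch: the data frame name df and the single combined pattern are assumptions made for illustration, and it presumes every value follows the "POINT (x y)" layout shown in the question.

```r
# Hypothetical two-row data frame; SHAPE is the column name from the question
df <- data.frame(
  SHAPE = c("POINT (16.361866982751053 48.177421074512125)",
            "POINT (16.30410258091979 48.16069903617549)"),
  stringsAsFactors = FALSE
)

# [^ ]+ matches any run of non-space characters, so the number of digits does not matter
pattern <- "^POINT \\(([^ ]+) ([^ )]+)\\)$"

# \\1 is the first captured group (X), \\2 the second (Y)
df$X <- as.numeric(sub(pattern, "\\1", df$SHAPE))
df$Y <- as.numeric(sub(pattern, "\\2", df$SHAPE))

str(df)
```

Printing a data frame shows only 7 significant digits by default; print(df$X, digits = 17) displays more of the stored value.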
Just to offer another solution, this time using strsplit() and lapply():
df <- data.frame(SHAPE = c("POINT (16.361866982751053 48.177421074512125)",
                           "POINT (16.30410258091979 48.16069903617549)",
                           "POINT (16.226971074542572 48.20539106235006)",
                           "POINT (16.36781410799229 48.25479849185693)"),
                 stringsAsFactors = FALSE)
# Split each string on the parentheses, then split the text between them on the space
df[c("x", "y")] <- do.call(rbind, lapply(strsplit(df$SHAPE, "[()]"), function(col) {
  unlist(strsplit(col[2], " "))
}))
# x and y are character columns at this point; apply as.numeric() to get numbers
df
This yields
                                          SHAPE                  x                  y
1 POINT (16.361866982751053 48.177421074512125) 16.361866982751053 48.177421074512125
2   POINT (16.30410258091979 48.16069903617549)  16.30410258091979  48.16069903617549
3  POINT (16.226971074542572 48.20539106235006) 16.226971074542572  48.20539106235006
4   POINT (16.36781410799229 48.25479849185693)  16.36781410799229  48.25479849185693
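The x and y columns produced by the strsplit() approach hold text, whereas the question asks for numeric values. A short sketch of the conversion step follows. It rebuilds a two-row stand-in for the result so that it runs on its own, which is an assumption for illustration rather than part of the original answer.

```r
# Stand-in for the strsplit() result: coordinates stored as character strings
df <- data.frame(x = c("16.361866982751053", "16.30410258091979"),
                 y = c("48.177421074512125", "48.16069903617549"),
                 stringsAsFactors = FALSE)

# Convert both columns in place; strings that are not numbers would become NA with a warning
df[c("x", "y")] <- lapply(df[c("x", "y")], as.numeric)

sapply(df, class)
```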