Convert string of lat lon to geojson polygon

I have this data frame:

col_a   col_b   col_c   lat lon polyline                                                            
0   2.2 3/27/2017 17:45 -34.92967678    -62.34831333    [{lat":-34.92967677667683   lng:-62.34831333160395} {"lat":-34.93002861969753   lng:-62.360866069793644}    {"lat":-34.93526211379422   lng:-62.36063016609785} {"lat":-34.93571078689853   lng:-62.35996507775451} {"lat":-34.935798629937075  lng:-62.34816312789911} {"lat":-34.9333358703344    lng:-62.34824895858759} {"lat":-34.9320340961022    lng:-62.348334789276066}]"      
0   3.3 3/27/2017 17:45 -34.92967678    -62.34831333    [{lat":-34.92967677667683   lng:-62.34831333160395} {"lat":-34.93002861969753   lng:-62.360866069793644}    {"lat":-34.93526211379422   lng:-62.36063016609785} {"lat":-34.93571078689853   lng:-62.35996507775451} {"lat":-34.935798629937075  lng:-62.34816312789911} {"lat":-34.9333358703344    lng:-62.34824895858759} {"lat":-34.9320340961022    lng:-62.348334789276066}]"      

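I want to convert it into a geopandas data frame (with the geometry taken from the polyline), but the polyline column is not in a standard format. How can I solve this?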

Here is something I put together for you.

import json
lat_lon_str = '''[{"lat": -32.436756736154024, "lng": -62.17932943721189},
               {"lat": -32.445847463649905, "lng": -62.18160395045652},
               {"lat": -32.44686151186612, "lng": -62.176711601213356},
               {"lat": -32.44721472434227, "lng": -62.17625005841933},
               {"lat": -32.44387381345414, "lng": -62.17003797011375},
               {"lat": -32.44158302782885, "lng": -62.16614345534663},
               {"lat": -32.43979915340108, "lng": -62.16164831538572}]'''

lat_lon_json = json.loads(lat_lon_str)
coords = ["POINT({} {})".format(round(line['lat'], 2), round(line['lng'], 2)) for line in lat_lon_json]
print(coords)

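Result:

['POINT(-32.44 -62.18)', 'POINT(-32.45 -62.18)', 'POINT(-32.45 -62.18)', 'POINT(-32.45 -62.18)', 'POINT(-32.44 -62.17)', 'POINT(-32.44 -62.17)', 'POINT(-32.44 -62.16)']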

After testing it, please let me know whether the result is what you want.
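For reference, when the string is valid JSON, as in the snippet above, it can also be turned directly into a GeoJSON Polygon geometry with only the standard library. This is a minimal sketch rather than part of the answer above: the helper name `to_geojson_polygon` and the shortened sample values are made up for illustration, and it assumes the points are listed in ring order. GeoJSON positions put longitude first, and a polygon ring has to end on its first position, so the sketch swaps the order and closes the ring.

```python
import json


def to_geojson_polygon(lat_lon_str):
    """Build a GeoJSON Polygon dict from a JSON list of {"lat", "lng"} objects.

    Hypothetical helper for illustration; assumes the points are in ring order.
    """
    points = json.loads(lat_lon_str)
    # GeoJSON positions are [longitude, latitude]
    ring = [[p["lng"], p["lat"]] for p in points]
    # A linear ring must be closed: repeat the first position at the end
    if ring and ring[0] != ring[-1]:
        ring.append(list(ring[0]))
    return {"type": "Polygon", "coordinates": [ring]}


# Illustrative sample, not the data from the question
sample = ('[{"lat": -32.43, "lng": -62.17},'
          ' {"lat": -32.44, "lng": -62.18},'
          ' {"lat": -32.45, "lng": -62.16}]')
polygon = to_geojson_polygon(sample)
print(json.dumps(polygon))
```

The resulting dict can be written out with `json.dumps` or used as the `geometry` member of a GeoJSON Feature.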

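IIUC, if the original dataframe is a pandas dataframe, you can try using Series.str.translate to remove all double quotes and Series.str.findall to retrieve all lat/lng pairs into a list of tuples, then use those coordinates to create the Polygons (note that float() converts each lat/lng value from str to float):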

import pandas as pd
import geopandas as gpd
from shapely.geometry import Polygon

df['coords'] = df.polyline \
    .str.translate(str.maketrans({'"':''})) \
    .str.findall(r'\blat:(-?\d+\.\d+)\s+lng:(-?\d+\.\d+)')

geometry = [ Polygon([(float(x), float(y)) for x,y in e]) for e in df['coords'] ]

gdf = gpd.GeoDataFrame(df.drop(['coords','polyline'], axis=1), geometry=geometry)
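To see what the two string steps do to a single value, here is a sketch that uses only Python's re module on a shortened copy of the polyline text from the question. The function name `parse_polyline` is hypothetical. It returns (lat, lng) tuples in the order the regex captures them. Shapely treats coordinates as (x, y), so swapping each pair to (lng, lat) before building the geometry may be what you want if longitude should be the x axis.

```python
import re

# Same pattern as in the pandas version above
PAIR = re.compile(r'\blat:(-?\d+\.\d+)\s+lng:(-?\d+\.\d+)')


def parse_polyline(text):
    """Return a list of (lat, lng) float tuples from the malformed string.

    Hypothetical helper for illustration only.
    """
    # Drop every double quote so 'lat":' and '"lat":' both become 'lat:'
    cleaned = text.replace('"', '')
    return [(float(lat), float(lng)) for lat, lng in PAIR.findall(cleaned)]


# First two points of the polyline value shown in the question
raw = ('[{lat":-34.92967677667683   lng:-62.34831333160395} '
       '{"lat":-34.93002861969753   lng:-62.360866069793644}]"')

pairs = parse_polyline(raw)
# Optional: reorder to (lng, lat), i.e. (x, y)
lng_lat = [(lng, lat) for lat, lng in pairs]
print(pairs)
print(lng_lat)
```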

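Edit: if the methods under pandas.Series.str are not available, you can do the same thing with Python's re module, for example (assuming the original dataframe is a GeoDataFrame named gdf):

import re
ptn = re.compile(r'\blat:(-?\d+\.\d+)\s+lng:(-?\d+\.\d+)')
# strip the double quotes from each polyline string, then convert every (lat, lng) match to floats
geometry = [ Polygon([tuple(map(float, x)) for x in ptn.findall(e.replace('"', ''))]) for e in gdf["polyline"] ]
gdf_new = gpd.GeoDataFrame(gdf, geometry=geometry)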

If the data is already in a GeoDataFrame, the code suggested by @jxc works as well, because GeoPandas supports string operations.

Here is a piece of code to re-create the GeoDataFrame:

from io import StringIO #Python 3 
import pandas as pd 
import geopandas as gpd 

# one record per line, fields separated by ';'
df_string = """0;2.2;3/27/2017 17:45;-34.92967678;-62.34831333;[{lat":-34.92967677667683   lng:-62.34831333160395} {"lat":-34.93002861969753   lng:-62.360866069793644}    {"lat":-34.93526211379422   lng:-62.36063016609785} {"lat":-34.93571078689853   lng:-62.35996507775451} {"lat":-34.935798629937075  lng:-62.34816312789911} {"lat":-34.9333358703344    lng:-62.34824895858759} {"lat":-34.9320340961022    lng:-62.348334789276066}]"
0;3.3;3/27/2017 17:45;-34.92967678;-62.34831333;[{lat":-34.92967677667683   lng:-62.34831333160395} {"lat":-34.93002861969753   lng:-62.360866069793644}    {"lat":-34.93526211379422   lng:-62.36063016609785} {"lat":-34.93571078689853   lng:-62.35996507775451} {"lat":-34.935798629937075  lng:-62.34816312789911} {"lat":-34.9333358703344    lng:-62.34824895858759} {"lat":-34.9320340961022    lng:-62.348334789276066}]" """

df_io = StringIO(df_string)
df = pd.read_csv(df_io, sep=";", names=["col_a","col_b","col_c","lat","lon","polyline"])
gdf = gpd.GeoDataFrame(df)

The result is:

gdf
    col_a   col_b   col_c   lat lon polyline
0   0   2.2 3/27/2017 17:45 -34.92967678    -62.34831333    "[{lat"":-34.92967677667683   lng:-62.34831333160395} {""lat"":-34.93002861969753   lng:-62.360866069793644}    {""lat"":-34.93526211379422   lng:-62.36063016609785} {""lat"":-34.93571078689853   lng:-62.35996507775451} {""lat"":-34.935798629937075  lng:-62.34816312789911} {""lat"":-34.9333358703344    lng:-62.34824895858759} {""lat"":-34.9320340961022    lng:-62.348334789276066}]""      "
1   0   3.3 3/27/2017 17:45 -34.92967678    -62.34831333    "[{lat"":-34.92967677667683   lng:-62.34831333160395} {""lat"":-34.93002861969753   lng:-62.360866069793644}    {""lat"":-34.93526211379422   lng:-62.36063016609785} {""lat"":-34.93571078689853   lng:-62.35996507775451} {""lat"":-34.935798629937075  lng:-62.34816312789911} {""lat"":-34.9333358703344    lng:-62.34824895858759} {""lat"":-34.9320340961022    lng:-62.348334789276066}]"""

Then, if the geometry is a line, as the polyline column name suggests, you should use Shapely's LineString rather than Polygon:

from shapely.geometry import LineString
coords = gdf.polyline \
    .str.translate(str.maketrans({'"':''})) \
    .str.findall(r'\blat:(-?\d+\.\d+)\s+lng:(-?\d+\.\d+)')

gdf.geometry = [ LineString([(float(x), float(y)) for x,y in e]) for e in coords ]

Since the two geometries are identical, we can plot just the first one:

gdf[0:1].plot()
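If the end goal is GeoJSON text rather than a plot, the same coordinates can be wrapped in a LineString feature with plain Python. This is only a sketch under stated assumptions: the function name `linestring_feature` is hypothetical, the coordinates are the first three points of the polyline from the question, and the choice of which columns go into `properties` is arbitrary. As before, GeoJSON expects longitude first, so each (lat, lng) pair is swapped.

```python
import json


def linestring_feature(lat_lng_pairs, properties):
    """Wrap (lat, lng) pairs in a GeoJSON Feature with a LineString geometry.

    Hypothetical helper for illustration only.
    """
    return {
        "type": "Feature",
        "geometry": {
            "type": "LineString",
            # GeoJSON positions are [longitude, latitude]
            "coordinates": [[lng, lat] for lat, lng in lat_lng_pairs],
        },
        "properties": properties,
    }


# First three points of the polyline from the question
pairs = [
    (-34.92967677667683, -62.34831333160395),
    (-34.93002861969753, -62.360866069793644),
    (-34.93526211379422, -62.36063016609785),
]
feature = linestring_feature(pairs, {"col_b": 2.2, "col_c": "3/27/2017 17:45"})
collection = {"type": "FeatureCollection", "features": [feature]}
print(json.dumps(collection, indent=2))
```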