Convert string of lat lon to geojson polygon
I have a data frame:
col_a col_b col_c lat lon polyline
0 2.2 3/27/2017 17:45 -34.92967678 -62.34831333 [{lat":-34.92967677667683 lng:-62.34831333160395} {"lat":-34.93002861969753 lng:-62.360866069793644} {"lat":-34.93526211379422 lng:-62.36063016609785} {"lat":-34.93571078689853 lng:-62.35996507775451} {"lat":-34.935798629937075 lng:-62.34816312789911} {"lat":-34.9333358703344 lng:-62.34824895858759} {"lat":-34.9320340961022 lng:-62.348334789276066}]"
0 3.3 3/27/2017 17:45 -34.92967678 -62.34831333 [{lat":-34.92967677667683 lng:-62.34831333160395} {"lat":-34.93002861969753 lng:-62.360866069793644} {"lat":-34.93526211379422 lng:-62.36063016609785} {"lat":-34.93571078689853 lng:-62.35996507775451} {"lat":-34.935798629937075 lng:-62.34816312789911} {"lat":-34.9333358703344 lng:-62.34824895858759} {"lat":-34.9320340961022 lng:-62.348334789276066}]"
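I want to convert it to a geopandas data frame (with the geometry taken from the polyline), but the polyline column is not in a standard format. How can I solve this?
Here is something I put together for you.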
import json
lat_lon_str = '''[{"lat": -32.436756736154024, "lng": -62.17932943721189},
{"lat": -32.445847463649905, "lng": -62.18160395045652},
{"lat": -32.44686151186612, "lng": -62.176711601213356},
{"lat": -32.44721472434227, "lng": -62.17625005841933},
{"lat": -32.44387381345414, "lng": -62.17003797011375},
{"lat": -32.44158302782885, "lng": -62.16614345534663},
{"lat": -32.43979915340108, "lng": -62.16164831538572}]'''
lat_lon_json = json.loads(lat_lon_str)
coords = ["POINT({} {})".format(round(line['lat'], 2), round(line['lng'], 2)) for line in lat_lon_json]
print(coords)
Result:
Please test it and let me know whether the result is what you want.
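Since the title asks for a GeoJSON polygon, the parsed list can also be turned into a GeoJSON-style Polygon dictionary with the standard library alone. This is only a minimal sketch and not part of the answer above: the helper name `to_geojson_polygon` is hypothetical, the sample is shortened to three points, and it assumes GeoJSON's [longitude, latitude] position order with a closed ring (first position repeated at the end).

```python
import json

def to_geojson_polygon(points):
    """Build a GeoJSON-style Polygon dict from a list of {"lat": ..., "lng": ...} dicts.

    Hypothetical helper, for illustration only.
    """
    # GeoJSON positions are ordered [longitude, latitude]
    ring = [[p["lng"], p["lat"]] for p in points]
    # A linear ring must be closed: repeat the first position if needed
    if ring and ring[0] != ring[-1]:
        ring.append(list(ring[0]))
    return {"type": "Polygon", "coordinates": [ring]}

lat_lon_str = '''[{"lat": -32.436756736154024, "lng": -62.17932943721189},
 {"lat": -32.445847463649905, "lng": -62.18160395045652},
 {"lat": -32.44686151186612, "lng": -62.176711601213356}]'''

polygon = to_geojson_polygon(json.loads(lat_lon_str))
print(json.dumps(polygon, indent=2))
```

Note that this only works on valid JSON such as the string above; the polyline column in the question has missing quotes and commas, so it would need to be cleaned first.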
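IIUC, if the original dataframe is a pandas DataFrame, you can try using Series.str.translate to remove all double quotes and Series.str.findall to retrieve all lat-long pairs into a list of tuples, then use those coordinates to create the Polygon (note that the lat/long values are converted from str to float with float()):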
import pandas as pd
import geopandas as gpd
from shapely.geometry import Polygon
df['coords'] = df.polyline \
.str.translate(str.maketrans({'"':''})) \
.str.findall(r'\blat:(-?\d+\.\d+)\s+lng:(-?\d+\.\d+)')
geometry = [ Polygon([(float(x), float(y)) for x,y in e]) for e in df['coords'] ]
gdf = gpd.GeoDataFrame(df.drop(['coords','polyline'], axis=1), geometry=geometry)
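Edit: if the methods under pandas.Series.str are not available, you can do the same thing with the Python re module, for example (assuming the original dataframe is a GeoDataFrame named gdf):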
import re
ptn = re.compile(r'\blat:(-?\d+\.\d+)\s+lng:(-?\d+\.\d+)')
# e is one polyline string; strip the double quotes, then extract the (lat, lng) pairs
geometry = [ Polygon([tuple(map(float, x)) for x in re.findall(ptn, e.replace('"',''))]) for e in gdf["polyline"] ]
gdf_new = gpd.GeoDataFrame(gdf, geometry=geometry)
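To see what the regular expression actually extracts, it can be tried on a single polyline string without pandas. The sketch below is only an illustration: `parse_polyline` is a hypothetical helper name, and the sample string is shortened to the first two points of the question's data.

```python
import re

# Same pattern as above: a "lat:" value, whitespace, then a "lng:" value
PTN = re.compile(r'\blat:(-?\d+\.\d+)\s+lng:(-?\d+\.\d+)')

def parse_polyline(text):
    """Return the (lat, lng) float tuples found in a malformed polyline string."""
    cleaned = text.replace('"', '')  # drop the stray double quotes first
    return [(float(lat), float(lng)) for lat, lng in PTN.findall(cleaned)]

sample = '[{lat":-34.92967677667683 lng:-62.34831333160395} {"lat":-34.93002861969753 lng:-62.360866069793644}]"'
pairs = parse_polyline(sample)
print(pairs)
```

Because the pattern requires a decimal point in both numbers, values written as integers or in scientific notation would be skipped silently, so it may be worth comparing the number of pairs found against what you expect.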
If the data is already in a GeoDataFrame, the code suggested by @jxc also works, because GeoPandas supports string operations.
Here is a piece of code to re-create the GeoDataFrame:
from io import StringIO #Python 3
import pandas as pd
import geopandas as gpd
df_string = """0;2.2;3/27/2017 17:45;-34.92967678;-62.34831333;[{lat":-34.92967677667683 lng:-62.34831333160395} {"lat":-34.93002861969753 lng:-62.360866069793644} {"lat":-34.93526211379422 lng:-62.36063016609785} {"lat":-34.93571078689853 lng:-62.35996507775451} {"lat":-34.935798629937075 lng:-62.34816312789911} {"lat":-34.9333358703344 lng:-62.34824895858759} {"lat":-34.9320340961022 lng:-62.348334789276066}]"
0;3.3;3/27/2017 17:45;-34.92967678;-62.34831333;[{lat":-34.92967677667683 lng:-62.34831333160395} {"lat":-34.93002861969753 lng:-62.360866069793644} {"lat":-34.93526211379422 lng:-62.36063016609785} {"lat":-34.93571078689853 lng:-62.35996507775451} {"lat":-34.935798629937075 lng:-62.34816312789911} {"lat":-34.9333358703344 lng:-62.34824895858759} {"lat":-34.9320340961022 lng:-62.348334789276066}]"
"""
df_io = StringIO(df_string)
df = pd.read_csv(df_io, sep=";", names=["col_a","col_b","col_c","lat","lon","polyline"])
gdf = gpd.GeoDataFrame(df)
The result is:
gdf
col_a col_b col_c lat lon polyline
0 0 2.2 3/27/2017 17:45 -34.92967678 -62.34831333 "[{lat"":-34.92967677667683 lng:-62.34831333160395} {""lat"":-34.93002861969753 lng:-62.360866069793644} {""lat"":-34.93526211379422 lng:-62.36063016609785} {""lat"":-34.93571078689853 lng:-62.35996507775451} {""lat"":-34.935798629937075 lng:-62.34816312789911} {""lat"":-34.9333358703344 lng:-62.34824895858759} {""lat"":-34.9320340961022 lng:-62.348334789276066}]"" "
1 0 3.3 3/27/2017 17:45 -34.92967678 -62.34831333 "[{lat"":-34.92967677667683 lng:-62.34831333160395} {""lat"":-34.93002861969753 lng:-62.360866069793644} {""lat"":-34.93526211379422 lng:-62.36063016609785} {""lat"":-34.93571078689853 lng:-62.35996507775451} {""lat"":-34.935798629937075 lng:-62.34816312789911} {""lat"":-34.9333358703344 lng:-62.34824895858759} {""lat"":-34.9320340961022 lng:-62.348334789276066}]"""
Then, if the geometry is a line, as the polyline column name suggests, you should use the Shapely LineString class instead of Polygon:
from shapely.geometry import LineString
coords = gdf.polyline \
.str.translate(str.maketrans({'"':''})) \
.str.findall(r'\blat:(-?\d+\.\d+)\s+lng:(-?\d+\.\d+)')
gdf.geometry = [ LineString([(float(x), float(y)) for x,y in e]) for e in coords ]
Since the two geometries are identical, we can plot just the first one:
gdf[0:1].plot()
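One thing that may be worth checking before overlaying the result on a map: Shapely interprets each coordinate tuple as (x, y), and in geographic data x is usually the longitude. The code above passes (lat, lng), so the shape may come out transposed. A minimal sketch of the swap in plain Python follows; `to_xy` is a hypothetical helper name, and the resulting list could be passed to LineString in place of the original tuples.

```python
def to_xy(lat_lng_pairs):
    """Reorder (lat, lng) tuples into (x, y) = (lng, lat) tuples."""
    return [(lng, lat) for lat, lng in lat_lng_pairs]

# Two points taken from the question's data, in (lat, lng) order
pairs = [(-34.92967677667683, -62.34831333160395),
         (-34.93002861969753, -62.360866069793644)]

xy = to_xy(pairs)
print(xy)
```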