Appending GeoDataFrames does not return expected dataframe
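I am running into the following issue when trying to append dataframes that contain a geometry type. The pandas dataframe I am looking at looks like this: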
name x_zone y_zone
0 A1 65.422080 48.147850
1 A1 46.635708 51.165745
2 A1 46.597984 47.657444
3 A1 68.477700 44.073700
4 A3 46.635708 54.108190
5 A3 46.635708 51.844770
6 A3 63.309560 48.826878
7 A3 62.215572 54.108190
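As you can see, there are four rows per name, since they represent the corners of a polygon. I need this in the form of a polygon as defined in geopandas, i.e. I need a GeoDataFrame. To do so, I use the following code for just one of the names (only to check that it works):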
import geopandas as gpd
from shapely.geometry import Polygon

name = 'A1'
f = df[df['name'] == name]  # filter into a new variable so df stays intact
x = f['x_zone'].to_list()
y = f['y_zone'].to_list()
polygon_geom = Polygon(zip(x, y))
polygon = gpd.GeoDataFrame(index=[name], crs="EPSG:4326", geometry=[polygon_geom])
print(polygon)
which returns:
geometry
A1 POLYGON ((65.42208 48.14785, 46.63571 51.16575...
polygon.info()
<class 'geopandas.geodataframe.GeoDataFrame'>
Index: 1 entries, A1 to A1
Data columns (total 1 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 geometry 1 non-null geometry
dtypes: geometry(1)
memory usage: 16.0+ bytes
Great, that works. So for more names, I thought the following would work:
unique_place = list(df['name'].unique())
GE = []
for name in unique_place:
    f = df[df['name'] == name]
    x = f['x_zone'].to_list()
    y = f['y_zone'].to_list()
    polygon_geom = Polygon(zip(x, y))
    polygon = gpd.GeoDataFrame(index=[name], crs="EPSG:4326", geometry=[polygon_geom])
    print(polygon.info())
    GE.append(polygon)
But what it returns is a list, not a dataframe.
[ geometry
A1 POLYGON ((65.42208 48.14785, 46.63571 51.16575...,
geometry
A3 POLYGON ((46.63571 54.10819, 46.63571 51.84477...]
This is odd, because when the objects being appended are pandas dataframes, *.append(**) works just fine.
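What am I missing? Also, even in the first case I am left with only the geometry column, but that is not an issue, since I can write the file to shp and read it back to get the second column (the name).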
Any solution that gets me moving forward is appreciated!
I guess you need sample code that uses groupby on your data. Let me know if that is not the case.
from io import StringIO
import geopandas as gpd
import pandas as pd
from shapely.geometry import Polygon
import numpy as np
dats_str = """index id x_zone y_zone
0 A1 65.422080 48.147850
1 A1 46.635708 51.165745
2 A1 46.597984 47.657444
3 A1 68.477700 44.073700
4 A3 46.635708 54.108190
5 A3 46.635708 51.844770
6 A3 63.309560 48.826878
7 A3 62.215572 54.108190"""
# read the string, convert to dataframe
df1 = pd.read_csv(StringIO(dats_str), sep=r'\s+', index_col='index')
# Use groupby as an iterator to:
# - collect the items of interest
# - process some data: mean, create Polygon, maybe others
# - collect/append everything into lists
ids = []
counts = []
meanx = []
meany = []
list_x = []
list_y = []
polygon = []
for label, group in df1.groupby('id'):
    # label: 'A1', then 'A3'
    # group: the sub-dataframe for 'A1', then for 'A3'
    ids.append(label)
    counts.append(len(group))  # number of rows
    meanx.append(group.x_zone.mean())
    meany.append(group.y_zone.mean())
    # process x, y data of this group -> for polygon
    xs = group.x_zone.values
    ys = group.y_zone.values
    list_x.append(xs)
    list_y.append(ys)
    polygon.append(Polygon(zip(xs, ys)))  # make/collect polygon
# the items above are used to create a dataframe here
df_from_groupby = pd.DataFrame({'id': ids, 'counts': counts,
                                'meanx': meanx, 'meany': meany,
                                'list_x': list_x, 'list_y': list_y,
                                'polygon': polygon})
If you print the dataframe df_from_groupby, you will get:
id counts meanx meany \
0 A1 4 56.783368 47.761185
1 A3 4 54.699137 52.222007
list_x \
0 [65.42208, 46.635708, 46.597984, 68.4777]
1 [46.635708, 46.635708, 63.30956, 62.215572]
list_y \
0 [48.14785, 51.165745, 47.657444, 44.0737]
1 [54.10819, 51.84477, 48.826878, 54.10819]
polygon
0 POLYGON ((65.42207999999999 48.14785, 46.63570...
1 POLYGON ((46.635708 54.10819, 46.635708 51.844...
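To get from here to an actual GeoDataFrame, a minimal sketch follows. It is not part of the code above; the variable names (parts, gdf_concat, gdf_direct) are made up for illustration, and it assumes a reasonably recent geopandas and pandas. The point is that GE.append(...) in the question is Python's list.append, so the result stays a list; the list still has to be combined with pd.concat (DataFrame.append was removed in pandas 2.0). Alternatively, the polygons can be collected first and a single GeoDataFrame built at the end, which also keeps the name column.

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import Polygon

# Same sample data as in the question.
df = pd.DataFrame({
    "name": ["A1"] * 4 + ["A3"] * 4,
    "x_zone": [65.422080, 46.635708, 46.597984, 68.477700,
               46.635708, 46.635708, 63.309560, 62.215572],
    "y_zone": [48.147850, 51.165745, 47.657444, 44.073700,
               54.108190, 51.844770, 48.826878, 54.108190],
})

# Option 1: keep the loop, collect one-row GeoDataFrames in a list,
# then combine the list with pd.concat.
parts = []
for name, group in df.groupby("name"):
    poly = Polygon(list(zip(group["x_zone"], group["y_zone"])))
    parts.append(
        gpd.GeoDataFrame({"name": [name]}, geometry=[poly], crs="EPSG:4326")
    )
# Wrapping in GeoDataFrame is a precaution in case concat hands back
# a plain DataFrame on an older geopandas version.
gdf_concat = gpd.GeoDataFrame(pd.concat(parts, ignore_index=True))

# Option 2: build every polygon first, then create one GeoDataFrame.
polygons = {
    name: Polygon(list(zip(group["x_zone"], group["y_zone"])))
    for name, group in df.groupby("name")
}
gdf_direct = gpd.GeoDataFrame(
    {"name": list(polygons)},
    geometry=list(polygons.values()),
    crs="EPSG:4326",
)

print(gdf_concat)
print(gdf_direct)
```

Both variants should give a two-row GeoDataFrame with a name column and a geometry column. Option 2 avoids creating many small frames, which tends to matter once there are a lot of names.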