DataFrame parsed as a tuple when read from a pickle
I have a pickle file containing a dictionary of DataFrames. As part of a data-cleaning script I load this pickle, do additional processing on some (but not all) of the DataFrames, and then overwrite the pickle so it can be picked up and loaded later by a simulation program.
When I read the pickle back after this processing, every value unpacks correctly as a DataFrame except two, which come back as tuples. Since those two don't actually need any changes in this particular cleaning script, the script does nothing with them beyond the following:
# start of script: read in the pickle and assign the dfs for later use
import sys
import pickle

input_file = sys.argv[1]
with open(input_file, 'rb') as handle:
    data = pickle.load(handle)

trips = data['trips']             # sees additional processing, is correctly written out as a DF
stops = data['stops']             # sees additional processing, is correctly written out as a DF
stop_times = data['stop_times'],  # NO additional processing, is INCORRECTLY written out as a tuple
road_segs = data['road_segs'],    # NO additional processing, is INCORRECTLY written out as a tuple
seg_props = data['seg_props']     # NO additional processing, is correctly written out as a DF

...  # do additional processing on trips and stops

# Output the updated DFs and carry the unaltered DFs through to overwrite the original pickle.
data = {
    "trips": trips,
    "stops": stops,
    "stop_times": stop_times,
    "road_segs": road_segs,
    "seg_props": seg_props
}
with open(input_file, 'wb') as handle:
    pickle.dump(data, handle, protocol=4)
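(Aside: a minimal type check right before the dump, which is not part of the original script, would surface the problem at write time. The key names below simply match the dictionary above.)

# Hypothetical sanity check, not in the original script: verify every value
# is still a DataFrame before overwriting the pickle.
# (GeoDataFrame is a subclass of DataFrame, so 'stops' passes too.)
import pandas as pd

for key, val in data.items():
    if not isinstance(val, pd.DataFrame):
        raise TypeError(f"{key} is a {type(val).__name__}, expected a DataFrame")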
If I read the pickle before running it through this script, I get the following:
[type(val) for val in gtfs.values()]
#output
[pandas.core.frame.DataFrame,
geopandas.geodataframe.GeoDataFrame,
pandas.core.frame.DataFrame,
pandas.core.frame.DataFrame,
pandas.core.frame.DataFrame]
And after:
[type(val) for val in gtfs.values()]
Out[17]:
[pandas.core.frame.DataFrame,
pandas.core.frame.DataFrame,
tuple,
tuple,
pandas.core.frame.DataFrame]
The tuples are also highly nested:
((( trip_id stop_id stop_duation
0 15243854-AUG19-MVS-BUS-Weekday-01 17894 0.0
1 15243854-AUG19-MVS-BUS-Weekday-01 17897 0.0
2 15243854-AUG19-MVS-BUS-Weekday-01 17900 0.0
[2812369 rows x 3 columns],),),)
It turns out I have two dangling commas
stop_times = data['stop_times'],
road_segs = data['road_segs'],
in the assignments where I unpack the pickle, and that is what caused this. How I missed it after staring at it over and over is beyond me.
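For context, a trailing comma in an assignment is tuple packing: `x = y,` is the same as `x = (y,)`. Because the script overwrites the same pickle, every run wraps those two values in one more layer, which would explain the three levels of nesting in the repr above (presumably the script had been run three times). A minimal standalone sketch of the effect, not using the real data:

import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]})

value = df          # what the dict originally holds
for _ in range(3):  # simulate three runs of the cleaning script
    value = value,  # the dangling comma: packs the value into a 1-tuple
print(type(value))  # <class 'tuple'>
# value is now ((( <DataFrame> ,),),) -- the same triple nesting seen above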