DataFrame parsed as a tuple when read from a pickle

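I have a pickle file containing a dictionary of DataFrames. As part of a data-cleaning script, I load this pickle, do additional processing on some but not all of the DataFrames, and then overwrite the pickle so that a simulation program can pick it up and load it later.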

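When I read the pickle back after this processing, all but two of the values are correctly unpacked and parsed as DataFrames, but those two are read as tuples. Since these two don't actually need any changes in this particular data-cleaning script, the script doesn't touch them beyond the following: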

# Start of script: read in the pickle and assign the DFs for later use.
import pickle
import sys

input_file = sys.argv[1]
with open(input_file, 'rb') as handle:
  data = pickle.load(handle)


trips      = data['trips']       # this sees additional processing, is correctly written out as a DF. 
stops      = data['stops']       # this sees additional processing, is correctly written out as a DF.
stop_times = data['stop_times'], # NO additional processing, is INCORRECTLY written out as a tuple.
road_segs  = data['road_segs'],  # NO additional processing, is INCORRECTLY written out as a tuple.
seg_props  = data['seg_props']   # NO additional processing, is correctly written out as a df.


... # do additional processing on trips and stops


# Output the updated DFs and carry the unaltered DFs through to overwrite the original pickle.

data = {
  "trips":      trips,
  "stops":      stops,
  "stop_times": stop_times,
  "road_segs":  road_segs,
  "seg_props":  seg_props
}

with open(input_file, 'wb') as handle:
  pickle.dump(data, handle, protocol=4)

If I read the pickle before running it through this script, I get the following:

[type(val) for val in gtfs.values()]                                                                                                                                                    
#output
[pandas.core.frame.DataFrame,
 geopandas.geodataframe.GeoDataFrame,
 pandas.core.frame.DataFrame,
 pandas.core.frame.DataFrame,
 pandas.core.frame.DataFrame]

And after:

[type(val) for val in gtfs.values()]                                                                                                                                                    
Out[17]: 
[pandas.core.frame.DataFrame,
 pandas.core.frame.DataFrame,
 tuple,
 tuple,
 pandas.core.frame.DataFrame]

The tuples are also deeply nested:

(((                                   trip_id stop_id  stop_duation
   0        15243854-AUG19-MVS-BUS-Weekday-01   17894           0.0
   1        15243854-AUG19-MVS-BUS-Weekday-01   17897           0.0
   2        15243854-AUG19-MVS-BUS-Weekday-01   17900           0.0

   [2812369 rows x 3 columns],),),)
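
Each layer in the output above is a one-element tuple, so the original frame should still be intact inside and can be recovered by peeling the layers off. A minimal sketch, assuming every wrapper really is a tuple of length one; `unwrap_singleton` is a hypothetical helper name, and a plain list stands in for the DataFrame so the snippet runs without pandas:

```python
def unwrap_singleton(value):
    """Peel off nested one-element tuples and return the innermost object."""
    while isinstance(value, tuple) and len(value) == 1:
        value = value[0]
    return value


# A list stands in for the DataFrame; three layers mirror the output above.
frame = [["15243854-AUG19-MVS-BUS-Weekday-01", "17894", 0.0]]
nested = (((frame,),),)

recovered = unwrap_singleton(nested)
print(recovered is frame)
```

Tuples with more than one element are left untouched, so the helper only undoes this specific kind of wrapping.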

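I had two dangling commas

stop_times = data['stop_times'],
road_segs  = data['road_segs'],

in my import, and that is what caused this: a trailing comma turns the right-hand side into a one-element tuple. How I failed to notice it after staring at the code over and over is beyond me.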
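
Because the script overwrites its own input, each run would wrap the affected values in one more tuple, which would be consistent with the nesting shown earlier (three layers suggesting three runs, though that is an inference, not something stated in the question). A small sketch of the effect, with plain dicts standing in for the DataFrames and an in-memory pickle round trip standing in for the file; `run_once` and `nesting_depth` are hypothetical names used only for illustration:

```python
import pickle


def run_once(blob):
    """Mimic one run of the script: load, reassign with a stray comma, dump."""
    data = pickle.loads(blob)
    stop_times = data["stop_times"],  # trailing comma: this is now a 1-tuple
    seg_props = data["seg_props"]     # no comma: the object passes through
    out = {"stop_times": stop_times, "seg_props": seg_props}
    return pickle.dumps(out, protocol=4)


def nesting_depth(value):
    """Count how many one-element tuples wrap the value."""
    depth = 0
    while isinstance(value, tuple) and len(value) == 1:
        value = value[0]
        depth += 1
    return depth


# Stand-in for the original pickle file (assumed contents, not real data).
blob = pickle.dumps({"stop_times": {"rows": 3}, "seg_props": {"rows": 5}}, protocol=4)
for _ in range(3):
    blob = run_once(blob)

result = pickle.loads(blob)
print(nesting_depth(result["stop_times"]))  # 3
print(nesting_depth(result["seg_props"]))   # 0
```

A check such as `isinstance(value, tuple)` on each dictionary value just before `pickle.dump` would likely have flagged the problem before the file was overwritten.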