Convert a DataFrame to a list of rows in PySpark (AWS Glue)

How do I convert my DataFrame df to a list of rows?

Code

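from awsglue.context import GlueContext
from pyspark.context import SparkContext

glueContext = GlueContext(SparkContext.getOrCreate())

df = glueContext.create_dynamic_frame_from_options(
    connection_type = "s3",
    connection_options = {"paths": ["s3://data/tmp1/file.csv"]},
    format = "csv",
)
df = df.toDF()
rows = df.values.tolist()  # fails: .values exists only on pandas DataFrames, not Spark ones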

Error

dataframe has no attribute values

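IMHO, you can use toPandas(), since .values is a pandas attribute. Note that a DynamicFrame has no toPandas() method, so convert it to a Spark DataFrame with toDF() first. Also keep in mind that toPandas() pulls the entire dataset into driver memory.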

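from awsglue.context import GlueContext
from pyspark.context import SparkContext

glueContext = GlueContext(SparkContext.getOrCreate())

dyf = glueContext.create_dynamic_frame_from_options(
    connection_type="s3",
    connection_options={"paths": ["s3://data/tmp1/file.csv"]},
    format="csv")

# DynamicFrame -> Spark DataFrame -> pandas DataFrame
pdf = dyf.toDF().toPandas()
rows = pdf.values.tolist()  # list of lists, one inner list per row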

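In Glue, you can also use the DynamicFrame.map() method (https://docs.aws.amazon.com/glue/latest/dg/aws-glue-api-crawler-pyspark-extensions-dynamic-frame.html#aws-glue-api-crawler-pyspark-extensions-dynamic-frame-map) to reshape each record, for example to pack several columns into a single list: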

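# Define the record function before passing it to map(); it must return the record.
def to_list(rec):
    rec["list"] = [rec["col1"], rec["col2"]]
    del rec["col1"]
    del rec["col2"]
    return rec

# map() returns a new DynamicFrame rather than modifying df in place.
mapped = df.map(f=to_list)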