Can I set stage names in Spark ML Pipelines?
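I'm starting to build more complex ML pipelines that use the same type of pipeline stage several times. Is there a way to set the names of the stages so that someone else can easily interrogate a saved pipeline and find out what is going on? For example: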
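from pyspark.ml import Pipeline
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.feature import VectorAssembler

vecAssembler1 = VectorAssembler(inputCols=["P1", "P2"], outputCol="features1")
vecAssembler2 = VectorAssembler(inputCols=["P3", "P4"], outputCol="features2")
lr_1 = LogisticRegression(labelCol="L1")
lr_2 = LogisticRegression(labelCol="L2")
pipeline = Pipeline(stages=[vecAssembler1, vecAssembler2, lr_1, lr_2])
# getStages() returns the list of stages; each stage prints as its uid
print(pipeline.getStages())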
This returns something like:
[VectorAssembler_4205a9d090177e9c54ba, VectorAssembler_42b8aa29277b380a8513, LogisticRegression_42d78f81ae072747f88d, LogisticRegression_4d4dae2729edc37dc1f3]
But what I would like to do is something like:
pipeline = Pipeline(stages=[vecAssembler1, vecAssembler2, lr_1, lr_2], names=["VectorAssembler for predicting L1", "VectorAssembler for predicting L2", "LogisticRegression for L1", "LogisticRegression for L2"])
so that a third party can load the saved pipeline model and get a useful description of each stage:
print(pipeline.getStages())
# [VectorAssembler for predicting L1, VectorAssembler for predicting L2, LogisticRegression for L1, LogisticRegression for L2]
You can use the _resetUid method to rename each transformer/estimator:
vecAssembler1 = VectorAssembler(inputCols = ["P1", "P2"], outputCol="features1")
vecAssembler1._resetUid("VectorAssembler for predicting L1")
By default, it uses Java's random UID generator.
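To apply this to every stage at once, something like the helper below might work. It is only a sketch: rename_stages is a hypothetical name, not part of PySpark, and it relies on _resetUid, which is a private method (note the leading underscore) and so may change between Spark versions. Replacing spaces with underscores and requiring unique names are precautions I am assuming here, on the guess that the uid can end up in the stage directory names of a saved pipeline; the snippet above passes a name with spaces directly.

```python
import re


def rename_stages(stages, names):
    """Assign a readable uid to each stage through its private _resetUid method."""
    if len(stages) != len(names):
        raise ValueError("need exactly one name per stage")
    # Assumption: collapse anything that is not a letter, digit or underscore,
    # so the uid stays safe to use as part of a file path.
    uids = [re.sub(r"\W+", "_", name).strip("_") for name in names]
    if len(set(uids)) != len(uids):
        raise ValueError("stage names must be unique after sanitizing")
    for stage, uid in zip(stages, uids):
        stage._resetUid(uid)
    return stages


# Intended use with PySpark (assumes the stages from the question already exist):
#   rename_stages(
#       [vecAssembler1, vecAssembler2, lr_1, lr_2],
#       ["VectorAssembler for predicting L1", "VectorAssembler for predicting L2",
#        "LogisticRegression for L1", "LogisticRegression for L2"],
#   )
#   pipeline = Pipeline(stages=[vecAssembler1, vecAssembler2, lr_1, lr_2])
```

Rename the stages before building and fitting the pipeline, so the names are in place by the time anything is saved.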