PySpark - How to show what components are included in a Pipeline?

In the code below, a PySpark Pipeline contains two transformers. Given the pipeline, how can I print the names of these two transformers?

from pyspark.ml.feature import StringIndexer, OneHotEncoder
from pyspark.ml import Pipeline

gender_indexer = StringIndexer(inputCol='Sex', outputCol='SexIndex')
gender_encoder = OneHotEncoder(inputCol='SexIndex', outputCol='SexVec')

pipeline = Pipeline(stages=[gender_indexer, gender_encoder])

pipeline.getStages() returns the stages in the pipeline:

>>> pipeline.getStages()
[StringIndexer_84633f93b8f6, OneHotEncoder_6a01b7a7cdc1]

Note that each list element is an object, not a string.
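
To get plain strings rather than objects, one option is to read each stage's class name and its uid. The following is a minimal sketch, assuming the stages are ordinary PySpark ML stages that expose a `uid` attribute; `describe_stages` is a hypothetical helper name, not part of PySpark:

```python
def describe_stages(stages):
    """Return (class name, uid) pairs for a list of pipeline stages."""
    described = []
    for stage in stages:
        # The Python class name, e.g. "StringIndexer"
        class_name = type(stage).__name__
        # PySpark ML stages carry a uid string; fall back to None if it is absent
        uid = getattr(stage, "uid", None)
        described.append((class_name, uid))
    return described


# Usage with the pipeline from the question (needs an active Spark session):
# for class_name, uid in describe_stages(pipeline.getStages()):
#     print(class_name, uid)
```

For a fitted pipeline (a PipelineModel), the same helper should work on its `stages` attribute instead of `getStages()`.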