PySpark - How to show what components are included in a Pipeline?
In the code below, a PySpark Pipeline contains two transformers. Given the pipeline, how can I print the names of these two transformers?
from pyspark.ml import Pipeline
from pyspark.ml.feature import OneHotEncoder, StringIndexer

# Map the string column 'Sex' to a numeric index column.
gender_indexer = StringIndexer(inputCol='Sex', outputCol='SexIndex')
# One-hot encode the numeric index into a vector column.
gender_encoder = OneHotEncoder(inputCol='SexIndex', outputCol='SexVec')
# Chain both stages in order.
pipeline = Pipeline(stages=[gender_indexer, gender_encoder])
pipeline.getStages() will show you the stages in the pipeline:
>>> pipeline.getStages()
[StringIndexer_84633f93b8f6, OneHotEncoder_6a01b7a7cdc1]
Note that each list element is an object, not a string.
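Since the elements are objects, getting printable names takes one more step. Here is a minimal sketch, assuming each stage exposes a uid attribute (the identifier shown in the output above) and that the class name is what you mean by "name". stage_names and print_stage_names are hypothetical helpers, not part of PySpark.

```python
def stage_names(pipeline):
    """Return (class name, uid) pairs for each stage, in pipeline order.

    Hypothetical helper: it only relies on pipeline.getStages() and on
    each stage having a `uid` attribute.
    """
    return [(type(stage).__name__, stage.uid) for stage in pipeline.getStages()]


def print_stage_names(pipeline):
    """Print one line per stage, e.g. 'StringIndexer (uid=...)'."""
    for class_name, uid in stage_names(pipeline):
        print(f"{class_name} (uid={uid})")
```

With the pipeline from the question, print_stage_names(pipeline) should print one line for the StringIndexer and one for the OneHotEncoder. The uid suffixes are generated per object, so they will differ from run to run.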