Not able to pass StringIndexer as list to the model pipeline stage
PySpark pipelines are new to me. I am trying to create the stages in a pipeline from the following list:
pipeline = Pipeline().setStages([indexer,assembler,dtc_model])
where I apply feature indexing on multiple columns:
cat_col = ['Gender','Habit','Mode']
indexer = [StringIndexer(inputCol=column, outputCol=column+"_index").fit(training_data_0) for column in cat_col ]
I get the following error when fitting the pipeline:
model_pipeline = pipeline.fit(train_df)
How can we pass a list to the stages? Is there a workaround, or a better way to achieve this?
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<command-3999694668013877> in <module>
----> 1 model_pipeline = pipeline.fit(train_df)
/databricks/spark/python/pyspark/ml/base.py in fit(self, dataset, params)
130 return self.copy(params)._fit(dataset)
131 else:
--> 132 return self._fit(dataset)
133 else:
134 raise ValueError("Params must be either a param map or a list/tuple of param maps, "
/databricks/spark/python/pyspark/ml/pipeline.py in _fit(self, dataset)
95 if not (isinstance(stage, Estimator) or isinstance(stage, Transformer)):
96 raise TypeError(
---> 97 "Cannot recognize a pipeline stage of type %s." % type(stage))
98 indexOfLastEstimator = -1
99 for i, stage in enumerate(stages):
TypeError: Cannot recognize a pipeline stage of type <class 'list'>.
The error occurs because indexer is itself a list, so [indexer, assembler, dtc_model] puts a nested list inside the stages list; the stages need to be one flat list of transformers/estimators. Try the below -
cat_col = ['Gender','Habit','Mode']
indexer = [StringIndexer(inputCol=column, outputCol=column+"_index").fit(training_data_0) for column in cat_col ]
assembler = VectorAssembler...
dtc_model = DecisionTreeClassifier...
# Create pipeline using transformers and estimators
stages = indexer
stages.append(assembler)
stages.append(dtc_model)
pipeline = Pipeline().setStages(stages)
model_pipeline = pipeline.fit(train_df)
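For reference, here is a minimal end-to-end sketch of the same idea. The feature/label column names ("features", "label") and the assembler inputs are assumptions - adjust them to your data. Note that the StringIndexer stages can be passed unfitted, since Pipeline.fit() fits each estimator stage in order:
from pyspark.ml import Pipeline
from pyspark.ml.feature import StringIndexer, VectorAssembler
from pyspark.ml.classification import DecisionTreeClassifier

cat_col = ['Gender', 'Habit', 'Mode']

# One StringIndexer per categorical column; no need to call .fit() up front,
# Pipeline.fit() will fit each estimator stage in order.
indexer = [StringIndexer(inputCol=c, outputCol=c + "_index") for c in cat_col]

# Assemble the indexed columns into a single feature vector.
# "features" and "label" are assumed column names - adjust to your data.
assembler = VectorAssembler(inputCols=[c + "_index" for c in cat_col],
                            outputCol="features")
dtc_model = DecisionTreeClassifier(featuresCol="features", labelCol="label")

# Unpack the list of indexers so stages is a flat list, not a nested one.
pipeline = Pipeline(stages=[*indexer, assembler, dtc_model])
model_pipeline = pipeline.fit(train_df)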