How to force parameter dependency when using AI Platform hyperparameter tuning?

Question

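I have a scikit-learn model that I can train on GCP using AI Platform training. I also want to do hyperparameter tuning with AI Platform training. This is possible: you only need to pass a YAML file with the parameters and their ranges: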

params:
- parameterName: max_df
  type: DOUBLE
  minValue: 0.0
  maxValue: 1.0
  scaleType: UNIT_LINEAR_SCALE
- parameterName: min_df
  type: DOUBLE
  minValue: 0
  maxValue: 1.
  scaleType: UNIT_LINEAR_SCALE

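The problem here is that there is a dependency between the 2 parameters: min_df < max_df. If this does not hold, scikit-learn fails, as expected.

It does not seem possible to express this dependency in the YAML.

I can adjust the number of allowed failed trials, but if I am unlucky and the very first trial has min_df > max_df, the whole hyperparameter tuning job stops. So this does not seem to be a valid option.

I can check this in my Python code and make sure that min_df < max_df, but then what should I return to the code doing the hyperparameter tuning (Bayesian optimization, I guess) so that it understands that such a choice of parameters is invalid?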

# this is for hyperparameter tuning
import hypertune

# `accuracy` is the score computed earlier in the training code
hpt = hypertune.HyperTune()
hpt.report_hyperparameter_tuning_metric(
    hyperparameter_metric_tag='accuracy',
    metric_value=accuracy,
    global_step=0)

Is it enough to simply return an accuracy of 0.0? Or should I return None or NaN? I did not find any documentation on this topic.

Bonus question: in the YAML I can only pass strings, not None or NULL:

- parameterName: FT_norm
  type: CATEGORICAL
  categoricalValues: ['l1', 'l2', 'None']

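I need to convert 'None' to None directly in my Python code before passing the value to the model. Is there a better way to handle this case, for example with the GCP Python client library? (I am using the gcloud CLI.)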
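The conversion described above can be sketched as an `argparse` type function, so the string placeholder is mapped back to `None` while the trainer parses its arguments. This is only an illustration: the helper name `parse_optional` and the set of accepted placeholder strings are assumptions, not part of any GCP library, and it relies on the tuning service passing each hyperparameter to the trainer as a command-line argument.

```python
import argparse


def parse_optional(value, null_tokens=("None", "null", "NULL")):
    """Map the string placeholder from the tuning config back to None.

    `null_tokens` is a hypothetical convention chosen for this sketch.
    """
    if isinstance(value, str) and value in null_tokens:
        return None
    return value


parser = argparse.ArgumentParser()
# argparse calls parse_optional on the raw string it receives.
parser.add_argument("--FT_norm", type=parse_optional, default=None)

# Simulate the arguments a trial would receive for FT_norm = 'None'.
args = parser.parse_args(["--FT_norm", "None"])
```

After parsing, `args.FT_norm` can be handed to the model directly, with no further special-casing in the training code.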

Answer 1

In the end, I return an accuracy of 0.0 when the parameters given to scikit-learn are invalid (for example when min_df > max_df).

When 0.0 is returned for invalid hyperparameters, no accuracy is reported for that trial.
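A minimal sketch of this workaround, assuming the metric is an accuracy being maximized, so that 0.0 is the worst possible score. The names `constrained_metric`, `train_and_score` and `report` are hypothetical helpers introduced for illustration; only the `hypertune` calls come from the question.

```python
def constrained_metric(min_df, max_df, train_and_score, penalty=0.0):
    """Return the real score, or `penalty` when min_df < max_df is violated.

    `train_and_score` is a hypothetical callable that fits the model and
    returns its accuracy; it is skipped entirely for invalid combinations.
    """
    # The question states the constraint as a strict inequality,
    # so equal values are treated as invalid here as well.
    if min_df >= max_df:
        return penalty
    return train_and_score(min_df, max_df)


def report(metric_value):
    """Report the metric to the tuning service, as in the question."""
    import hypertune  # third-party package, imported lazily

    hpt = hypertune.HyperTune()
    hpt.report_hyperparameter_tuning_metric(
        hyperparameter_metric_tag="accuracy",
        metric_value=metric_value,
        global_step=0)
```

The trial then finishes normally instead of crashing, so it does not count as a failed trial, and the tuner sees a poor score for that region of the search space.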

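I also found that the code only accepts a number or a string as the metric value, as the traceback below shows, but I did not find any documentation explaining this in more detail: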

File "/root/.local/lib/python3.5/site-packages/hypertune/hypertune.py", line 62, in report_hyperparameter_tuning_metric
    metric_value = float(metric_value)
TypeError: float() argument must be a string or a number, not 'NoneType'
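The traceback shows that the metric goes through `float(metric_value)`, which is why `None` is rejected. A small guard along these lines could normalize the value before it is reported. The helper name `to_reportable` and the choice to treat NaN as invalid are assumptions made for this sketch; how the tuning service itself handles NaN is not documented in this post.

```python
import math


def to_reportable(metric_value, fallback=0.0):
    """Coerce a metric to float; use `fallback` for None, NaN or bad strings."""
    try:
        # Same conversion that the traceback shows inside hypertune.
        value = float(metric_value)
    except (TypeError, ValueError):
        # TypeError: None and other non-numeric objects.
        # ValueError: strings that do not parse as a number.
        return fallback
    # Treating NaN as invalid is an assumption of this sketch.
    return fallback if math.isnan(value) else value
```

Passing `to_reportable(accuracy)` as `metric_value` avoids the `TypeError` above when the training code produced no score.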

I am sure that returning 0.0 like this is not 100% correct, but it seems to work as expected.
