Converting the google-cloud-ml GitHub Reddit example from regression to classification and adding keys?

Question

I have been trying to adapt the reddit_tft example from the cloud-ml GitHub samples repo to my needs.

I followed the tutorial readme and got it running.

However, I want to use it for a binary classification problem and also output keys in batch prediction.

So I copied the tutorial code here and changed it in a few places so that it can use the deep_classifier model type, backed by DNNClassifier, instead of DNNRegressor.

I changed the score variable to:

if(score>0,1,0) as score
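For reference, the expression above sends every positive Reddit score to class 1 and everything else (including 0) to class 0, and a classifier such as DNNClassifier with n_classes=2 expects integer labels in {0, 1}. The plain-Python sketch below mirrors the same rule and checks the label range; `to_binary_label` and `check_labels` are hypothetical helper names, not part of the sample.

```python
def to_binary_label(score):
    """Mirror of the BigQuery expression IF(score > 0, 1, 0)."""
    return 1 if score > 0 else 0


def check_labels(labels, n_classes=2):
    """Raise if any label falls outside [0, n_classes)."""
    bad = [label for label in labels if not 0 <= label < n_classes]
    if bad:
        raise ValueError("labels out of range: %r" % bad)
    return labels


scores = [-3, 0, 1, 42]
labels = check_labels([to_binary_label(s) for s in scores])
print(labels)  # [0, 0, 1, 1]
```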

It trains fine and deploys to Cloud ML, but I'm not sure how to get the keys back from my predictions now.

I updated the SQL that pulls data from BigQuery to include id as example_id here.

The tutorial code seems to have some kind of placeholder for example_id, so I'm trying to make use of it.

This all seems to work, but when I get batch predictions, all I get is JSON like this:

{"classes": ["0", "1"], "scores": [0.20427155494689941, 0.7957285046577454]}
{"classes": ["0", "1"], "scores": [0.14911963045597076, 0.8508803248405457]}
...

So example_id doesn't seem to be making it into the serving function the way I need it to.
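Each line of the batch prediction output above is one JSON object with parallel `classes` and `scores` lists, so the predicted class is the entry with the highest score. Without a key in each record, a prediction can only be matched to its input by position, and batch prediction runs in parallel and does not guarantee that output order matches input order; that is why a pass-through key such as example_id is needed. The sketch below parses that kind of output. The `example_id` field in the second record is a hypothetical illustration of what a keyed record would look like, not real output.

```python
import json

# Newline-delimited JSON as written by batch prediction. The second record
# carries a hypothetical "example_id" field to show the desired keyed output.
raw_output = """\
{"classes": ["0", "1"], "scores": [0.20427155494689941, 0.7957285046577454]}
{"classes": ["0", "1"], "scores": [0.14911963045597076, 0.8508803248405457], "example_id": "hypothetical-id-1"}
"""


def parse_predictions(text):
    """Yield (key, predicted_class, score) for each non-empty line."""
    for line in text.splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        scores = record["scores"]
        # Index of the highest score; classes and scores are parallel lists.
        best = max(range(len(scores)), key=lambda i: scores[i])
        # Records without a key get None and can only be matched by position.
        yield record.get("example_id"), record["classes"][best], scores[best]


for key, label, score in parse_predictions(raw_output):
    print(key, label, round(score, 3))
```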

The approach I was trying to follow is based on adapting the census example to output keys.

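I just can't figure out how to finish adapting this reddit example so that it also outputs keys in the predictions, because the two examples look somewhat different to me in their design and in the functions they use.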

Update 1

My latest attempt is here, trying to use the approach outlined here.

However, it fails with this error:

NotFoundError (see above for traceback): /tmp/tmp2jllvb/model.ckpt-1_temp_9530d2c5823d4462be53fa5415e429fd; No such file or directory
     [[Node: save/SaveV2 = SaveV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_INT64], _device="/job:ps/replica:0/task:0/device:CPU:0"](save/ShardedFilename, save/SaveV2/tensor_names, save/SaveV2/shape_and_slices, dnn/hiddenlayer_0/kernel/part_2/read, dnn/dnn/hiddenlayer_0/kernel/part_2/Adagrad/read, dnn/hiddenlayer_1/kernel/part_2/read, dnn/dnn/hiddenlayer_1/kernel/part_2/Adagrad/read, dnn/input_from_feature_columns/input_layer/subreddit_id_embedding/weights/part_0/read, dnn/dnn/input_from_feature_columns/input_layer/subreddit_id_embedding/weights/part_0/Adagrad/read, dnn/logits/bias/part_0/read, dnn/dnn/logits/bias/part_0/Adagrad/read, global_step)]]
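The checkpoint path in this error, /tmp/tmp2jllvb/..., has the shape of a directory created by Python's tempfile.mkdtemp(). In TF 1.x, an Estimator constructed without a model_dir (neither as an argument nor in its RunConfig) falls back to such a local temporary directory and logs a warning. In distributed training each machine, including the parameter server that runs save/SaveV2 here, has its own local /tmp, so a directory created on one replica need not exist on another. That is an assumption about the cause, not a confirmed diagnosis. The plain-Python sketch below only mimics the fallback rule so the symptom is easy to recognize; `resolve_model_dir`, `is_shared_location` and the bucket name are hypothetical.

```python
import tempfile


def resolve_model_dir(model_dir=None, config_model_dir=None):
    """Mimic the TF 1.x Estimator rule for choosing a model directory.

    An explicit model_dir and the RunConfig's model_dir must agree if both
    are given; if neither is given, a fresh local temp directory is used.
    """
    if model_dir and config_model_dir and model_dir != config_model_dir:
        raise ValueError("model_dir and config.model_dir disagree")
    chosen = model_dir or config_model_dir
    if chosen is None:
        chosen = tempfile.mkdtemp()
        print("WARNING: using temporary folder as model directory:", chosen)
    return chosen


def is_shared_location(path):
    """Treat only GCS paths as visible to every replica (assumption)."""
    return path.startswith("gs://")


print(is_shared_location(resolve_model_dir("gs://my-bucket/reddit/output")))
print(is_shared_location(resolve_model_dir()))  # local /tmp/tmpXXXX directory
```

If the wrapped or re-created Estimator in the attempt is built without a model_dir, passing the job's GCS output directory explicitly (or a RunConfig that carries it) would rule this cause in or out.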

Update 2

My latest attempt and the details are here.

I'm now getting an error from tensorflow-transform (run_preprocess.sh worked fine with tft 0.1):

File "/usr/local/lib/python2.7/dist-packages/tensorflow_transform/tf_metadata/dataset_schema.py", line 282, in __setstate__
    self._dtype = tf.as_dtype(state['dtype'])
TypeError: string indices must be integers, not str

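Update 3

I've switched to using just Beam + CSV and avoiding tft. I'm also now using the approach outlined here to extend a canned estimator so that the keys come back with the predictions.

However, while following this post to try to add comments as a feature, I've now run into a new error: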

The replica worker 3 exited with a non-zero status of 1. Termination reason: Error.
Traceback (most recent call last):
  [...]
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/estimator/python/estimator/extenders.py", line 87, in new_model_fn
    spec = estimator.model_fn(features, labels, mode, config)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/estimator/estimator.py", line 203, in public_model_fn
    return self._call_model_fn(features, labels, mode, config)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/estimator/estimator.py", line 694, in _call_model_fn
    model_fn_results = self._model_fn(features=features, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/estimator/canned/dnn_linear_combined.py", line 520, in _model_fn
    config=config)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/estimator/canned/dnn_linear_combined.py", line 158, in _dnn_linear_combined_model_fn
    dnn_logits = dnn_logit_fn(features=features, mode=mode)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/estimator/canned/dnn.py", line 89, in dnn_logit_fn
    features=features, feature_columns=feature_columns)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/feature_column/feature_column.py", line 226, in input_layer
    with variable_scope.variable_scope(None, default_name=column.name):
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 1826, in __enter__
    current_name_scope_name = self._current_name_scope.__enter__()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 4932, in __enter__
    return self._name_scope.__enter__()
  File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 3514, in name_scope
    raise ValueError("'%s' is not a valid scope name" % name)
ValueError: 'Tensor("Slice:0", shape=(?, 20), dtype=int64)_embedding' is not a valid scope name

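My repo for this attempt/approach is here. Everything runs fine if I use only subreddit as a feature; it's adding the comment feature that seems to cause the problem. Lines 103 to 111 are where I follow this approach.

Reading the traceback, I'm not sure what in my code triggers the error. Does anyone have any ideas?

Alternatively, can anyone point me to another way to go from text to bag-of-words to embedding features in TF?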
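The traceback itself points at the cause: input_layer opens a variable scope named `column.name`. In TF 1.x feature columns, an embedding column is named after its categorical column plus `_embedding`, and a categorical column is named after its `key`. The name in the error, 'Tensor("Slice:0", shape=(?, 20), dtype=int64)_embedding', is what results when the key is a Tensor rather than a string: the Tensor's string representation becomes the name, and quotes, parentheses, spaces, `?` and `,` are not allowed in scope names. A likely fix, assuming lines 103 to 111 pass the sliced tensor as the column key, is to store that tensor in the features dict under a string name and build the column with that string as its key. The sketch below imitates the naming chain in plain Python; `FakeTensor`, `embedding_scope_name`, the key `comment_words` and the regular expression (an approximation of TF 1.x's scope-name check) are stand-ins, not TensorFlow code.

```python
import re

# Approximation of the characters TF 1.x accepts in scope names
# (assumption; the exact pattern differs slightly between versions).
VALID_SCOPE_NAME = re.compile(r"^[A-Za-z0-9_.\-/]*$")


class FakeTensor:
    """Stand-in whose str() looks like a TF 1.x graph tensor."""

    def __str__(self):
        return 'Tensor("Slice:0", shape=(?, 20), dtype=int64)'


def embedding_scope_name(key):
    # A categorical column is named after its key and the embedding column
    # appends "_embedding"; a non-string key ends up as str(key).
    return "{}_embedding".format(key)


bad = embedding_scope_name(FakeTensor())
good = embedding_scope_name("comment_words")
print(bad, bool(VALID_SCOPE_NAME.match(bad)))    # invalid scope name
print(good, bool(VALID_SCOPE_NAME.match(good)))  # valid scope name
```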
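For reference, the text to bag-of-words to embedding pipeline usually comes down to three steps: split the text into tokens, map each token to an integer id from a vocabulary (with a reserved id for unknown words), and pad or truncate to a fixed length so the ids fit a (batch, N) tensor like the (?, 20) one in the traceback. An embedding column then looks up one vector per id and combines them; tf.feature_column.embedding_column uses the 'mean' combiner by default. The plain-Python sketch below walks through those steps by hand. The vocabulary, the embedding values, the padding id -1 and the length 20 are illustrative assumptions; in TensorFlow the lookup and averaging would be done by the embedding column.

```python
import re

MAX_LEN = 20   # fixed sequence length (assumed, matches (?, 20) above)
PAD_ID = -1    # padding marker, skipped when averaging
UNK_ID = 0     # id for out-of-vocabulary words

# Tiny illustrative vocabulary and embedding table (hypothetical values).
VOCAB = {"<unk>": UNK_ID, "great": 1, "post": 2, "thanks": 3}
EMBEDDINGS = [
    [0.0, 0.0],  # <unk>
    [0.9, 0.1],  # great
    [0.2, 0.8],  # post
    [0.7, 0.3],  # thanks
]


def text_to_ids(text, max_len=MAX_LEN):
    """Lower-case, split on word characters, map to ids, pad/truncate."""
    tokens = re.findall(r"\w+", text.lower())
    ids = [VOCAB.get(tok, UNK_ID) for tok in tokens][:max_len]
    return ids + [PAD_ID] * (max_len - len(ids))


def mean_embedding(ids):
    """Average the vectors of all non-padding ids (a 'mean' combiner)."""
    vectors = [EMBEDDINGS[i] for i in ids if i != PAD_ID]
    if not vectors:
        return [0.0] * len(EMBEDDINGS[0])
    return [sum(column) / len(vectors) for column in zip(*vectors)]


ids = text_to_ids("Great post, thanks!")
print(ids[:4], len(ids))  # [1, 2, 3, -1] 20
print(mean_embedding(ids))
```

Skipping the padding ids keeps short comments from being pulled toward the padding vector; a learned model would replace the hand-written table with trainable embedding weights.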

Answer 1

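We plan to, but have not yet, moved the changes for outputting keys into the census sample. In the meantime, could you check whether this gist helps? https://gist.github.com/andrewm4894/ebd3ac3c87e2ab4af8a10740e85073bb#file-with_keys_model-py

If you get to it sooner, feel free to send a PR and we'll merge your contribution.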

Answer 2

See:

https://medium.com/@lakshmanok/how-to-extend-a-canned-tensorflow-estimator-to-add-more-evaluation-metrics-and-to-pass-through-ddf66cd3047d

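Here is the code to pass the keys through:

import tensorflow as tf

KEY_COLUMN = 'example_id'  # name of the key feature (assumed; match your input data)

def forward_key_to_export(estimator):
    # Copy the KEY_COLUMN feature into the predictions dict.
    estimator = tf.contrib.estimator.forward_features(estimator, KEY_COLUMN)

    ## This shouldn't be necessary (I've filed CL/187793590 to update extenders.py with this code)
    # Rebuild the export outputs from the full predictions dict so that the
    # serving signature returns the key along with the model's outputs.
    config = estimator.config
    def model_fn2(features, labels, mode):
        estimatorSpec = estimator._call_model_fn(features, labels, mode, config=config)
        if estimatorSpec.export_outputs:
            for ekey in ['predict', 'serving_default']:
                estimatorSpec.export_outputs[ekey] = \
                    tf.estimator.export.PredictOutput(estimatorSpec.predictions)
        return estimatorSpec
    return tf.estimator.Estimator(model_fn=model_fn2, config=config)
    ##

# Create estimator to train and evaluate
def train_and_evaluate(output_dir):
    # For binary classification, a classifier such as
    # tf.estimator.DNNLinearCombinedClassifier can be used here instead.
    estimator = tf.estimator.DNNLinearCombinedRegressor(...)
    estimator = forward_key_to_export(estimator)
    ...
    tf.estimator.train_and_evaluate(estimator, ...)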
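The wrapper above does two things: forward_features copies the key feature into the predictions dict, and model_fn2 replaces the 'predict' and 'serving_default' export outputs with a PredictOutput built from the full predictions dict. Without the second step the serving signature can keep exposing only classes and scores, which matches the keyless output shown in the question. The plain-Python sketch below imitates that data flow with ordinary dicts; `fake_model_fn`, `forward_key` and the feature values are hypothetical stand-ins, not TensorFlow APIs.

```python
KEY_COLUMN = "example_id"  # assumed key feature name


def fake_model_fn(features):
    """Stand-in for a canned classifier: returns predictions and a default
    export output that, like a classification signature, exposes only
    classes and scores."""
    predictions = {"classes": ["0", "1"], "scores": [0.2, 0.8]}
    export_outputs = {
        "serving_default": {k: predictions[k] for k in ("classes", "scores")}
    }
    return predictions, export_outputs


def forward_key(model_fn, key=KEY_COLUMN):
    """Wrap model_fn so the key feature is copied into the predictions and
    the export outputs are rebuilt from the full predictions dict."""
    def wrapped(features):
        predictions, export_outputs = model_fn(features)
        predictions = dict(predictions, **{key: features[key]})
        export_outputs = dict(export_outputs)
        for name in ("predict", "serving_default"):
            export_outputs[name] = dict(predictions)
        return predictions, export_outputs
    return wrapped


features = {"subreddit": "news", KEY_COLUMN: "row-7"}
plain = fake_model_fn(features)
keyed = forward_key(fake_model_fn)(features)
print(plain[1]["serving_default"])  # no key
print(keyed[1]["serving_default"])  # includes the example_id
```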
