Converting google-cloud-ml github Reddit example from regression to classification and adding keys?
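I've been trying to adapt the reddit_tft example from the cloud-ml GitHub samples repo to my needs.
I followed the tutorial readme and got it running.
However, I want to use it for a binary classification problem and also output keys in the batch predictions.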
So I copied the tutorial code here and changed it in a few places so that I could use a deep_classifier model type that uses DNNClassifier instead of DNNRegressor.
I changed the score variable to
if(score>0,1,0) as score
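The label change and the classifier switch fit together as follows. This is a minimal pure-Python sketch, assuming the BigQuery expression `if(score>0,1,0)` is the only labeling rule: comments with a positive score become class 1, everything else class 0. For a two-class `DNNClassifier`, TensorFlow's binary head is equivalent to a sigmoid on a single logit, so the `scores` it returns are `[1 - p, p]` for classes `"0"` and `"1"`, which is why each pair of scores in the prediction output sums to roughly 1. The function names are hypothetical and only illustrate the arithmetic.

```python
import math

def score_to_label(score):
    """Mirror of the BigQuery expression IF(score > 0, 1, 0)."""
    return 1 if score > 0 else 0

def binary_scores(logit):
    """[P(class "0"), P(class "1")] from a single logit via a sigmoid,
    which is what a two-class classifier head reports as 'scores'."""
    p = 1.0 / (1.0 + math.exp(-logit))
    return [1.0 - p, p]

if __name__ == "__main__":
    print([score_to_label(s) for s in (-3, 0, 1, 42)])  # [0, 0, 1, 1]
    print(binary_scores(0.0))  # [0.5, 0.5]
```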
It trains fine and deploys to Cloud ML, but I'm not sure how to get the keys back with my predictions now.
I updated the SQL that pulls from BigQuery to include id as example_id here.
The tutorial code seems to have some sort of placeholder for example_id, so I'm trying to make use of that.
It all seems to work, but when I get batch predictions, all I get back is JSON like this:
{"classes": ["0", "1"], "scores": [0.20427155494689941, 0.7957285046577454]}
{"classes": ["0", "1"], "scores": [0.14911963045597076, 0.8508803248405457]}
...
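Batch prediction does not guarantee that output lines come back in input order, so a key is what lets each prediction be matched to its input row. A minimal sketch of consuming keyed output, assuming each JSON line also carries a forwarded key field (hypothetically named `example_id`); the sample records are made up for illustration and are not real output:

```python
import json

def positive_probs_by_key(lines, key_field="example_id", positive_class="1"):
    """Map each example's key to the score of the positive class.

    Assumes every JSON line has 'classes', 'scores' and a forwarded key field
    (the name 'example_id' is an assumption)."""
    result = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the output files
        rec = json.loads(line)
        idx = rec["classes"].index(positive_class)
        result[rec[key_field]] = rec["scores"][idx]
    return result

sample = [
    '{"classes": ["0", "1"], "scores": [0.2, 0.8], "example_id": "abc"}',
    '{"classes": ["0", "1"], "scores": [0.9, 0.1], "example_id": "xyz"}',
]
print(positive_probs_by_key(sample))  # {'abc': 0.8, 'xyz': 0.1}
```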
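So example_id does not seem to make it into the serving function the way I need it to.
I tried following an approach based on adapting the census example to output keys.
I just can't figure out how to finish adapting this reddit example to also output keys in the predictions, because the two look a bit different to me in terms of design and the functions they use.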
Update 1
My latest attempt is here, trying to use the approach outlined here.
However, it fails with this error:
NotFoundError (see above for traceback): /tmp/tmp2jllvb/model.ckpt-1_temp_9530d2c5823d4462be53fa5415e429fd; No such file or directory
[[Node: save/SaveV2 = SaveV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_INT64], _device="/job:ps/replica:0/task:0/device:CPU:0"](save/ShardedFilename, save/SaveV2/tensor_names, save/SaveV2/shape_and_slices, dnn/hiddenlayer_0/kernel/part_2/read, dnn/dnn/hiddenlayer_0/kernel/part_2/Adagrad/read, dnn/hiddenlayer_1/kernel/part_2/read, dnn/dnn/hiddenlayer_1/kernel/part_2/Adagrad/read, dnn/input_from_feature_columns/input_layer/subreddit_id_embedding/weights/part_0/read, dnn/dnn/input_from_feature_columns/input_layer/subreddit_id_embedding/weights/part_0/Adagrad/read, dnn/logits/bias/part_0/read, dnn/dnn/logits/bias/part_0/Adagrad/read, global_step)]]
Update 2
My latest attempt and details are here.
I'm now getting an error from tensorflow-transform (run_preprocess.sh worked fine with tft 0.1):
File "/usr/local/lib/python2.7/dist-packages/tensorflow_transform/tf_metadata/dataset_schema.py", line 282, in __setstate__
self._dtype = tf.as_dtype(state['dtype'])
TypeError: string indices must be integers, not str
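Update 3
I've switched to using just Beam + CSV and avoiding tft. I'm also now using the approach outlined here to extend the canned estimator so that keys come back with the predictions.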
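With plain Beam + CSV, the key only needs to travel with the other columns into the input function while being left out of the model's feature columns. A sketch of a per-line parser, assuming a hypothetical column order `example_id,subreddit,comment,score` (the real layout is whatever the linked Beam job writes and may differ):

```python
import csv

# Hypothetical column layout; adjust to match the CSV your Beam job writes.
CSV_COLUMNS = ["example_id", "subreddit", "comment", "score"]
KEY_COLUMN = "example_id"
LABEL_COLUMN = "score"

def parse_line(line):
    """Parse one CSV line into (features, label).

    The key stays in the features dict so it can be forwarded to the
    predictions, but it should not be listed among the model's feature columns."""
    values = next(csv.reader([line]))  # handles quoted fields containing commas
    row = dict(zip(CSV_COLUMNS, values))
    label = int(row.pop(LABEL_COLUMN))  # already 0/1 after IF(score>0,1,0)
    return row, label

features, label = parse_line('t1_abc,AskReddit,"hello, world",1')
print(features, label)
```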
However, when following this post to try to add the comments as a feature, I'm now running into a new error:
The replica worker 3 exited with a non-zero status of 1. Termination reason: Error.
Traceback (most recent call last):
  [...]
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/estimator/python/estimator/extenders.py", line 87, in new_model_fn
    spec = estimator.model_fn(features, labels, mode, config)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/estimator/estimator.py", line 203, in public_model_fn
    return self._call_model_fn(features, labels, mode, config)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/estimator/estimator.py", line 694, in _call_model_fn
    model_fn_results = self._model_fn(features=features, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/estimator/canned/dnn_linear_combined.py", line 520, in _model_fn
    config=config)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/estimator/canned/dnn_linear_combined.py", line 158, in _dnn_linear_combined_model_fn
    dnn_logits = dnn_logit_fn(features=features, mode=mode)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/estimator/canned/dnn.py", line 89, in dnn_logit_fn
    features=features, feature_columns=feature_columns)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/feature_column/feature_column.py", line 226, in input_layer
    with variable_scope.variable_scope(None, default_name=column.name):
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 1826, in __enter__
    current_name_scope_name = self._current_name_scope.__enter__()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 4932, in __enter__
    return self._name_scope.__enter__()
  File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 3514, in name_scope
    raise ValueError("'%s' is not a valid scope name" % name)
ValueError: 'Tensor("Slice:0", shape=(?, 20), dtype=int64)_embedding' is not a valid scope name
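My repo for this attempt/approach is here. Everything runs fine if I use only subreddit as a feature; it's adding the comment feature that seems to cause the problem. Lines 103 to 111 are where I followed this approach.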
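From reading the traceback, I'm not sure what in my code is triggering the error. Does anyone have any ideas?
Alternatively, can anyone point me to another way of going from text to bag-of-words to an embedding feature in TF?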
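For the bag-of-words question, here is a framework-free sketch of the usual pipeline: tokenize the comment, map tokens to integer ids through a vocabulary, pad or truncate to a fixed length (20 by default, matching the `shape=(?, 20)` in the error above), and mean-pool embedding rows while skipping padding. All names, the reserved ids, and the toy embedding table are assumptions for illustration. In TensorFlow the pooling step roughly corresponds to `tf.feature_column.embedding_column` (default `combiner='mean'`) wrapped around a categorical column keyed by a string feature name. On the Update 3 error: the rejected scope name is the printed form of a Tensor with `_embedding` appended, and the traceback shows `input_layer` using `column.name` as the scope name, so it looks as if a Tensor was passed where a string feature key was expected. That is an inference from the message only, not checked against the repo.

```python
import re
from collections import Counter

PAD_ID = 0  # reserved for padding
UNK_ID = 1  # reserved for out-of-vocabulary tokens

def tokenize(text):
    """Lowercase and split on anything that isn't a letter, digit or apostrophe."""
    return re.findall(r"[a-z0-9']+", text.lower())

def build_vocab(texts, max_size=10000, min_count=1):
    """Assign ids 2, 3, ... to tokens, most frequent first (ties alphabetical)."""
    counts = Counter(tok for text in texts for tok in tokenize(text))
    vocab = {}
    for tok, count in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if count < min_count or len(vocab) >= max_size:
            break
        vocab[tok] = len(vocab) + 2
    return vocab

def to_ids(text, vocab, max_len=20):
    """Bag of token ids, truncated/padded to a fixed length."""
    ids = [vocab.get(tok, UNK_ID) for tok in tokenize(text)][:max_len]
    return ids + [PAD_ID] * (max_len - len(ids))

def mean_embedding(ids, table):
    """Average the embedding rows of the non-padding ids."""
    rows = [table[i] for i in ids if i != PAD_ID]
    if not rows:
        return [0.0] * len(table[0])
    return [sum(col) / len(rows) for col in zip(*rows)]

vocab = build_vocab(["the cat sat", "the dog"])
ids = to_ids("The cat ran", vocab, max_len=5)
table = [[0.0, 0.0], [0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 2.0]]
print(vocab)  # {'the': 2, 'cat': 3, 'dog': 4, 'sat': 5}
print(ids)    # [2, 3, 1, 0, 0]
print(mean_embedding(ids, table))
```

Including the unknown-token row in the average is a design choice; dropping `UNK_ID` as well as `PAD_ID` is an equally reasonable variant.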
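We plan to move the key-output changes into the census example, but haven't done so yet. In the meantime, could you check whether this gist helps: https://gist.github.com/andrewm4894/ebd3ac3c87e2ab4af8a10740e85073bb#file-with_keys_model-py
If you get to it first, feel free to send a PR and we'll merge your contribution.
See:
Below is the code that passes the keys through: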
```python
import tensorflow as tf  # TensorFlow 1.x (tf.contrib is required)

# KEY_COLUMN is the name of the key feature (e.g. 'example_id'), defined elsewhere.
def forward_key_to_export(estimator):
    # Copy the key feature from the inputs into the predictions dict
    estimator = tf.contrib.estimator.forward_features(estimator, KEY_COLUMN)
    ## This shouldn't be necessary (I've filed CL/187793590 to update extenders.py with this code)
    config = estimator.config
    def model_fn2(features, labels, mode):
        estimatorSpec = estimator._call_model_fn(features, labels, mode, config=config)
        if estimatorSpec.export_outputs:
            # Rebuild the export signatures from the full predictions dict,
            # which now includes the forwarded key
            for ekey in ['predict', 'serving_default']:
                estimatorSpec.export_outputs[ekey] = \
                    tf.estimator.export.PredictOutput(estimatorSpec.predictions)
        return estimatorSpec
    return tf.estimator.Estimator(model_fn=model_fn2, config=config)
##

# Create estimator to train and evaluate
def train_and_evaluate(output_dir):
    estimator = tf.estimator.DNNLinearCombinedRegressor(...)
    estimator = forward_key_to_export(estimator)
    ...
    tf.estimator.train_and_evaluate(estimator, ...)
```
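To see what the key-forwarding wrapper does without TensorFlow, here is a framework-free analogy: a model function returns a predictions dict, and the wrapper copies the key from the input features into that dict, then rebuilds the export outputs from the full predictions. `ModelSpec`, `toy_model_fn` and `forward_key` are hypothetical stand-ins for `EstimatorSpec`, a canned model function and `forward_features`/`model_fn2`; this is an illustration, not the TensorFlow API.

```python
from dataclasses import dataclass, field

@dataclass
class ModelSpec:
    """Hypothetical stand-in for tf.estimator.EstimatorSpec."""
    predictions: dict
    export_outputs: dict = field(default_factory=dict)

def toy_model_fn(features):
    """Stand-in for a canned binary classifier: it ignores the key column."""
    scores = [[0.2, 0.8] for _ in features["subreddit"]]
    return ModelSpec(predictions={"classes": [["0", "1"]] * len(scores),
                                  "scores": scores},
                     export_outputs={"serving_default": {"scores": scores}})

def forward_key(model_fn, key):
    """Wrap model_fn so the key feature is echoed in predictions and exports."""
    def wrapped(features):
        spec = model_fn(features)
        spec.predictions[key] = features[key]
        # Like model_fn2: export signatures are rebuilt from the full predictions.
        for name in ("predict", "serving_default"):
            spec.export_outputs[name] = dict(spec.predictions)
        return spec
    return wrapped

model = forward_key(toy_model_fn, "example_id")
spec = model({"example_id": ["a1", "b2"], "subreddit": ["r/x", "r/y"]})
print(spec.export_outputs["serving_default"]["example_id"])  # ['a1', 'b2']
```

The second step matters: adding the key to the predictions alone does not change an export signature that was built before the key was added, which is the gap the `model_fn2` override closes.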