Using string labels in TensorFlow

Question

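I'm still trying to run TensorFlow on my own image data. I was able to create a .tfrecords file using the convert_to() function from this example (link).

Now I'd like to train a network using the code from that example (link).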

But it fails in the read_and_decode() function. My change to that function is:

label = tf.decode_raw(features['label'], tf.string)

The error is:

TypeError: DataType string for attr 'out_type' not in list of allowed values: float32, float64, int32, uint8, int16, int8, int64

So how do I 1) read string labels and 2) use them for training in TensorFlow?

Answer 1

The convert_to_records.py script creates a .tfrecords file in which each record is an Example protocol buffer. That protocol buffer supports string features using the bytes_list kind.

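The tf.decode_raw op is used to parse binary strings into image data; it is not designed to parse string (textual) labels. Assuming that features['label'] is a tf.string tensor, you can use the tf.string_to_number op to convert it to a number. There is limited other support for string processing inside a TensorFlow program, so if you need to perform some more complicated function to convert a string label to an integer, you should perform this conversion in Python, in your modified version of convert_to_tensor.py.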
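As a rough illustration of doing the conversion in Python before the records are written, here is a minimal sketch that maps each distinct string label to an integer id. The helper name build_label_index and the sample labels are made up for this example; they are not part of the TensorFlow example scripts.

```python
def build_label_index(labels):
    """Map each distinct string label to an integer id.

    Sorting makes the mapping deterministic across runs, so the same
    label always gets the same id.
    """
    return {name: idx for idx, name in enumerate(sorted(set(labels)))}


# Hypothetical labels, for illustration only.
raw_labels = ["cat", "dog", "cat", "bird"]

label_to_id = build_label_index(raw_labels)
int_labels = [label_to_id[name] for name in raw_labels]

print(label_to_id)
print(int_labels)
```

The integer ids could then be stored as an int64 feature in place of the string, and the label_to_id dictionary kept around to translate predictions back into label names.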

Answer 2

To add to @mrry's answer, assuming your string is ASCII, you can:

import tensorflow as tf

def _bytes_feature(value):
    # Wrap a single bytes value in a Feature of the bytes_list kind.
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))

def write_proto(cls, filepath, ..., item_id): # item_id is an ASCII-encodable string
    # ...
    with tf.python_io.TFRecordWriter(filepath) as writer:
        example = tf.train.Example(features=tf.train.Features(feature={
            # write it as a bytes array, supposing your string is ASCII
            'item_id': _bytes_feature(bytes(item_id, encoding='ascii')), # python 3
            # ...
        }))
        writer.write(example.SerializeToString())

Then:

def parse_single_example(cls, example_proto, graph=None):
    features_dict = tf.parse_single_example(example_proto,
        features={'item_id': tf.FixedLenFeature([], tf.string),
        # ...
        })
    # decode as uint8 aka bytes
    instance.item_id = tf.decode_raw(features_dict['item_id'], tf.uint8)

Then, when you get it back in a session, convert it back to a string:

item_id, ... = session.run(your_tfrecords_iterator.get_next())
print(str(item_id.flatten(), 'ascii')) # python 3

I learned the uint8 trick from this. It works for me, but comments and improvements are welcome.
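For anyone unsure what the uint8 round trip does to the data, here is a small TensorFlow-free sketch of the same idea: an ASCII string becomes a sequence of byte values in the range 0 to 255, and those values are turned back into a string. The helper names and the sample id are hypothetical and only serve this example.

```python
def encode_ascii(text):
    """Return the ASCII byte values of text as a list of ints (0-255)."""
    return list(text.encode("ascii"))


def decode_ascii(byte_values):
    """Rebuild the original string from a sequence of byte values."""
    return bytes(byte_values).decode("ascii")


# Hypothetical item id, for illustration only.
item_id = "item_42"

codes = encode_ascii(item_id)      # what a uint8 view of the bytes looks like
restored = decode_ascii(codes)     # back to a Python 3 str

print(codes)
print(restored)
```

Note that encode("ascii") raises UnicodeEncodeError for non-ASCII characters, so this only holds under the same assumption as above, namely that the strings are ASCII.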

python

labels

tensorflow