Error in the first step of running this example: TensorFlowOnSpark on a Spark Standalone cluster

I have a question about running this example: TensorFlowOnSpark on a Spark Standalone cluster (Single Host):

After running the mnist_data_setup.py file, it extracts the MNIST gzip files correctly, but when it calls the extract_images(filename) function it runs into an error. Please see the error below:

 Extracting <open file 'FILE_PATH_IN_MT_PC/train-images-idx3-ubyte.gz', mode 'rb' at 0x7ff3423e5c00>    
Traceback (most recent call last):
  File "FILE_PATH_IN_MT_PC/mnist/mnist_data_setup.py", line 144, in <module>
    writeMNIST(sc, "FILE_PATH_IN_MT_PC/train-images-idx3-ubyte.gz", "FILE_PATH_IN_MT_PC/train-labels-idx1-ubyte.gz", args.output + "/train", args.format, args.num_partitions)
  File "/FILE_PATH_IN_MT_PC/mnist/mnist_data_setup.py", line 52, in writeMNIST
    images = numpy.array(mnist.extract_images(f))
  File "FILE_PATH_IN_MT_PC/tensorflow/contrib/learn/python/learn/datasets/mnist.py", line 42, in extract_images
    with tf.gfile.Open(filename, 'rb') as f, gzip.GzipFile(fileobj=f) as bytestream:
  File "FILE_PATH_IN_MT_PC/tensorflow/python/platform/gfile.py", line 452, in Open
    return GFile(name, mode=mode)
  File "FILE_PATH_IN_MT_PC/tensorflow/python/platform/gfile.py", line 215, in __init__
    super(GFile, self).__init__(name, mode, _Pythonlocker())
  File "FILE_PATH_IN_MT_PC/tensorflow/python/platform/gfile.py", line 63, in __init__
    self._fp = open(name, mode)
TypeError: coercing to Unicode: need string or buffer, file found

I would be very glad if someone could help me find a solution. Thanks in advance.

I think that in open, the name variable is being given a file-type object instead of a string.
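
Here is a minimal sketch of that failure mode, assuming Python 2 (which is where open() produces exactly this wording); the path is just a placeholder for any readable file:

    # Python 2: the built-in open() wants a path string; handing it an
    # already-open file object reproduces the error from the traceback.
    f = open('train-images-idx3-ubyte.gz', 'rb')   # any existing file will do
    open(f, 'rb')   # TypeError: coercing to Unicode: need string or buffer, file found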

I dug a little deeper:

In images = numpy.array(mnist.extract_images(f)), f is a file object.

However, the installed extract_images (with tf.gfile.Open(filename, 'rb') as f, gzip.GzipFile(fileobj=f) as bytestream:) treats the argument passed in by images = numpy.array(mnist.extract_images(f)) as a filename.
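
One local workaround on this older TensorFlow, sketched below, is simply to hand extract_images the path itself. The import path comes from the traceback; the path variable is a hypothetical stand-in for whatever writeMNIST actually receives:

    import numpy
    from tensorflow.contrib.learn.python.learn.datasets import mnist

    # Hypothetical stand-in for the .gz path that writeMNIST receives.
    images_path = 'train-images-idx3-ubyte.gz'

    # The older extract_images expects a path string, so pass the path
    # directly instead of an already-open file object.
    images = numpy.array(mnist.extract_images(images_path))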

This behavior does not appear in the latest version: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/learn/python/learn/datasets/mnist.py
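
For reference, the newer mnist.py reads the open file object through gzip directly, so the original call in writeMNIST works there. A simplified stand-in for that behavior (not the verbatim TensorFlow source):

    import gzip

    def extract_images_new_style(f):
        # Stand-in for the newer extract_images: it accepts an open file
        # object and wraps it with gzip itself, so no filename is needed.
        # (The real function goes on to parse the IDX image header.)
        with gzip.GzipFile(fileobj=f) as bytestream:
            return bytestream.read()

    # Usage mirroring writeMNIST: the open file object is accepted as-is.
    with open('train-images-idx3-ubyte.gz', 'rb') as f:
        raw = extract_images_new_style(f)

So upgrading to a TensorFlow build that ships this newer mnist.py should let the existing images = numpy.array(mnist.extract_images(f)) call work unchanged.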