How can I import the MNIST dataset that has been manually downloaded?
I have been experimenting with a Keras example that needs to import the MNIST data:
from keras.datasets import mnist
import numpy as np
(x_train, _), (x_test, _) = mnist.load_data()
It produces an error message like: Exception: URL fetch failure on https://s3.amazonaws.com/img-datasets/mnist.pkl.gz: None -- [Errno 110] Connection timed out
This is probably related to the network environment I am using. Is there a function or some code that would let me import a manually downloaded MNIST dataset directly?
I tried the following approach:
import sys
import pickle
import gzip
import numpy as np

f = gzip.open('/data/mnist.pkl.gz', 'rb')
if sys.version_info < (3,):
    data = pickle.load(f)
else:
    data = pickle.load(f, encoding='bytes')
f.close()
(x_train, _), (x_test, _) = data
and then received the following error message:
Traceback (most recent call last):
  File "test.py", line 45, in <module>
    (x_train, _), (x_test, _) = data
ValueError: too many values to unpack (expected 2)
Well, the keras.datasets.mnist file is really short, so you can simulate the same actions manually, i.e.:
- download the dataset from https://s3.amazonaws.com/img-datasets/mnist.pkl.gz
- load it yourself:
import sys
import gzip
from six.moves import cPickle  # plain pickle also works on Python 3

f = gzip.open('mnist.pkl.gz', 'rb')
if sys.version_info < (3,):
    data = cPickle.load(f)
else:
    data = cPickle.load(f, encoding='bytes')
f.close()
(x_train, _), (x_test, _) = data
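Note that some mirrors of mnist.pkl.gz (for example the one distributed with the deeplearning.net/Theano tutorial) store three splits instead of two, which would explain the "too many values to unpack (expected 2)" error in the question. A minimal sketch for that layout, assuming such a three-split archive:

import gzip
import pickle

with gzip.open('mnist.pkl.gz', 'rb') as f:
    # three (data, labels) pairs: train, validation, test
    (x_train, y_train), (x_valid, y_valid), (x_test, y_test) = pickle.load(f, encoding='bytes')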
You do not need any extra code; you can simply tell load_data to use a local copy first:
- download the file https://s3.amazonaws.com/img-datasets/mnist.npz from another computer with proper (proxy) access (URL taken from https://github.com/keras-team/keras/blob/master/keras/datasets/mnist.py),
- copy it into the directory ~/.keras/datasets/ (on Linux and macOS),
- and call load_data(path='mnist.npz') with the correct file name, as in the sketch below.
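A minimal usage sketch, assuming mnist.npz has already been copied into ~/.keras/datasets/:

from keras.datasets import mnist

# load_data() looks for ~/.keras/datasets/mnist.npz and skips the download when it is present
(x_train, y_train), (x_test, y_test) = mnist.load_data(path='mnist.npz')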
The Keras file is now hosted at a new path on Google Cloud Storage (it used to be on AWS S3):
https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
When using tf.keras.datasets.mnist.load_data(), you can pass a path argument. load_data() passes it to get_file() as the fname parameter, and if it is a full path and the file already exists there, it will not be downloaded.
Example:
# gsutil cp gs://tensorflow/tf-keras-datasets/mnist.npz /tmp/data/mnist.npz
# python3
>>> import tensorflow as tf
>>> path = '/tmp/data/mnist.npz'
>>> (train_images, train_labels), (test_images, test_labels) = tf.keras.datasets.mnist.load_data(path)
>>> len(train_images)
60000
- Download the file https://s3.amazonaws.com/img-datasets/mnist.npz
- Move mnist.npz into the ~/.keras/datasets/ directory
- Load the data:
import keras
from keras.datasets import mnist
(X_train, y_train), (X_test, y_test) = mnist.load_data()
keras.datasets.mnist.load_data() will try to fetch from the remote repository even when a local file path is specified. The simplest workaround for loading an already-downloaded file, however, is to use numpy.load(), just like load_data() does internally:
import numpy as np

path = '/tmp/data/mnist.npz'
with np.load(path, allow_pickle=True) as f:
    x_train, y_train = f['x_train'], f['y_train']
    x_test, y_test = f['x_test'], f['y_test']
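For reuse, the same snippet can be wrapped in a small helper that mirrors the (train, test) tuples returned by load_data(); a minimal sketch, assuming the archive sits at /tmp/data/mnist.npz and using a hypothetical helper name load_local_mnist:

import numpy as np

def load_local_mnist(path='/tmp/data/mnist.npz'):
    # return ((x_train, y_train), (x_test, y_test)) like keras.datasets.mnist.load_data()
    with np.load(path, allow_pickle=True) as f:
        return (f['x_train'], f['y_train']), (f['x_test'], f['y_test'])

(x_train, y_train), (x_test, y_test) = load_local_mnist()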
Gogasca's answer worked for me with a slight tweak. For Python 3.9, change the code in ~/Library/Python/3.9/lib/python/site-packages/keras/datasets/mnist.py so that it uses the path variable as a full path instead of prepending origin_folder; this makes it possible to pass any local path to the downloaded file.
- Download the file: https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
- Put it in ~/Library/Python/3.9/lib/python/site-packages/keras/datasets/ (or anywhere else you prefer).
- Modify ~/Library/Python/3.9/lib/python/site-packages/keras/datasets/mnist.py as follows:
# inside load_data() in keras/datasets/mnist.py
path = path  # keep the path argument as-is instead of resolving it through get_file()
# origin_folder = 'https://storage.googleapis.com/tensorflow/tf-keras-datasets/'
# path = get_file(
#     path,
#     origin=origin_folder + 'mnist.npz',
#     file_hash='731c5ac602752760c8e48fbffcf8c3b850d9dc2a2aedcf2cc48468fc17b673d1')
with np.load(path, allow_pickle=True) as f:  # pylint: disable=unexpected-keyword-arg
    x_train, y_train = f['x_train'], f['y_train']
    x_test, y_test = f['x_test'], f['y_test']
return (x_train, y_train), (x_test, y_test)
- Load the data with the following code:
path = "/Users/username/Library/Python/3.9/lib/python/site-packages/keras/datasets/mnist.npz"
(train_images, train_labels), (test_images, test_labels) = mnist.load_data(path=path)