CelebA Dataset inaccessible using tfds.load()
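I am trying to use the CelebA dataset in a deep learning project. I have the zipped folder from Kaggle.
I wanted to unzip it and then split the images into training, test, and validation sets, but it turned out that this is not possible on my not-so-powerful system.
So, to avoid wasting time, I wanted to load CelebA with TensorFlow Datasets instead. Unfortunately, the dataset is inaccessible and loading fails with the following error: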
Code:
ds = tfds.load('celeb_a', split='train', download=True)
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-69-d7b9371eb674> in <module>
----> 1 ds = tfds.load('celeb_a', split='train', download=True)
c:\users\aman\appdata\local\programs\python\python38\lib\site-packages\tensorflow_datasets\core\load.py in load(name, split, data_dir, batch_size, shuffle_files, download, as_supervised, decoders, read_config, with_info, builder_kwargs, download_and_prepare_kwargs, as_dataset_kwargs, try_gcs)
344 if download:
345 download_and_prepare_kwargs = download_and_prepare_kwargs or {}
--> 346 dbuilder.download_and_prepare(**download_and_prepare_kwargs)
347
348 if as_dataset_kwargs is None:
c:\users\aman\appdata\local\programs\python\python38\lib\site-packages\tensorflow_datasets\core\dataset_builder.py in download_and_prepare(self, download_dir, download_config)
383 self.info.read_from_directory(self._data_dir)
384 else:
--> 385 self._download_and_prepare(
386 dl_manager=dl_manager,
387 download_config=download_config)
c:\users\aman\appdata\local\programs\python\python38\lib\site-packages\tensorflow_datasets\core\dataset_builder.py in _download_and_prepare(self, dl_manager, download_config)
1020 def _download_and_prepare(self, dl_manager, download_config):
1021 # Extract max_examples_per_split and forward it to _prepare_split
-> 1022 super(GeneratorBasedBuilder, self)._download_and_prepare(
1023 dl_manager=dl_manager,
1024 max_examples_per_split=download_config.max_examples_per_split,
c:\users\aman\appdata\local\programs\python\python38\lib\site-packages\tensorflow_datasets\core\dataset_builder.py in _download_and_prepare(self, dl_manager, **prepare_split_kwargs)
959 split_generators_kwargs = self._make_split_generators_kwargs(
960 prepare_split_kwargs)
--> 961 for split_generator in self._split_generators(
962 dl_manager, **split_generators_kwargs):
963 if str(split_generator.split_info.name).lower() == "all":
c:\users\aman\appdata\local\programs\python\python38\lib\site-packages\tensorflow_datasets\image\celeba.py in _split_generators(self, dl_manager)
137 all_images = {
138 os.path.split(k)[-1]: img for k, img in
--> 139 dl_manager.iter_archive(downloaded_dirs["img_align_celeba"])
140 }
141
c:\users\aman\appdata\local\programs\python\python38\lib\site-packages\tensorflow_datasets\core\download\download_manager.py in iter_archive(self, resource)
559 if isinstance(resource, six.string_types):
560 resource = resource_lib.Resource(path=resource)
--> 561 return extractor.iter_archive(resource.path, resource.extract_method)
562
563 def extract(self, path_or_paths):
c:\users\aman\appdata\local\programs\python\python38\lib\site-packages\tensorflow_datasets\core\download\extractor.py in iter_archive(path, method)
221 An iterator of `(path_in_archive, f_obj)`
222 """
--> 223 return _EXTRACT_METHODS[method](path)
KeyError: <ExtractMethod.NO_EXTRACT: 1>
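Can anyone explain what I am doing wrong?
As a side note, if this doesn't work, is there a way to convert the zipped file I already downloaded from Kaggle into the required format without unzipping it and then iterating over each image individually? Basically, with a dataset this large, I can't take the unzip-then-split route...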
TIA!
Edit: I tried the same thing on Colab but got a similar error.
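On the side question: Python's standard `zipfile` module can read individual archive members in memory, so images can be assigned to splits without unpacking the archive to disk (similar in spirit to the `dl_manager.iter_archive` call TFDS makes in the traceback). This is a minimal sketch, assuming the Kaggle archive contains `list_eval_partition.csv` with an `image_id,partition` header where `0`, `1`, `2` mean train, validation, test. The file name, header, and split codes are assumptions to check against your copy, and `read_partitions` / `iter_split` are hypothetical helper names.

```python
import csv
import io
import posixpath
import zipfile
from typing import Dict, Iterator, Tuple

# Assumption: 0 = train, 1 = validation, 2 = test, as in the original
# CelebA list_eval_partition file. Verify against your copy.
SPLIT_NAMES = {"0": "train", "1": "validation", "2": "test"}


def read_partitions(zf: zipfile.ZipFile,
                    csv_name: str = "list_eval_partition.csv") -> Dict[str, str]:
    """Map image file name -> split name, reading the CSV straight from the zip."""
    # Zip member names always use '/', so posixpath.basename is correct here.
    member = next((n for n in zf.namelist() if posixpath.basename(n) == csv_name), None)
    if member is None:
        raise FileNotFoundError(f"{csv_name} not found in archive")
    with zf.open(member) as raw:
        # utf-8-sig also copes with a leading byte-order mark, if present.
        reader = csv.DictReader(io.TextIOWrapper(raw, encoding="utf-8-sig"))
        return {row["image_id"]: SPLIT_NAMES[row["partition"].strip()] for row in reader}


def iter_split(zip_path: str, split: str) -> Iterator[Tuple[str, bytes]]:
    """Yield (file_name, jpeg_bytes) for one split without extracting to disk."""
    with zipfile.ZipFile(zip_path) as zf:
        partitions = read_partitions(zf)
        for info in zf.infolist():
            name = posixpath.basename(info.filename)
            if info.is_dir() or not name.lower().endswith(".jpg"):
                continue
            if partitions.get(name) == split:
                # Decompress just this member, in memory.
                yield name, zf.read(info)
```

Each yielded pair holds raw JPEG bytes, so it could be fed to something like `tf.data.Dataset.from_generator` and decoded there; that wiring is left out. Because only one member is decompressed at a time, memory use stays around one image plus the small name-to-split table, rather than the whole archive.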
Upgrading tfds to the nightly version worked for me.
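The nightly build is published on PyPI as a separate distribution, `tfds-nightly`, which installs the same `tensorflow_datasets` module as the stable `tensorflow-datasets` package. If the upgrade does not seem to take effect, it may help to confirm which build is actually installed. A minimal sketch using only the standard library's `importlib.metadata` (Python 3.8+); the helper name `installed_tfds_builds` is hypothetical, and uninstalling the stable package first is a precaution, not a documented requirement.

```python
from importlib import metadata
from typing import Dict

# Precautionary upgrade steps (run in a shell), since both packages
# provide the same `tensorflow_datasets` module:
#   pip uninstall -y tensorflow-datasets
#   pip install -U tfds-nightly
TFDS_DISTRIBUTIONS = ("tensorflow-datasets", "tfds-nightly")


def installed_tfds_builds() -> Dict[str, str]:
    """Return {distribution_name: version} for each TFDS build that is installed."""
    found = {}
    for dist in TFDS_DISTRIBUTIONS:
        try:
            found[dist] = metadata.version(dist)
        except metadata.PackageNotFoundError:
            pass  # this build is not installed
    return found


if __name__ == "__main__":
    builds = installed_tfds_builds()
    print(builds or "no TFDS build found")
    if len(builds) > 1:
        print("both builds are installed; the imported module may come from either")
```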
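Downloading from Google Drive seems to be subject to some kind of quota limit. Go to the Google Drive link shown in the error and copy the file to your own Drive. You can also download a copy with libraries such as gdown or google_drive_downloader.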