Create a custom dataset from a folder with separate files
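I am using PyTorch's custom dataset functionality to create a custom dataset from separate files in a single folder. Each file contains 123 rows and 123 columns, and all data points are integers.
My problem is that the resources I have come across assume all the data lives in one .csv file, which is not my case. What's more, opening the files as images after converting them to images does not run either. I am not sure how to proceed from here, because my code gives: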
AttributeError: 'Image' object has no attribute 'read'
import os
from torch.utils.data import DataLoader, Dataset
from numpy import genfromtxt

# Custom dataset
class CONCEPTDataset(Dataset):
    """ Concept Dataset """

    def __init__(self, file_dir, transforms=None):
        """
        Args:
            file_dir (string): Directory with all the images.
            transforms (optional): Changes on the data.
        """
        self.file_dir = file_dir
        self.transforms = transforms
        self.concepts = os.listdir(file_dir)
        self.concepts.sort()
        self.concepts = [os.path.join(file_dir, concept) for concept in self.concepts]

    def __len__(self):
        return len(self.concepts)

    def __getitem__(self, idx):
        image = self.concepts[idx]
        # csv file to a numpy array using genfromtxt
        data = genfromtxt(image, delimiter=',')
        data = self.transforms(data.unsqueeze(0))
        return data
PIL.Image.fromarray is used to convert an array to a PIL Image, while Image.open is used to load an image file from the file system. You don't need either of them, since you already have a NumPy array representing your image and just want to return it. PyTorch will convert it to a torch.Tensor automatically if you plug your dataset into a torch.utils.data.DataLoader.
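As a rough illustration of the point above, here is a minimal sketch of a dataset that reads each file straight into an array and skips PIL entirely. The class name FolderCSVDataset, the helper make_demo_folder, and the file names are hypothetical, and the random demo grids only stand in for the real 123x123 integer files. It also assumes the files are comma-separated. Note that NumPy arrays have no .unsqueeze() method, so the sketch converts to a tensor before adding the channel axis.

```python
import os
import tempfile

import numpy as np
import torch
from torch.utils.data import DataLoader, Dataset


class FolderCSVDataset(Dataset):
    """Hypothetical dataset: one comma-separated integer grid per file."""

    def __init__(self, file_dir, transform=None):
        # Sort so the index -> file mapping is deterministic.
        self.paths = sorted(
            os.path.join(file_dir, name) for name in os.listdir(file_dir)
        )
        self.transform = transform

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        # Parse the file directly into a NumPy array; no PIL round trip.
        data = np.loadtxt(self.paths[idx], delimiter=",", dtype=np.int64)
        # Convert to a tensor first, then add a leading channel axis.
        tensor = torch.from_numpy(data).unsqueeze(0)
        if self.transform is not None:
            tensor = self.transform(tensor)
        return tensor


def make_demo_folder(n_files=4, size=123):
    """Write random integer grids to a temporary folder (demo data only)."""
    folder = tempfile.mkdtemp()
    rng = np.random.default_rng(0)
    for i in range(n_files):
        grid = rng.integers(0, 10, size=(size, size))
        path = os.path.join(folder, f"concept_{i}.csv")
        np.savetxt(path, grid, fmt="%d", delimiter=",")
    return folder


demo_dir = make_demo_folder()
dataset = FolderCSVDataset(demo_dir)
loader = DataLoader(dataset, batch_size=2, shuffle=False)
batch = next(iter(loader))
print(batch.shape, batch.dtype)
```

Returning the raw NumPy array from __getitem__ should also work, since the DataLoader's default collate function converts NumPy arrays to tensors when it builds a batch. The explicit conversion here is only a design choice so that tensor-based transforms can run per sample.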