Create a custom dataset from a folder with separate files

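I am using PyTorch's custom dataset feature to create a custom dataset from separate files in a single folder. Each file contains 123 rows and 123 columns, and all data points are integers.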

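My problem is that the resources I have come across cater to a single .csv file, which is not my case. What's more, opening the image after converting the data to an image does not run either. I am not sure how to proceed from here, because my code gives: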

AttributeError: 'Image' object has no attribute 'read'

import os
from torch.utils.data import DataLoader, Dataset
from numpy import genfromtxt

# Custom dataset
class CONCEPTDataset(Dataset):
    """ Concept Dataset """

    def __init__(self, file_dir, transforms=None):
        """
        Args:
            file_dir (string): Directory with all the images.
            transforms (optional): Changes on the data.
        """
        self.file_dir = file_dir
        self.transforms = transforms

        self.concepts = os.listdir(file_dir)
        self.concepts.sort()
        self.concepts = [os.path.join(file_dir, concept) for concept in self.concepts]
    
    def __len__(self):
        return len(self.concepts)

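    def __getitem__(self, idx):
        path = self.concepts[idx]

        # Parse the comma-separated file into a 2-D NumPy array
        data = genfromtxt(path, delimiter=',')

        # NumPy arrays have no unsqueeze(); index with None to add a channel axis
        data = data[None, :, :]
        if self.transforms is not None:
            data = self.transforms(data)
        return data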

PIL.Image.fromarray converts an array to a PIL Image, while Image.open loads an image file from the file system. Passing an Image object to Image.open is what raises the AttributeError above, since Image.open expects a file path or a file-like object with a read method. You don't need either function here, because genfromtxt already gives you a NumPy array representing your image and that array is what you want to return. PyTorch will convert it to a torch.Tensor automatically if you plug your dataset into a torch.utils.data.DataLoader.
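
As a minimal sketch of that last point, the example below writes a few small comma-separated files to a temporary folder, returns plain NumPy arrays from `__getitem__`, and lets the `DataLoader` do the collation. The class name `FolderCSVDataset`, the `sample_*.csv` file names, and the 5x5 file size are hypothetical stand-ins chosen for illustration, not part of the original code.

```python
import os
import tempfile

import numpy as np
import torch
from torch.utils.data import DataLoader, Dataset


class FolderCSVDataset(Dataset):
    """Hypothetical dataset: one comma-separated file per sample."""

    def __init__(self, file_dir):
        # Sort the file names so that indexing is deterministic.
        self.paths = sorted(
            os.path.join(file_dir, name) for name in os.listdir(file_dir)
        )

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        # Parse the file straight into a float32 NumPy array; no PIL involved.
        data = np.genfromtxt(self.paths[idx], delimiter=",", dtype=np.float32)
        # Add a leading channel axis: (rows, cols) -> (1, rows, cols).
        return data[None, :, :]


def demo():
    # Assumption for the demo: four tiny 5x5 files instead of 123x123 ones.
    with tempfile.TemporaryDirectory() as tmp:
        for i in range(4):
            np.savetxt(
                os.path.join(tmp, f"sample_{i}.csv"),
                np.full((5, 5), i),
                fmt="%d",
                delimiter=",",
            )
        loader = DataLoader(FolderCSVDataset(tmp), batch_size=2)
        # The default collate function stacks the NumPy arrays into one tensor.
        return next(iter(loader))


batch = demo()
print(type(batch), tuple(batch.shape), batch.dtype)
```

Here the dataset never touches torch directly; the conversion happens in the loader's default collate step. If you index the dataset yourself instead of going through a `DataLoader`, you still get a NumPy array back and would need `torch.from_numpy` to convert it.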