PyTorch Multiclass Logistic Regression Type Errors
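I'm new to ML, and even newer to PyTorch. Here's the problem. (I've skipped certain parts, such as random_split(), which seem to work fine.)
I want to predict the wine quality (red), the last column of the dataset, which has 6 classes.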
That's what my dataset looks like
The link to the dataset (winequality-red.csv)
features = df.drop(['quality'], axis = 1)
targets = df.iloc[:, -1] # there are 6 classes
dataset = TensorDataset(torch.Tensor(np.array(features)).float(), torch.Tensor(targets).float())
# here's where I think the error might be, but I might be wrong
batch_size = 8
# Dataloader
train_loader = DataLoader(train_ds, batch_size, shuffle = True)
val_loader = DataLoader(val_ds, batch_size)
test_loader = DataLoader(test_ds, batch_size)
input_size = len(df.columns) - 1
output_size = 6
threshold = .5
class WineModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(input_size, output_size)
    def forward(self, xb):
        out = self.linear(xb)
        return out
model = WineModel()
n_iters = 2000
num_epochs = n_iters / (len(train_ds) / batch_size)
num_epochs = int(num_epochs)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
# the part below returns the error on running
iter = 0
for epoch in range(num_epochs):
    for i, (x, y) in enumerate(train_loader):
        optimizer.zero_grad()
        outputs = model(x)
        loss = criterion(outputs, y)
        loss.backward()
        optimizer.step()
RuntimeError: expected scalar type Long but found Float
I hope this information is sufficient.
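The targets of nn.CrossEntropyLoss are given as class indices, so they must be integers. To be precise, they need to be of type torch.long, which is equivalent to torch.int64.
You are converting the targets to float, but you should convert them to long: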
dataset = TensorDataset(torch.Tensor(np.array(features)).float(), torch.Tensor(targets).long())
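As a minimal sketch of the dtype requirement, the snippet below feeds the same targets to the loss first as float and then as long. The logits and target values are made up for illustration and are not taken from the wine dataset; on the PyTorch versions I'm aware of, the float call should fail with an error like the one in the question.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # reproducible random logits

criterion = nn.CrossEntropyLoss()
logits = torch.randn(4, 6)  # hypothetical batch: 4 samples, 6 classes

# Made-up class indices stored as floats, as in the question.
targets_float = torch.tensor([0.0, 2.0, 5.0, 1.0])

try:
    criterion(logits, targets_float)
    float_rejected = False
except RuntimeError as err:
    float_rejected = True
    print("float targets:", err)

# Casting to torch.long (an alias of torch.int64) gives valid class indices.
targets_long = targets_float.long()
loss = criterion(logits, targets_long)
print("long targets, loss:", loss.item())
```

The features stay float; only the targets need to be long.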
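Since the targets are class indices, they must be in the range [0, num_classes - 1]. Because you have 6 classes, that is the range [0, 5]. A quick look at your data shows that quality uses values in the range [3, 8]. Even though you have 6 classes, these values cannot be used directly as class indices. If you list the classes as classes = [3, 4, 5, 6, 7, 8], you can see that the first class is 3, classes[0] == 3, up to the last class, classes[5] == 8.
You need to replace the class values with their indices, just as you would for named classes (e.g. if you had the classes dog and cat, dog would be 0 and cat would be 1), but you can avoid looking them up, since the values are simply shifted by 3, i.e. index = classes[index] - 3. Therefore, you can subtract 3 from the whole target tensor: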
torch.Tensor(targets).long() - 3
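Subtracting 3 works because the quality values are contiguous. A more general sketch, assuming every class appears at least once in the data, is to let torch.unique build the mapping. The raw tensor here is invented for illustration; for these values the result matches the shift by 3.

```python
import torch

# Hypothetical quality labels covering all six values 3..8.
raw = torch.tensor([5, 3, 8, 6, 4, 7, 5])

# classes holds the sorted distinct values;
# indices[i] is the position of raw[i] within classes.
classes, indices = torch.unique(raw, return_inverse=True)

print(classes.tolist())
print(indices.tolist())
```

To turn a predicted index back into a quality score, index into classes, e.g. classes[pred].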