Clip or threshold a tensor using condition and zero pad the result in PyTorch
Suppose I have a tensor like this:
w = [[0.1, 0.7, 0.7, 0.8, 0.3],
[0.3, 0.2, 0.9, 0.1, 0.5],
[0.1, 0.4, 0.8, 0.3, 0.4]]
Now I want to eliminate certain values based on some condition (for example, whether or not they are greater than 0.5):
w = [[0.1, 0.3],
[0.3, 0.2, 0.1],
[0.1, 0.4, 0.3, 0.4]]
Then pad the rows to equal length:
w = [[0.1, 0.3, 0, 0],
[0.3, 0.2, 0.1, 0],
[0.1, 0.4, 0.3, 0.4]]
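This is how I implemented it in PyTorch:
import torch
w = torch.rand(3, 5)
condition = w <= 0.5
# select the kept values row by row (Python-level loop)
w = [w[i][condition[i]] for i in range(3)]
# batch_first=True keeps one row per sequence in the padded result
w = torch.nn.utils.rnn.pad_sequence(w, batch_first=True)
But obviously this is going to be very slow, mainly because of the list comprehension.
Is there a better way to do it?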
Here is one straightforward approach using boolean masking, tensor splitting, and finally padding the split tensors with torch.nn.utils.rnn.pad_sequence(...).
# input tensor to work with
In [213]: w
Out[213]:
tensor([[0.1000, 0.7000, 0.7000, 0.8000, 0.3000],
[0.3000, 0.2000, 0.9000, 0.1000, 0.5000],
[0.1000, 0.4000, 0.8000, 0.3000, 0.4000]])
# values above this should be clipped from the input tensor
In [214]: clip_value = 0.5
# generate a boolean mask that satisfies the condition
In [215]: boolean_mask = (w <= clip_value)
# we need to sum the mask along axis 1 (needed for splitting)
In [216]: summed_mask = boolean_mask.sum(dim=1)
# a sequence of split tensors, one per row of the input
In [217]: splitted_tensors = torch.split(w[boolean_mask], summed_mask.tolist())
# finally pad them along dimension 1 (batch_first=True keeps one row per sequence)
In [219]: torch.nn.utils.rnn.pad_sequence(splitted_tensors, batch_first=True)
Out[219]:
tensor([[0.1000, 0.3000, 0.0000, 0.0000],
[0.3000, 0.2000, 0.1000, 0.5000],
[0.1000, 0.4000, 0.3000, 0.4000]])
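The session above can be collected into a single script. This is only a minimal sketch of the same steps; the function name clip_and_pad is a hypothetical helper, not part of PyTorch. Note that because the mask uses <=, the value 0.5 in the second row is kept.

```python
import torch
from torch.nn.utils.rnn import pad_sequence


def clip_and_pad(w: torch.Tensor, clip_value: float) -> torch.Tensor:
    """Keep entries <= clip_value in each row, then right-pad rows with zeros.

    Hypothetical helper name; it only wraps the steps shown in the answer.
    """
    mask = w <= clip_value                 # True where a value is kept
    counts = mask.sum(dim=1).tolist()      # number of kept values per row
    rows = torch.split(w[mask], counts)    # one 1-D tensor per input row
    return pad_sequence(rows, batch_first=True)  # shape: (rows, max kept count)


w = torch.tensor([[0.1, 0.7, 0.7, 0.8, 0.3],
                  [0.3, 0.2, 0.9, 0.1, 0.5],
                  [0.1, 0.4, 0.8, 0.3, 0.4]])
result = clip_and_pad(w, 0.5)
print(result)
```

The padding value defaults to zero; pad_sequence also accepts a padding_value argument if a different fill is needed.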
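A short note on efficiency: torch.split() is very efficient, since it returns the split tensors as views of the tensor it is given (i.e. no copy is made). The boolean indexing w[boolean_mask] does create a new tensor, so the pieces are views into that intermediate result rather than into w itself.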
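One way to check the view behaviour is to compare memory addresses and to write through one of the pieces. The following is a small sketch under the assumption that a plain 1-D tensor (named flat here purely for illustration) stands in for w[boolean_mask]:

```python
import torch

flat = torch.arange(6.0)                  # stand-in for w[boolean_mask]
parts = torch.split(flat, [2, 1, 3])      # split sizes must sum to flat.numel()

# The second piece begins at element 2 of the same underlying storage.
shares_memory = parts[1].data_ptr() == flat[2:].data_ptr()

# Writing through a piece is visible in the source tensor, since nothing was copied.
parts[0][0] = 100.0
print(shares_memory, flat[0].item())
```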