How can I select top-n elements from tensor without repeating elements?
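I want to select the top-n elements of a 3-dimensional tensor, with the condition that the selected elements are all unique. Elements are ranked by the 2nd column. In the example below I select the top 2, but I don't want any duplicates among them.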
Constraint: no for loops and no tf.map_fn().
Here is my input, followed by what I currently get and what I want:
input_tensor = tf.constant([
    [[2.0, 1.0],
     [2.0, 1.0],
     [3.0, 0.4],
     [1.0, 0.1]],
    [[44.0, 0.8],
     [22.0, 0.7],
     [11.0, 0.5],
     [11.0, 0.5]],
    [[5555.0, 0.8],
     [3333.0, 0.7],
     [4444.0, 0.4],
     [1111.0, 0.1]],
    [[444.0, 0.8],
     [333.0, 1.1],
     [333.0, 1.1],
     [111.0, 0.1]]
])
- This is what I get now (not what I want!):
>>> TOPK = 2
>>> topk_results = tf.gather(
...     input_tensor,
...     tf.math.top_k(input_tensor[:, :, 1], k=TOPK, sorted=True).indices,
...     batch_dims=1
... )
>>> topk_results.numpy().tolist()
[[[2.0, 1.0], [2.0, 1.0]],
[[44.0, 0.8], [22.0, 0.7]],
[[5555.0, 0.8], [3333.0, 0.7]],
[[333.0, 1.1], [333.0, 1.1]]]
- This is what I actually want:
[[[2.0, 1.0], [3.0, 0.4]], # [3.0, 0.4] is the 2nd highest element based on 2nd column
[[44.0, 0.8], [22.0, 0.7]],
[[5555.0, 0.8], [3333.0, 0.7]],
[[333.0, 1.1], [444.0, 0.8]]] # [444.0, 0.8] is the 2nd highest element based on 2nd column
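To pin down the intended semantics, here is a plain-loop NumPy reference. It deliberately ignores the no-loops constraint and is only meant for checking a vectorized solution against. It assumes two rows are duplicates only when both columns match exactly, and that every group has at least k distinct rows (otherwise `np.stack` cannot build a rectangular result). The function name `unique_topk_reference` is hypothetical.

```python
import numpy as np


def unique_topk_reference(x: np.ndarray, k: int) -> np.ndarray:
    """Loop-based reference (deliberately ignores the no-loops constraint).

    For each group, keep the first occurrence of every distinct row, then
    return the k rows with the largest 2nd-column value, highest first.
    Assumes every group has at least k distinct rows.
    """
    out = []
    for group in x:
        uniq = []
        for row in group:
            # A row counts as a duplicate only if both columns match exactly.
            if not any(np.array_equal(row, u) for u in uniq):
                uniq.append(row)
        uniq = np.array(uniq)
        # Stable descending sort on the score (2nd) column.
        order = np.argsort(-uniq[:, 1], kind="stable")
        out.append(uniq[order[:k]])
    return np.stack(out)


input_data = np.array([
    [[2.0, 1.0], [2.0, 1.0], [3.0, 0.4], [1.0, 0.1]],
    [[44.0, 0.8], [22.0, 0.7], [11.0, 0.5], [11.0, 0.5]],
    [[5555.0, 0.8], [3333.0, 0.7], [4444.0, 0.4], [1111.0, 0.1]],
    [[444.0, 0.8], [333.0, 1.1], [333.0, 1.1], [111.0, 0.1]],
])
print(unique_topk_reference(input_data, k=2).tolist())
# [[[2.0, 1.0], [3.0, 0.4]], [[44.0, 0.8], [22.0, 0.7]],
#  [[5555.0, 0.8], [3333.0, 0.7]], [[333.0, 1.1], [444.0, 0.8]]]
```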
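This is one possible approach, though it does extra work because it sorts each group first. It sorts the rows of each group by the first column so that duplicate rows become adjacent, replaces the score of every repeated row with the lowest representable value, and then runs tf.math.top_k on the masked scores.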
import tensorflow as tf
import numpy as np
# Input data
k = 2
input_tensor = tf.constant([
    [[2.0, 1.0],
     [2.0, 1.0],
     [3.0, 0.4],
     [1.0, 0.1]],
    [[44.0, 0.8],
     [22.0, 0.7],
     [11.0, 0.5],
     [11.0, 0.5]],
    [[5555.0, 0.8],
     [3333.0, 0.7],
     [4444.0, 0.4],
     [1111.0, 0.1]],
    [[444.0, 0.8],
     [333.0, 1.1],
     [333.0, 1.1],
     [111.0, 0.1]]
])
# Sort each group by the first column so that duplicate rows become adjacent
idx = tf.argsort(input_tensor[..., 0], axis=-1)
s = tf.gather_nd(input_tensor, tf.expand_dims(idx, axis=-1), batch_dims=1)
# Find repeated elements: mask is True for the first row of each run of equal
# first-column values (prepending col1[0] - 1 makes the first row always True)
col1 = s[..., 0]
col1_ext = tf.concat([col1[..., :1] - 1, col1], axis=-1)
mask = tf.math.not_equal(col1_ext[..., 1:], col1_ext[..., :-1])
# Replace the score of repeated elements with the lowest finite float value
# (effectively "minus infinity")
col2 = s[..., 1]
col2_masked = tf.where(mask, col2, col2.dtype.min)
# Get top-k results
topk_idx = tf.math.top_k(col2_masked, k=k, sorted=True).indices
topk_results = tf.gather(s, topk_idx, batch_dims=1)
# Print
with np.printoptions(suppress=True):
    print(topk_results.numpy())
# [[[ 2. 1. ]
# [ 3. 0.4]]
#
# [[ 44. 0.8]
# [ 22. 0.7]]
#
# [[5555. 0.8]
# [3333. 0.7]]
#
# [[ 333. 1.1]
# [ 444. 0.8]]]
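The same sort, mask, top-k pipeline can be mirrored in NumPy, which makes the intermediate arrays easy to inspect without a TensorFlow session. This is a sketch, not a drop-in replacement: `np.take_along_axis` stands in for `tf.gather(..., batch_dims=1)`, and a stable descending `argsort` stands in for `tf.math.top_k` (which, per its documentation, returns the lower-index element first on ties). Like the TensorFlow code, it treats rows as duplicates whenever their first-column values match, which is enough for this data but would also merge two rows that share a first value and differ in score. The name `unique_topk_numpy` is hypothetical.

```python
import numpy as np


def unique_topk_numpy(x: np.ndarray, k: int) -> np.ndarray:
    # 1. Sort each group by the first column so equal keys become adjacent.
    idx = np.argsort(x[..., 0], axis=-1, kind="stable")
    s = np.take_along_axis(x, idx[..., None], axis=1)
    # 2. Mark the first row of every run of equal first-column values.
    col1 = s[..., 0]
    mask = np.ones(col1.shape, dtype=bool)
    mask[:, 1:] = col1[:, 1:] != col1[:, :-1]
    # 3. Give repeated rows the lowest finite score so they rank last.
    col2 = np.where(mask, s[..., 1], np.finfo(s.dtype).min)
    # 4. Top-k by descending score; stable sort keeps lower indices first on ties.
    topk_idx = np.argsort(-col2, axis=-1, kind="stable")[..., :k]
    return np.take_along_axis(s, topk_idx[..., None], axis=1)


input_data = np.array([
    [[2.0, 1.0], [2.0, 1.0], [3.0, 0.4], [1.0, 0.1]],
    [[44.0, 0.8], [22.0, 0.7], [11.0, 0.5], [11.0, 0.5]],
    [[5555.0, 0.8], [3333.0, 0.7], [4444.0, 0.4], [1111.0, 0.1]],
    [[444.0, 0.8], [333.0, 1.1], [333.0, 1.1], [111.0, 0.1]],
])
print(unique_topk_numpy(input_data, k=2).tolist())
```

Negating the scores for an ascending `argsort` is safe here because the mask value is the finite minimum, so `-col2` never overflows.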
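Note the special case where a group has fewer than k distinct elements. In that case this solution fills the remaining slots with repeated elements placed at the end, which breaks the score ordering.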
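If you would rather flag those filler slots than silently return them, one option (an assumption about what downstream code wants, not part of the answer above) is to gather the "first occurrence" mask with the same top-k indices. The NumPy sketch below does that and uses `-np.inf` as the mask value; the function name `unique_topk_with_validity` is hypothetical. In the TensorFlow version the analogous step would be `tf.gather(mask, topk_idx, batch_dims=1)`.

```python
import numpy as np


def unique_topk_with_validity(x: np.ndarray, k: int):
    """Sort by the first column, mask repeats, take top-k by the 2nd column,
    and also return `valid`, which is False for slots filled by a repeat."""
    idx = np.argsort(x[..., 0], axis=-1, kind="stable")
    s = np.take_along_axis(x, idx[..., None], axis=1)
    col1 = s[..., 0]
    # True for the first row of each run of equal first-column values.
    is_first = np.ones(col1.shape, dtype=bool)
    is_first[:, 1:] = col1[:, 1:] != col1[:, :-1]
    score = np.where(is_first, s[..., 1], -np.inf)
    topk_idx = np.argsort(-score, axis=-1, kind="stable")[..., :k]
    results = np.take_along_axis(s, topk_idx[..., None], axis=1)
    valid = np.take_along_axis(is_first, topk_idx, axis=1)
    return results, valid


# One group with only two distinct rows, but k = 3.
demo = np.array([[[7.0, 0.9], [7.0, 0.9], [7.0, 0.9], [5.0, 0.3]]])
results, valid = unique_topk_with_validity(demo, k=3)
print(results.tolist())  # [[[7.0, 0.9], [5.0, 0.3], [7.0, 0.9]]]
print(valid.tolist())    # [[True, True, False]]
```

Downstream code could then, for example, blank out the filler rows with `np.where(valid[..., None], results, np.nan)`.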