IndexError: The shape of the mask [...] at index 0 does not match the shape of the indexed tensor [...] at index 0
I am trying to do label propagation with Torch.
I have a dataframe that looks like this:
ID Target Weight Label
1 12 0.4 1
2 24 0.1 0
4 13 0.5 1
4 12 0.3 1
12 1 0.1 1
12 4 0.4 1
13 4 0.2 1
17 1 0.1 0
and so on.
I built the network as follows:
G = nx.from_pandas_edgelist(df, source='ID', target='Target', edge_attr=['Weight'])
and the adjacency matrix:
adj_matrix = nx.adjacency_matrix(G).toarray()
I have only two labels, 0 and 1, and some of the data is unlabeled. I created the input tensors as follows:
# Create input tensors
adj_matrix_t = torch.FloatTensor(adj_matrix)
labels_t = torch.LongTensor(df['Labels'].tolist())
I then try to run the following code:
# Learn with Label Propagation
label_propagation = LabelPropagation(adj_matrix_t)
label_propagation.fit(labels_t) # this is causing the error
I get the error: IndexError: The shape of the mask [196] at index 0 does not match the shape of the indexed tensor [207] at index 0.
I checked adj_matrix_t.shape, which is (207, 207), while the labels tensor has 196 entries.
Do you know how I can fix this mismatch?
Please see the error traceback below:
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-42-cf4f88a4bb12> in <module>
2 label_propagation = LabelPropagation(adj_matrix_t)
3 print("Label Propagation: ", end="")
----> 4 label_propagation.fit(labels_t)
5 label_propagation_output_labels = label_propagation.predict_classes()
6
<ipython-input-1-54a7dbc30bd1> in fit(self, labels, max_iter, tol)
100
101 def fit(self, labels, max_iter=1000, tol=1e-3):
--> 102 super().fit(labels, max_iter, tol)
103
104 ## Label spreading
<ipython-input-1-54a7dbc30bd1> in fit(self, labels, max_iter, tol)
58 Convergence tolerance: threshold to consider the system at steady state.
59 """
---> 60 self._one_hot_encode(labels)
61
62 self.predictions = self.one_hot_labels.clone()
<ipython-input-1-54a7dbc30bd1> in _one_hot_encode(self, labels)
43 self.one_hot_labels = torch.zeros((self.n_nodes, self.n_classes), dtype=torch.float)
44 self.one_hot_labels = self.one_hot_labels.scatter(1, labels.unsqueeze(1), 1)
---> 45 self.one_hot_labels[unlabeled_mask, 0] = 0
46
47 self.labeled_mask = ~unlabeled_mask
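The code below is the example I want to use for label propagation. It seems the error is caused by the labels. Some nodes in my dataset have no label (although in the example above I wrote a label for every row). Could this be the cause of the error message?
Original code (for reference: https://mybinder.org/v2/gh/thibaudmartinez/label-propagation/master?filepath=notebook.ipynb):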
## Testing models on synthetic data
import pandas as pd
import numpy as np
import networkx as nx
import matplotlib.pyplot as plt
import torch  # needed below for torch.FloatTensor / torch.LongTensor
# Create caveman graph
n_cliques = 4
size_cliques = 5
caveman_graph = nx.connected_caveman_graph(n_cliques, size_cliques)
adj_matrix = nx.adjacency_matrix(caveman_graph).toarray()
# Create labels
labels = np.full(n_cliques * size_cliques, -1.)
# Only one node per clique is labeled. Each clique belongs to a different class.
labels[0] = 0
labels[size_cliques] = 1
labels[size_cliques * 2] = 2
labels[size_cliques * 3] = 3
# Create input tensors
adj_matrix_t = torch.FloatTensor(adj_matrix)
labels_t = torch.LongTensor(labels)
# Learn with Label Propagation
label_propagation = LabelPropagation(adj_matrix_t)
print("Label Propagation: ", end="")
label_propagation.fit(labels_t)
label_propagation_output_labels = label_propagation.predict_classes()
# Learn with Label Spreading
label_spreading = LabelSpreading(adj_matrix_t)
print("Label Spreading: ", end="")
label_spreading.fit(labels_t, alpha=0.8)
label_spreading_output_labels = label_spreading.predict_classes()
# Plot graphs
color_map = {-1: "orange", 0: "blue", 1: "green", 2: "red", 3: "cyan"}
input_labels_colors = [color_map[l] for l in labels]
lprop_labels_colors = [color_map[l] for l in label_propagation_output_labels.numpy()]
lspread_labels_colors = [color_map[l] for l in label_spreading_output_labels.numpy()]
plt.figure(figsize=(14, 6))
ax1 = plt.subplot(1, 4, 1)
ax2 = plt.subplot(1, 4, 2)
ax3 = plt.subplot(1, 4, 3)
ax1.title.set_text("Raw data (4 classes)")
ax2.title.set_text("Label Propagation")
ax3.title.set_text("Label Spreading")
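# Draw the caveman graph built above; the color lists are defined for its nodes
pos = nx.spring_layout(caveman_graph)
nx.draw(caveman_graph, ax=ax1, pos=pos, node_color=input_labels_colors, node_size=50)
nx.draw(caveman_graph, ax=ax2, pos=pos, node_color=lprop_labels_colors, node_size=50)
nx.draw(caveman_graph, ax=ax3, pos=pos, node_color=lspread_labels_colors, node_size=50)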
# Legend
ax4 = plt.subplot(1, 4, 4)
ax4.axis("off")
legend_colors = ["orange", "blue", "green", "red", "cyan"]
legend_labels = ["unlabeled", "class 0", "class 1", "class 2", "class 3"]
dummy_legend = [ax4.plot([], [], ls='-', c=c)[0] for c in legend_colors]
plt.legend(dummy_legend, legend_labels)
plt.show()
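Of course, if the dataset example at the top of my post does not fit the original code because of the labels, I would appreciate another example showing what the labels (which determine the class of each node) should look like in the dataset, even with missing values to predict.
For other readers here, this appears to be the implementation being asked about in the question.
The method you are using to predict labels works on labels of nodes, not of edges. To visualize this, I plotted your sample data and styled the graph by the Weight and Label columns (the code that generates the plot is in the appendix below), where Weight is the edge line width and Label is the edge color:
To use this method, you need to produce data like the following, where each node (identified by ID) gets exactly one node_label: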
ID node_label
1 1
2 0
4 1
12 1
13 1
17 0
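The shape mismatch in the question (196 labels against a 207 x 207 adjacency matrix) presumably comes from passing one label per dataframe row, that is per edge, while the model expects one label per node, in the same order as the rows of the adjacency matrix. The following is a minimal sketch of how the two could be aligned, assuming unlabeled nodes are marked with -1 as in the reference notebook. The names `node_labels` and `nodelist` are illustrative and not part of the original code.

```python
import networkx as nx
import numpy as np
import pandas as pd

# Edge list from the question (sample rows only).
edges = pd.DataFrame({
    "ID":     [1, 2, 4, 4, 12, 12, 13, 17],
    "Target": [12, 24, 13, 12, 1, 4, 4, 1],
    "Weight": [0.4, 0.1, 0.5, 0.3, 0.1, 0.4, 0.2, 0.1],
})
G = nx.from_pandas_edgelist(edges, source="ID", target="Target", edge_attr=["Weight"])

# Node-level labels as in the table above; node 24 has no label.
node_labels = pd.Series({1: 1, 2: 0, 4: 1, 12: 1, 13: 1, 17: 0}, name="node_label")

# Fix one node order and use it for both the adjacency matrix and the labels.
nodelist = list(G.nodes())
adj_matrix = nx.to_numpy_array(G, nodelist=nodelist, weight="Weight")

# One entry per node; -1 marks nodes without a label (assumed convention).
labels = node_labels.reindex(nodelist).fillna(-1).astype(int).to_numpy()

# The two inputs now agree on the number of nodes.
assert labels.shape[0] == adj_matrix.shape[0]
```

From here, `torch.FloatTensor(adj_matrix)` and `torch.LongTensor(labels)` have matching first dimensions and can be passed to `LabelPropagation` as in the question.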
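To be clear, you still need the original data above to build the network and the adjacency matrix, but you must decide on some rule for converting edge labels into node labels. Then, once you have predicted the unlabeled nodes, you can invert your rule to recover edge labels if necessary.
This is not a rigorous approach, but it is practical, and if your data is not just random noise it may produce reasonable results.
Code appendix: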
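As one possible illustration of such a rule (an assumption made here for the sketch, not something prescribed by the implementation), a node could take the majority label of the edges in which it appears in the ID column, and an edge could later inherit the label of its source node. The column name `Predicted` is hypothetical.

```python
import pandas as pd

# Edge list with edge-level labels, as in the question (sample rows only).
edges = pd.DataFrame({
    "ID":     [1, 2, 4, 4, 12, 12, 13, 17],
    "Target": [12, 24, 13, 12, 1, 4, 4, 1],
    "Label":  [1, 0, 1, 1, 1, 1, 1, 0],
})

# Hypothetical rule: a node takes the majority label of the edges it starts
# (rows where it appears in the ID column). Ties resolve to 1 here.
node_label = (edges.groupby("ID")["Label"].mean() >= 0.5).astype(int)

# Nodes that only ever appear as a Target have no label and are marked -1.
all_nodes = pd.unique(edges[["ID", "Target"]].to_numpy().ravel())
node_label = node_label.reindex(all_nodes).fillna(-1).astype(int)

# Inverse rule: each edge inherits the label of its source node. After
# propagation, the predicted node labels would be used here instead.
edges["Predicted"] = edges["ID"].map(node_label)
```

On the sample rows this reproduces the node_label table above, with node 24 left unlabeled because it never appears in the ID column. Other rules (for example, also counting edges where the node is the Target) are equally plausible and depend on what the edge labels mean in your data.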
# Sample data network plot
import networkx as nx
import pandas as pd
data = {'ID': {0: 1, 1: 2, 2: 4, 3: 4, 4: 12, 5: 12, 6: 13, 7: 17},
'Target': {0: 12, 1: 24, 2: 13, 3: 12, 4: 1, 5: 4, 6: 4, 7: 1},
'Weight': {0: 0.4, 1: 0.1, 2: 0.5, 3: 0.3, 4: 0.1, 5: 0.4, 6: 0.2, 7: 0.1},
'Label': {0: 1, 1: 0, 2: 1, 3: 1, 4: 1, 5: 1, 6: 1, 7: 0}}
df = pd.DataFrame.from_dict(data)
G = nx.from_pandas_edgelist(df, source='ID', target='Target', edge_attr=['Weight', 'Label'])
width = [20 * d['Weight'] for (u, v, d) in G.edges(data=True)]
edge_color = [d['Label'] for (u, v, d) in G.edges(data=True)]
nx.draw_networkx(G, width=width, edge_color=edge_color)