np.where on pd.DataFrame for dictionary of non-zero indices

Question

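I'm trying to build a very fast nearest-neighbors lookup. Right now I'm using networkx: I iterate over all G.nodes(), take S = set(G.neighbors(node)), and then call S.remove(node). This works fine, but I want to get better at indexing and at exploiting data structures. I'd like to avoid iteration as much as possible.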

Ultimately I want a dictionary where each key is a root_node and each value is the set of that node's neighbors (excluding root_node itself).

Here is what my graph and the DF_adj adjacency matrix look like:

When I run np.where(DF_adj == 1), the output is a tuple of 2 arrays, like this:

(array([ 0,  0,  0,  0,  0,  0,  1,  1,  1,  1,  1,  1,  1,  1,  1,  2,  2,
        3,  3,  3,  4,  4,  4,  5,  5,  5,  6,  6,  6,  7,  7,  7,  8,  8,
        8,  9,  9, 10, 10]), array([ 0,  1,  3,  4,  5,  7,  0,  1,  2,  3,  4,  6,  8,  9, 10,  1,  2,
        0,  1,  3,  0,  1,  4,  0,  5,  6,  1,  5,  6,  0,  7,  8,  1,  7,
        8,  1,  9,  1, 10]))

I looked at this, but it didn't quite help: Python pandas: select columns with all zero entries in dataframe

from collections import defaultdict
import numpy as np

def neighbors(DF_adj):
    D_node_neighbors = defaultdict(set)
    # Not needed for a binary matrix, but useful for NaN or non-binary entries
    DF_indexer = DF_adj.fillna(0).astype(bool)
    for node in DF_adj.columns:
        # np.where returns a tuple; [0] holds the matching row positions
        D_node_neighbors[node] = set(DF_adj.index[np.where(DF_indexer[node])[0]])
        # discard() avoids a KeyError when a node has no self-loop
        D_node_neighbors[node].discard(node)
    return D_node_neighbors

How can I use np.where on the whole pd.DataFrame to get this type of output?

defaultdict(set,
            {'a': {'b', 'd', 'e', 'f', 'h'},
             'b': {'a', 'c', 'd', 'e', 'g', 'i', 'j', 'k'},
             'c': {'b'},
             'd': {'a', 'b'},
             'e': {'a', 'b'},
             'f': {'a', 'g'},
             'g': {'b', 'f'},
             'h': {'a', 'i'},
             'i': {'b', 'h'},
             'j': {'b'},
             'k': {'b'}})

Answer 1

You can do it with a dict comprehension. If df is:

   a  b  c  d  e  f  g  h  i  j  k
a  1  1  0  1  1  1  0  1  0  0  0
b  1  1  1  1  1  0  1  0  1  1  1
c  0  1  1  0  0  0  0  0  0  0  0
d  1  1  0  1  0  0  0  0  0  0  0
e  1  1  0  0  1  0  0  0  0  0  0
f  1  0  0  0  0  1  1  0  0  0  0
g  0  1  0  0  0  1  1  0  0  0  0
h  1  0  0  0  0  0  0  1  1  0  0
i  0  1  0  0  0  0  0  1  1  0  0
j  0  1  0  0  0  0  0  0  0  1  0
k  0  1  0  0  0  0  0  0  0  0  1

then {i: {j for j in df.index if df.loc[i, j] and i != j} for i in df.index} (using df.loc, since df.ix was removed in pandas 1.0) is:

{'j': {'b'},
 'e': {'a', 'b'},
 'g': {'b', 'f'},
 'k': {'b'},
 'a': {'b', 'd', 'e', 'f', 'h'},
 'c': {'b'},
 'i': {'b', 'h'},
 'f': {'a', 'g'},
 'b': {'a', 'c', 'd', 'e', 'g', 'i', 'j', 'k'},
 'd': {'a', 'b'},
 'h': {'a', 'i'}}
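
For anyone who wants to reproduce this, here is a self-contained sketch that rebuilds the matrix shown above and runs the comprehension. It uses df.loc for label-based lookup because df.ix is no longer available in current pandas; the names rows, labels and neighbors are introduced only for this illustration.

```python
import pandas as pd

# One string per matrix row, copied from the table above (a..k)
rows = [
    "11011101000",  # a
    "11111010111",  # b
    "01100000000",  # c
    "11010000000",  # d
    "11001000000",  # e
    "10000110000",  # f
    "01000110000",  # g
    "10000001100",  # h
    "01000001100",  # i
    "01000000010",  # j
    "01000000001",  # k
]
labels = list("abcdefghijk")
df = pd.DataFrame(
    [[int(ch) for ch in row] for row in rows],
    index=labels, columns=labels,
)

# For each row label i, collect the column labels j with a non-zero entry,
# skipping the diagonal (i == j)
neighbors = {
    i: {j for j in df.index if df.loc[i, j] and i != j}
    for i in df.index
}
print(neighbors["a"])
```

This does one scalar lookup per cell, so the cost grows with the square of the number of nodes.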

Or, about 2x faster:

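import collections
import numpy as np

s = df.index
d = collections.defaultdict(set)
# np.where returns (row positions, column positions) of the cells equal to 1
for k, v in zip(*np.where(df == 1)):
    if k != v:  # skip the diagonal (self-loops)
        d[s[k]].add(s[v])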
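
If the per-pair Python loop is still too slow, one possible variation (not from the answer above, just a sketch) is to clear the diagonal first and then cut the column positions into one chunk per row with NumPy. The function name neighbors_vectorized and the small test matrix are hypothetical, and the sketch assumes a square matrix whose index and columns carry the same labels in the same order.

```python
import numpy as np
import pandas as pd

def neighbors_vectorized(df):
    """Map each row label to the set of column labels with a non-zero
    entry, excluding the diagonal. Assumes a square, aligned matrix."""
    mask = df.to_numpy() != 0        # the comparison yields a fresh boolean array
    np.fill_diagonal(mask, False)    # drop self-loops
    rows, cols = np.nonzero(mask)    # row-major order, so rows is sorted
    # Number of neighbors per row gives the split points between rows
    counts = np.bincount(rows, minlength=len(df.index))
    chunks = np.split(df.columns.to_numpy()[cols], np.cumsum(counts)[:-1])
    return {label: set(chunk) for label, chunk in zip(df.index, chunks)}

# Hypothetical 3-node example
labels = ["a", "b", "c"]
df_small = pd.DataFrame(
    [[1, 1, 0],
     [1, 1, 1],
     [0, 1, 1]],
    index=labels, columns=labels,
)
result = neighbors_vectorized(df_small)
print(result)
```

Unlike the defaultdict version, this returns a plain dict that also contains an empty set for any node without neighbors. Building the sets is still a Python-level step, so whether it is actually faster should be measured on the real data.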

python

network-programming

numpy

where

pandas