Grouping pairs of combination data based on a given condition

Suppose I have a large amount of data, of which this is a sample:

x = [511.31, 512.24, 571.77, 588.35, 657.08, 665.49, -1043.45, -1036.56, -969.39, -955.33]

I generated all possible pairs with the following code:

Pairs = [(x[i], x[j]) for i in range(len(x)) for j in range(i + 1, len(x))]

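This gives me all possible pairs. Now I want to group these pairs when the two values of a pair are within a threshold of ±25 of each other, and label the groups accordingly. Any ideas or suggestions on how to do this? Thanks in advance.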

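If I understand your question correctly, the code below should do the trick. The idea is to build a dictionary whose keys are the means of the pairs (truncated to integers) and to keep appending matching pairs to it: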

import numpy as np  # I use numpy for the mean.

# Your threshold
threshold = 25
# A dictionary will hold the relevant pairs, keyed by their truncated mean
mylist = {}
for i in Pairs:
    # Check for the threshold and discard the pair otherwise
    diff = abs(i[1] - i[0])

    if diff < threshold:
        # Name of the entry in the dictionary (int() truncates toward zero)
        entry = str(int(np.mean(i)))

        # If the entry already exists, append. Otherwise, create a container list
        if entry in mylist:
            mylist[entry].append(i)
        else:
            mylist[entry] = [i]

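This results in the following output (the run shown apparently used a few extra values, 511.1, 512.35 and 513.35, that are not in the sample above):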

{'-1040': [(-1043.45, -1036.56)],
 '-962': [(-969.39, -955.33)],
 '511': [(511.1, 511.31),
  (511.1, 512.24),
  (511.1, 512.35),
  (511.31, 512.24),
  (511.31, 512.35)],
 '512': [(511.1, 513.35),
  (511.31, 513.35),
  (512.24, 512.35),
  (512.24, 513.35),
  (512.35, 513.35)],
 '580': [(571.77, 588.35)],
 '661': [(657.08, 665.49)]}
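
The same grouping can also be sketched without NumPy, which may help when checking the result against the sample from the question. This is only a minimal sketch: `group_pairs` is a hypothetical helper name, the pairs come from `itertools.combinations` instead of the `Pairs` list, and the key is the mean truncated with `int()`, as in the code above.

```python
from collections import defaultdict
from itertools import combinations


def group_pairs(values, threshold=25):
    """Group pairs closer than `threshold`, keyed by their truncated mean.

    Hypothetical helper, not part of the original answer.
    """
    groups = defaultdict(list)
    for a, b in combinations(values, 2):
        if abs(b - a) < threshold:
            # int() truncates toward zero, matching the key naming above
            groups[str(int((a + b) / 2))].append((a, b))
    return dict(groups)


# Sample data from the question
x = [511.31, 512.24, 571.77, 588.35, 657.08,
     665.49, -1043.45, -1036.56, -969.39, -955.33]

for key, pairs in group_pairs(x).items():
    print(key, pairs)
```

With the sample from the question, every key holds exactly one pair. Note that (511.31, 512.24) lands under '511' because the mean 511.775 is truncated; rounding it would give 512 instead.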

This should be a fast way to do it:

import numpy as np
from scipy.spatial.distance import pdist

# Input data
x = np.array([511.31, 512.24, 571.77, 588.35, 657.08,
              665.49, -1043.45, -1036.56,-969.39, -955.33])
thres = 25.0
# Compute pairwise distances
# The default distance metric is 'euclidean', which would give
# the same result for 1-D data but is more expensive to compute
d = pdist(x[:, np.newaxis], 'cityblock')
# Find distances within threshold
d_idx = np.where(d <= thres)[0]
# Convert "condensed" distance indices to pair of indices
r = np.arange(len(x))
c = np.zeros_like(r, dtype=np.int32)
np.cumsum(r[:0:-1], out=c[1:])
i = np.searchsorted(c[1:], d_idx, side='right')
j = d_idx - c[i] + r[i] + 1
# Get pairs of values
v_i = x[i]
v_j = x[j]
# Find means
m = np.round((v_i + v_j) / 2).astype(np.int32)
# Print result
for idx in range(len(m)):
    print(f'{m[idx]}: ({v_i[idx]}, {v_j[idx]})')

Output:

512: (511.31, 512.24)
580: (571.77, 588.35)
661: (657.08, 665.49)
-1040: (-1043.45, -1036.56)
-962: (-969.39, -955.33)
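
The least obvious step in the code above is the conversion from "condensed" distance indices back to pairs of indices. The sketch below redoes that step in plain Python and checks it against `itertools.combinations`, which enumerates the index pairs in the same row-by-row order as the condensed form. `condensed_to_pair` is a hypothetical helper name used only for illustration; it mirrors the cumulative-sum and `searchsorted` logic but is not meant as a replacement for the vectorized version.

```python
from bisect import bisect_right
from itertools import accumulate, combinations


def condensed_to_pair(k, n):
    """Map condensed index k to (i, j), i < j, for n points (hypothetical helper)."""
    # starts[i] is the condensed index of the first pair in row i, i.e. (i, i + 1);
    # this plays the role of the array `c` in the NumPy code
    starts = [0] + list(accumulate(range(n - 1, 0, -1)))
    # Row i is the last row whose start is <= k
    i = bisect_right(starts, k) - 1
    # Offset within the row, shifted so that j starts at i + 1
    j = k - starts[i] + i + 1
    return i, j


n = 5
expected = list(combinations(range(n), 2))
actual = [condensed_to_pair(k, n) for k in range(n * (n - 1) // 2)]
print(actual == expected)
```

For n points there are n * (n - 1) / 2 condensed entries, so the check covers every valid index for the chosen n.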