Connect the nearest points in segment and label segment
I am using OpenCV and skimage to do document analysis of datasheets.
I am trying to segment out the shaded regions separately.
I am currently able to segment the parts and the numbering into different clusters.
Using felzenszwalb() from skimage, I segmented the parts:
import matplotlib.pyplot as plt
import numpy as np
from skimage.segmentation import felzenszwalb
from skimage.io import imread
img = imread('test.jpg')
segments_fz = felzenszwalb(img, scale=100, sigma=0.2, min_size=50)
print("Felzenszwalb number of segments {}".format(len(np.unique(segments_fz))))
plt.imshow(segments_fz)
plt.tight_layout()
plt.show()
But I am unable to connect them. Any ideas for methodically connecting each segment to its corresponding part and part number would be a great help.
Thanks in advance for your time. If I have missed anything, or over- or under-emphasised a particular point, please let me know in the comments.
Preliminaries
Some preliminary code:
%matplotlib inline
%load_ext Cython
import numpy as np
import cv2
from matplotlib import pyplot as plt
import skimage as sk
import skimage.morphology as skm
import itertools
def ShowImage(title,img,ctype):
  plt.figure(figsize=(20, 20))
  if ctype=='bgr':
    b,g,r = cv2.split(img)       # get b,g,r
    rgb_img = cv2.merge([r,g,b]) # switch it to rgb
    plt.imshow(rgb_img)
  elif ctype=='hsv':
    rgb = cv2.cvtColor(img,cv2.COLOR_HSV2RGB)
    plt.imshow(rgb)
  elif ctype=='gray':
    plt.imshow(img,cmap='gray')
  elif ctype=='rgb':
    plt.imshow(img)
  else:
    raise Exception("Unknown colour type")
  plt.axis('off')
  plt.title(title)
  plt.show()
For reference, here is your original image:
#Read in image
img = cv2.imread('part.jpg')
ShowImage('Original',img,'bgr')
Identifying the Numbers
To simplify things, we want to classify pixels as being either on or off. We can do that with thresholding. Since our image contains two clear classes of pixels (black and white), we can use Otsu's method. We will invert the colour scheme, since the libraries we are using consider black pixels boring and white pixels interesting.
#Convert image to grayscale
gray = cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)
#Apply Otsu's method to eliminate pixels of intermediate colour
ret, thresh = cv2.threshold(gray,0,255,cv2.THRESH_BINARY_INV+cv2.THRESH_OTSU)
ShowImage('Applying Otsu',thresh,'gray')
#Verify that pixels are either black or white and nothing in between
np.unique(thresh)
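#If the threshold worked, this should show just the two extremes: array([  0, 255], dtype=uint8)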
Our strategy will be to locate the numbers, then follow the lines near them to the parts, and then to label those parts. Conveniently, all of the Arabic numerals are formed of contiguous pixels, so we can start by finding the connected components.
ret, components = cv2.connectedComponents(thresh)
#Each component is a different colour
ShowImage('Connected Components', components, 'rgb')
We can then filter the connected components to find the numbers by filtering on their dimensions. Note that this is not a super robust way of doing it. A better option would be to use character recognition, but that is left as an exercise for the reader :-)
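For the curious, a rough sketch of what that character-recognition alternative might look like, using pytesseract. This assumes pytesseract and the Tesseract engine are installed, and the --psm and whitelist options may need adjusting for your Tesseract version; it is an illustration, not part of the pipeline below.

import pytesseract
#Tesseract expects dark text on a light background, so use the original
#grayscale image rather than the inverted threshold
data = pytesseract.image_to_data(gray,
                                 config='--psm 11 -c tessedit_char_whitelist=0123456789',
                                 output_type=pytesseract.Output.DICT)
ocr_numbers = []
for text,x,y,w,h in zip(data['text'],data['left'],data['top'],data['width'],data['height']):
  if text.strip().isdigit(): #Keep only the boxes Tesseract recognised as digits
    ocr_numbers.append((x, x+w, y, y+h)) #Bounding boxes in (x0,x1,y0,y1) order

Back to the dimension-based approach, then: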
class Box:
  def __init__(self,x0,x1,y0,y1):
    self.x0, self.x1, self.y0, self.y1 = x0,x1,y0,y1
  def overlaps(self,box2,tol):
    if self.x0 is None or box2.x0 is None:
      return False
    return not (self.x1+tol<=box2.x0 or self.x0-tol>=box2.x1 or self.y1+tol<=box2.y0 or self.y0-tol>=box2.y1)
  def merge(self,box2):
    self.x0 = min(self.x0,box2.x0)
    self.x1 = max(self.x1,box2.x1)
    self.y0 = min(self.y0,box2.y0)
    self.y1 = max(self.y1,box2.y1)
    box2.x0 = None #Used to mark `box2` as being no longer valid. It can be removed later
  def dist(self,x,y):
    #Get the centre point of the box
    ax = (self.x0+self.x1)/2
    ay = (self.y0+self.y1)/2
    #Get the distance from the box's centre to the point (x,y)
    return np.sqrt((ax-x)**2+(ay-y)**2)
  def good(self):
    return not (self.x0 is None)

def ExtractComponent(original_image, component_matrix, component_number):
  """Extracts a component from a ConnectedComponents matrix"""
  #Create a true-false matrix indicating if a pixel is part of a particular component
  is_component = component_matrix==component_number
  #Find the coordinates of those pixels
  coords = np.argwhere(is_component)
  #Bounding box of non-black pixels
  y0, x0 = coords.min(axis=0)
  y1, x1 = coords.max(axis=0) + 1 #slices are exclusive at the top
  #Get the contents of the bounding box
  return x0,x1,y0,y1,original_image[y0:y1, x0:x1]
numbers_img = thresh.copy() #This is used purely to show that we can identify numbers
numbers = []
for component in range(1,components.max()+1): #Component 0 is the background; labels run 1..max
  tx0,tx1,ty0,ty1,this_component = ExtractComponent(thresh, components, component)
  #ShowImage('Component #{0}'.format(component), this_component, 'gray')
  cheight, cwidth = this_component.shape
  #print(cwidth,cheight) #Enable this to see dimensions
  #Identify numbers based on their dimensions
  if (abs(cwidth-14)<3 or abs(cwidth-7)<3) and abs(cheight-24)<3:
    numbers_img[ty0:ty1,tx0:tx1] = 128
    numbers.append(Box(tx0,tx1,ty0,ty1))
ShowImage('Numbers', numbers_img, 'gray')
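A side note: the magic numbers above (widths near 7 or 14 pixels, heights near 24 pixels) were read off this particular image. A quick way to pick them for a new drawing is to plot the dimensions of every component and eyeball which cluster is the digits. A sketch:

#Gather the dimensions of every connected component (a tuning aid only)
dims = [ExtractComponent(thresh, components, c)[4].shape for c in range(1,components.max()+1)]
heights, widths = zip(*dims)
plt.scatter(widths, heights)
plt.xlabel('Component width (px)')
plt.ylabel('Component height (px)')
plt.show()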
We now connect the numbers into contiguous blocks by expanding their bounding boxes slightly and looking for overlaps.
#This is kind of a silly way to do this, but it will work fine for small quantities (hundreds)
merged=True     #If true, then a merge happened this round
while merged:   #Continue until there are no more mergers
  merged=False  #Reset merge indicator
  for a,b in itertools.combinations(numbers,2): #Consider all pairs of numbers
    if a.overlaps(b,10): #If this pair overlaps
      a.merge(b)         #Merge it
      merged=True        #Make a note that we've merged
numbers = [x for x in numbers if x.good()] #Eliminate those boxes that were gobbled by the mergers

#This is used purely to show that we can identify numbers
numbers_img = thresh.copy()
for n in numbers:
  numbers_img[n.y0:n.y1,n.x0:n.x1] = 128
  thresh[n.y0:n.y1,n.x0:n.x1] = 0 #Drop numbers from the thresholded image
ShowImage('Numbers', numbers_img, 'gray')
Okay, so now we have identified the numbers! We will use these later to identify the parts.
Identifying the Arrows
Next, we need to figure out which parts the numbers point to. To do so, we want to detect lines. The Hough transform is good for this. To reduce the number of false positives, we first skeletonize the data, which transforms it into a representation that is at most one pixel wide.
skel = sk.img_as_ubyte(skm.skeletonize(thresh>0))
ShowImage('Skeleton', skel, 'gray')
Now we perform the Hough transform. We are looking for parameters that identify all of the lines going from the numbers to the parts. Getting this right may take some fiddling.
lines = cv2.HoughLinesP(
  skel,
  1,           #Resolution of r in pixels
  np.pi / 180, #Resolution of theta in radians
  30,          #Minimum number of intersections to detect a line
  None,
  80,          #Min line length
  10           #Max line gap
)
lines = [x[0] for x in lines]

line_img = thresh.copy()
line_img = cv2.cvtColor(line_img, cv2.COLOR_GRAY2BGR)
for l in lines:
  color = tuple(map(int, np.random.randint(low=0, high=255, size=3)))
  cv2.line(line_img, (l[0], l[1]), (l[2], l[3]), color, 3, cv2.LINE_AA)
ShowImage('Lines', line_img, 'bgr')
We now want to find the line (or lines) closest to each number and retain only those. We are essentially filtering out all of the lines which are not arrows. To do this, we compare the endpoints of each line with the centre point of each number box.
comp_labels = np.zeros(img.shape[0:2], dtype=np.uint8)
for n_idx,n in enumerate(numbers):
  distvals = []
  for i,l in enumerate(lines):
    #Distances from each endpoint of the line to the midpoint of the rectangle
    dists = [n.dist(l[0],l[1]),n.dist(l[2],l[3])]
    #Minimum distance and the endpoint (0 or 1) of the line associated with that point
    #Tuples of (Line Number, Line Point, Dist to Line Point) are produced
    distvals.append( (i,np.argmin(dists),np.min(dists)) )
  #Sort by distance between the number box and the line
  distvals = sorted(distvals, key=lambda x: x[2])
  #Include nearby lines, not just the closest one. This accounts for forking.
  distvals = [x for x in distvals if x[2]<1.5*distvals[0][2]]
  #Draw a white rectangle where the number box was
  cv2.rectangle(comp_labels, (n.x0,n.y0), (n.x1,n.y1), 1, cv2.FILLED)
  #Draw white lines where the arrows are
  for dv in distvals:
    l  = lines[dv[0]]
    lp = (l[0],l[1]) if dv[1]==0 else (l[2],l[3])
    cv2.line(comp_labels, (l[0], l[1]), (l[2], l[3]), 1, 3, cv2.LINE_AA)
    cv2.line(comp_labels, (lp[0], lp[1]), ((n.x0+n.x1)//2, (n.y0+n.y1)//2), 1, 3, cv2.LINE_AA)
ShowImage('Lines', comp_labels, 'gray')
Finding the Parts
This part is hard! We now want to segment out the parts in the image. If there were some way to disconnect the lines linking subparts together, this would be easy. Unfortunately, the lines connecting the subparts are the same width as many of the lines which make up the parts themselves.
We could use a lot of logic to work around this, but it would be painful and error-prone.
Alternatively, we can assume you have an expert in the loop. This expert's sole job is to cut the lines connecting the subparts, which should be both easy and fast for them. Labelling everything would be slow and tedious for a human, but fast for a computer; separating things is easy for a human, but hard for a computer. So we let each do what they do best.
In this case, you could probably train someone to do this job within a few minutes, so a true "expert" is not really necessary, just a moderately competent human.
If you pursue this, you will need to write the expert-in-the-loop tool. To do so, save the skeleton images, have your expert modify them, and read the skeletonized images back in. Like so:
#Save the image, or display it on a GUI
#cv2.imwrite("/z/skel.png", skel);
#EXPERT DOES THEIR THING HERE
#Read the expert-mediated image back in
skelhuman = cv2.imread('/z/skel.png')
#Convert back to the form we need
skelhuman = cv2.cvtColor(skelhuman,cv2.COLOR_BGR2GRAY)
ret, skelhuman = cv2.threshold(skelhuman,0,255,cv2.THRESH_OTSU)
ShowImage('SkelHuman', skelhuman, 'gray')
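If you do not already have an editing tool for your expert, a minimal, hypothetical sketch using OpenCV's mouse callbacks might look like the following (the expert paints black over the connecting lines and presses any key when done):

def ExpertCutTool(skel_img, brush=3):
  """Sketch: let a human paint black over the lines connecting subparts"""
  canvas = skel_img.copy()
  drawing = {'on': False} #Mutable flag shared with the callback
  def OnMouse(event, x, y, flags, param):
    if event==cv2.EVENT_LBUTTONDOWN:
      drawing['on'] = True
    elif event==cv2.EVENT_LBUTTONUP:
      drawing['on'] = False
    elif event==cv2.EVENT_MOUSEMOVE and drawing['on']:
      cv2.circle(canvas, (x,y), brush, 0, -1) #Paint black to cut the line
  cv2.namedWindow('Cut the connecting lines')
  cv2.setMouseCallback('Cut the connecting lines', OnMouse)
  while cv2.waitKey(20)==-1: #Loop until the expert presses a key
    cv2.imshow('Cut the connecting lines', canvas)
  cv2.destroyAllWindows()
  return canvas

#skelhuman = ExpertCutTool(skel) #Hypothetical replacement for the save/edit/load round-trip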
Now that we have the parts separated, we eliminate as much of the arrows as possible. We have already extracted these above, so we can add them back later if need be.
To eliminate the arrows, we find all of the lines that terminate somewhere other than at another line. That is, we locate pixels which have only one neighbouring pixel. We then eliminate that pixel and look at its neighbour. Doing this iteratively eliminates the arrows. Since I do not know another term for it, I will call this the Fuse Transform. Since it requires manipulating individual pixels, which would be super slow in Python, we write the transform in Cython.
%%cython -a --cplus
import cython
from libcpp.queue cimport queue
import numpy as np
cimport numpy as np

@cython.boundscheck(False)
@cython.wraparound(False)
@cython.nonecheck(False)
@cython.cdivision(True)
cpdef void FuseTransform(unsigned char [:, :] image):
  # set the variable extension types
  cdef int c, x, y, nx, ny, width, height, neighbours
  cdef queue[int] q

  # grab the image dimensions
  height = image.shape[0]
  width  = image.shape[1]

  cdef int dx[8]
  cdef int dy[8]

  #Offsets to neighbouring cells
  dx[:] = [-1,-1,0,1,1,1,0,-1]
  dy[:] = [0,-1,-1,-1,0,1,1,1]

  #Find seed cells: those with only one neighbour
  for y in range(1, height-1):
    for x in range(1, width-1):
      if image[y,x]==0: #Seed cells cannot be blank cells
        continue
      neighbours = 0
      for n in range(0,8):  #Look at all neighbours
        nx = x+dx[n]
        ny = y+dy[n]
        if image[ny,nx]>0:  #This neighbour has a value
          neighbours += 1
      if neighbours==1:     #Was there only one neighbour?
        q.push(y*width+x)   #If so, this is a seed cell

  #Starting with the seed cells, gobble up the lines
  while not q.empty():
    c = q.front()
    q.pop()
    y = c//width   #Convert flat index into 2D x-y index
    x = c%width
    image[y,x] = 0 #Gobble up this part of the fuse
    neighbour = -1 #No neighbours yet
    for n in range(0,8): #Look at all neighbours
      nx = x+dx[n]       #Find coordinates of neighbour cells
      ny = y+dy[n]
      #If the neighbour would be off the side of the matrix, ignore it
      if nx<0 or ny<0 or nx==width or ny==height:
        continue
      if image[ny,nx]>0:  #Is the neighbouring cell active?
        if neighbour!=-1: #If we've already found an active neighbour
          neighbour=-1    #Then pretend we found no neighbours
          break           #And stop looking. This is the end of the fuse.
        else:             #Otherwise, make a note of the neighbour's index.
          neighbour = ny*width+nx
    if neighbour!=-1:     #If there was only one neighbour
      q.push(neighbour)   #Continue burning the fuse
Back in standard Python:
#Apply the Fuse Transform
skh_dilated=skelhuman.copy()
FuseTransform(skh_dilated)
ShowImage('Fuse Transform', skh_dilated, 'gray')
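Incidentally, if you would rather avoid the Cython dependency, the same endpoint-pruning idea can be sketched with scipy (assuming scipy is available). It is slower and not byte-for-byte identical to FuseTransform, since it leaves junction pixels in place rather than gobbling them, but it burns off the same dangling lines:

from scipy.ndimage import convolve

def FuseTransformNumpy(image):
  """Sketch: iteratively delete endpoint pixels (white pixels with exactly one white neighbour)"""
  img = image>0
  kernel = np.array([[1,1,1],
                     [1,0,1],
                     [1,1,1]])
  while True:
    #Count each pixel's white neighbours
    neighbours = convolve(img.astype(np.uint8), kernel, mode='constant')
    #Endpoints are white pixels with exactly one white neighbour
    endpoints = img & (neighbours==1)
    if not endpoints.any(): #Nothing left to burn
      break
    img[endpoints] = False  #Burn this round's endpoints
  return np.where(img, 255, 0).astype(np.uint8)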
Now that we have eliminated the arrows and all of the lines connecting the parts, we dilate the remaining pixels a lot.
kernel = np.ones((3,3),np.uint8)
dilated = cv2.dilate(skh_dilated, kernel, iterations=6)
ShowImage('Dilation', dilated, 'gray')
Putting It All Together
Now we overlay the dilated parts with the labels and arrows we segmented out earlier:
comp_labels_dilated = cv2.dilate(comp_labels, kernel, iterations=5)
labels_combined = np.uint8(np.logical_or(comp_labels_dilated,dilated))
ShowImage('Comp Labels', labels_combined, 'gray')
Finally, we take the merged number boxes, the arrows, and the parts, and colour each of them with pretty colours from Color Brewer. We then overlay this on the original image to obtain the desired highlighting.
ret, labels = cv2.connectedComponents(labels_combined)
colormask = np.zeros(img.shape, dtype=np.uint8)
#Colours from Color Brewer
colors = [(228,26,28),(55,126,184),(77,175,74),(152,78,163),(255,127,0),(255,255,51),(166,86,40),(247,129,191),(153,153,153)]
for l in range(labels.max()+1): #+1 so the highest-numbered component is coloured too
  if l==0: #Background component
    colormask[labels==0] = (255,255,255)
  else:
    colormask[labels==l] = colors[l%len(colors)] #Wrap around if there are more components than colours
ShowImage('Comp Labels', colormask, 'bgr')

blended = cv2.addWeighted(img,0.7,colormask,0.3,0)
ShowImage('Blended', blended, 'bgr')
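If you want to keep the result on disk, a one-liner does it (the path is just an example):

cv2.imwrite('blended.png', blended) #Hypothetical output path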
Final Image
So, to recap, we identified the numbers, the arrows, and the parts. In some cases we were able to separate them automatically; in other cases we used an expert in the loop. Where we had to manipulate pixels individually, we used Cython for speed.
Of course, the danger with this sort of answer is that some other image will break the (many) assumptions I have made here. But that is a risk you take when you try to present a problem with a single image.