Computer Vision: How to find the number of rows (lines) of bounding boxes in an invoice?

      [[4, 43],
      [9, 47],
      [76, 122],
      [30, 74],
      [10, 47],
      [81, 125],
      [84, 124],
      [47, 90],
      [1, 38]]

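Based on the y-coordinates, I want to determine which bounding boxes are in the first line, which are in the second, and which are in the third. More generally, how can I find the range of the first, second, or third line?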

There are multiple invoices, some with more lines and some with fewer.

This solution is sensitive to the threshold; you may need to adjust it according to the amount of text in each line!

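  • First, segment the text lines based on the presence and amount of text (black pixels) in each pixel row.
  • Second, find the boundaries of each text line, to be compared against the bounding boxes.
  • Finally, compare the bounding-box indices with the segmented-line indices.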

Output:

[17, 40, 53, 79, 95, 117]
box [4,43]  belongs to line 1
box [9,47]  belongs to line 1
box [76,122]  belongs to line 3
box [10,47]  belongs to line 1
box [81,125]  belongs to line 3
box [84,124]  belongs to line 3
box [47,90]  belongs to line 2
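
To see where a boundary list like the one above comes from, here is a minimal sketch of the first two steps on a synthetic row profile, so it runs without an image. The profile values and the names `profile`, `is_text`, `boundaries`, and `line_ranges` are assumptions for illustration, not part of the original answer; on a real invoice the profile would be the per-row pixel sum.

```python
import numpy as np

# Synthetic stand-in for img.sum(axis=1): one value per pixel row.
# High values = mostly white background, low values = rows with dark text.
# (Hypothetical data; a real profile comes from the invoice image.)
profile = np.full(30, 255.0 * 100)   # 30 rows of 100 white pixels each
profile[5:10] -= 4000                # first text line occupies rows 5..9
profile[18:24] -= 6000               # second text line occupies rows 18..23

# Rows no brighter than the mean count as text (1), the rest as background (0).
is_text = (profile <= profile.mean()).astype(np.uint8)

# A boundary is any index y where the label changes between row y and row y + 1.
boundaries = np.flatnonzero(is_text[:-1] != is_text[1:]).tolist()

# Pair consecutive boundaries into (start, end) ranges, one per text line.
line_ranges = list(zip(boundaries[0::2], boundaries[1::2]))

print(boundaries)   # [4, 9, 17, 23]
print(line_ranges)  # [(4, 9), (17, 23)]
```

Using the mean as the threshold works here because the text rows are clearly darker than the background; with sparse text or noisy scans the cutoff would likely need tuning, as noted above.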

Code:

import cv2

# Read the image as grayscale (flag 0) and drop the leftmost 15 columns
orig = cv2.imread('input.jpg', 0)[:,15:]

# The detected boxes
boxes = [[4, 43],
[9, 47],
[76, 122],
[30, 74],
[10, 47],
[81, 125],
[84, 124],
[47, 90],
[1, 38]]

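# Work on a copy so the original image stays intact
img = orig.copy()

# Sum the pixel intensities of each row: rows with more dark text have a smaller sum
summ = img.sum(axis=1)

# Threshold: the mean row sum
th = summ.mean()

# Rows brighter than the threshold are background (0), the rest are text (1)
img[summ>th, :] = 0
img[summ<=th,:] = 1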

# Collect every row index where the label changes (text <-> background).
# The indices are appended in ascending order, so no sorting is needed.
rows = []
for y in range(img.shape[0]-1):
    if img[y,0] != img[y+1,0]:
        rows.append(y)

print(rows)

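# Compare the indices: consecutive pairs in rows are the start and end of a text line.
# A box belongs to a line if it fully contains that line's range.
# Stopping at len(rows)-1 avoids an IndexError when rows has an odd length.
for box in boxes:
    for idx in range(0, len(rows)-1, 2):
        if box[0] < rows[idx] and box[1] > rows[idx+1]:
            print("box [{},{}]".format(box[0], box[1]), " belongs to line {}".format(idx//2+1))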
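
Note that the containment test above leaves boxes [30,74] and [1,38] unassigned, because neither fully spans a detected line. One possible variant, not part of the original answer, is to assign each box to the line it overlaps most. The helper name `assign_by_overlap` is hypothetical, and the sketch assumes `rows` holds alternating start/end boundaries as printed in the output.

```python
def assign_by_overlap(boxes, rows):
    """Map each [top, bottom] box to the 1-based line it overlaps most, or None."""
    # Pair consecutive boundaries into (start, end) line ranges.
    lines = list(zip(rows[0::2], rows[1::2]))
    result = []
    for top, bottom in boxes:
        # Vertical overlap in pixels between the box and each line range.
        overlaps = [max(0, min(bottom, end) - max(top, start))
                    for start, end in lines]
        best = max(range(len(lines)), key=overlaps.__getitem__) if lines else None
        # A box that overlaps no line at all stays unassigned.
        result.append(best + 1 if best is not None and overlaps[best] > 0 else None)
    return result


rows = [17, 40, 53, 79, 95, 117]  # boundaries taken from the output shown earlier
boxes = [[4, 43], [9, 47], [76, 122], [30, 74], [10, 47],
         [81, 125], [84, 124], [47, 90], [1, 38]]

for box, line in zip(boxes, assign_by_overlap(boxes, rows)):
    print("box {} belongs to line {}".format(box, line))
```

With these boundaries, [30,74] lands on line 2 and [1,38] on line 1, while the other boxes keep the assignments shown in the output. Ties go to the earlier line, which may or may not be what you want for boxes straddling two lines.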