Computer Vision: How to find the number of rows (lines) of bounding boxes in an invoice?
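I have multiple invoices, and I have already found the coordinates of the bounding boxes in each one.
Here are the y coordinates (each inner list holds one bounding box's ymin and ymax):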
[[4, 43],
[9, 47],
[76, 122],
[30, 74],
[10, 47],
[81, 125],
[84, 124],
[47, 90],
[1, 38]]
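Based on the y coordinates, I want to determine which bounding boxes are in the first row, which are in the second, and which are in the third. More generally, how can I find the y range of the first, second, or third row?
The number of rows varies: some invoices have more rows and some have fewer.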
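This solution is sensitive to the threshold; you may need to tune it according to the amount of text in each line.
- First, segment the image into text lines based on the presence and amount of text (black pixels) in each pixel row.
- Second, find the boundaries of those lines so they can be compared with the bounding boxes.
- Finally, compare the bounding-box indices with the segmented line indices.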
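The three steps above can be sketched on a synthetic image, so that no input file is needed. This is only an illustration under stated assumptions: the names `segment_text_rows` and `assign_box`, the synthetic `page`, and the largest-overlap assignment rule are hypothetical choices made for the example, not part of the original answer.

```python
import numpy as np

def segment_text_rows(gray, threshold=None):
    """Return inclusive (start, end) pixel-row ranges of the dark (text) bands."""
    # Bright background rows give large sums; rows containing dark text give small ones.
    profile = gray.sum(axis=1, dtype=np.int64)
    if threshold is None:
        threshold = profile.mean()  # assumption: the mean row sum separates text from background
    is_text = profile <= threshold
    # Pad with False so bands touching the top or bottom border are still closed.
    padded = np.concatenate(([False], is_text, [False]))
    changes = np.flatnonzero(padded[1:] != padded[:-1])
    # Changes alternate between a band start (inclusive) and a band end (exclusive).
    return [(int(s), int(e) - 1) for s, e in zip(changes[::2], changes[1::2])]

def assign_box(box, bands):
    """Return the 1-based index of the band that overlaps the box most, or None."""
    ymin, ymax = box
    overlaps = [max(0, min(ymax, e) - max(ymin, s) + 1) for s, e in bands]
    if not overlaps or max(overlaps) == 0:
        return None
    return int(np.argmax(overlaps)) + 1

# Synthetic white page with two dark bands standing in for two lines of text.
page = np.full((60, 40), 255, dtype=np.uint8)
page[5:15, 5:35] = 0
page[30:45, 5:35] = 0

bands = segment_text_rows(page)
print(bands)
print(assign_box([3, 16], bands), assign_box([28, 46], bands), assign_box([18, 25], bands))
```

On a real scan the mean-based threshold may need tuning, as noted above.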
Output of the code below:
[17, 40, 53, 79, 95, 117]
box [4,43] belongs to line 1
box [9,47] belongs to line 1
box [76,122] belongs to line 3
box [10,47] belongs to line 1
box [81,125] belongs to line 3
box [84,124] belongs to line 3
box [47,90] belongs to line 2
Code:
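import cv2

# Read the image as grayscale (flag 0) and drop the 15 leftmost columns
orig = cv2.imread('input.jpg', 0)[:, 15:]

# The detected boxes as [ymin, ymax]
boxes = [[4, 43],
         [9, 47],
         [76, 122],
         [30, 74],
         [10, 47],
         [81, 125],
         [84, 124],
         [47, 90],
         [1, 38]]

# Work on a copy so the original image stays untouched
img = orig.copy()

# Sum the pixel intensities of each row: rows with dark text have lower sums
summ = img.sum(axis=1)

# Threshold at the mean row sum
th = summ.mean()
img[summ > th, :] = 0    # background rows
img[summ <= th, :] = 1   # text rows

# Collect the row indices where the image switches between background and text
rows = []
for y in range(img.shape[0] - 1):
    if img[y, 0] != img[y + 1, 0]:
        rows.append(y)

# Sort the line indices
rows.sort()
print(rows)

# Compare the indices: consecutive pairs in rows are the start and end of a text line.
# A box is assigned only if it spans the whole line; boxes that do not
# (for example [30, 74]) are skipped without a message.
for box in boxes:
    for idx in range(0, len(rows) - 1, 2):
        if box[0] < rows[idx] and box[1] > rows[idx + 1]:
            print("box [{},{}] belongs to line {}".format(box[0], box[1], idx // 2 + 1))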
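If the image is not available, the boxes can also be grouped from their y coordinates alone. The following is a minimal sketch of a different technique from the one above (greedy grouping by vertical center), not the answer's method. The function name `group_boxes_into_rows` and the tolerance of half the median box height are assumptions chosen for illustration and will likely need tuning per invoice.

```python
from statistics import median

def group_boxes_into_rows(boxes, tol_factor=0.5):
    """Greedily group [ymin, ymax] boxes into rows by their vertical centers."""
    if not boxes:
        return []
    # Assumption: boxes in the same row have centers within half a typical box height.
    tol = tol_factor * median(ymax - ymin for ymin, ymax in boxes)
    rows = []
    for box in sorted(boxes, key=lambda b: (b[0] + b[1]) / 2):
        center = (box[0] + box[1]) / 2
        if rows:
            current = rows[-1]
            row_center = sum((a + b) / 2 for a, b in current) / len(current)
            if center - row_center <= tol:
                current.append(box)  # close enough to the running row center
                continue
        rows.append([box])           # otherwise start a new row
    return rows

boxes = [[4, 43], [9, 47], [76, 122], [30, 74], [10, 47],
         [81, 125], [84, 124], [47, 90], [1, 38]]

grouped = group_boxes_into_rows(boxes)
for number, row in enumerate(grouped, start=1):
    # The y range of a row is the span covered by all of its boxes.
    y_range = (min(b[0] for b in row), max(b[1] for b in row))
    print("row", number, "range", y_range, "boxes", row)
```

With the boxes from the question this yields three groups. Whether a borderline box such as [30, 74] lands in the intended row depends on the tolerance, so the result should be checked against the actual invoice.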