Can I get the bounding boxes of individual characters with ImageMagick?

I came across a paper that uses synthetic handwriting data generated with the ImageMagick convert command, using many different handwriting fonts (example images from the paper).

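They have annotated these images with their transcriptions, but I would like to annotate them with the bounding box of each individual character. Is this possible with ImageMagick or any other available tool/script/code?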

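I solved this by using ImageMagick to render the text iteratively, adding one character at a time, and using OpenCV to mask out the previous characters so that only the new character remains, which gives its bounding box (example result).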

Example code:

import os
import subprocess

import cv2

full_text = 'OpenCV'
out_dir = 'test_out'
im_size = 'x75'  # fixed height of 75 px; the width grows with the text
font = 'ambarella/Ambarella.ttf'
other_options = ['-gravity', 'West', '-stroke', 'black']

# The output directory must exist before convert writes into it
os.makedirs(out_dir, exist_ok=True)

bboxes = []
prev_img = None

# Render each prefix of the text: 'O', 'Op', 'Ope', ...
for i in range(len(full_text)):
    text = full_text[:i + 1]
    fname = os.path.join(out_dir, str(i) + '.jpg')
    # Pass the arguments as a list so that no shell quoting is needed
    command = ['convert', '-size', im_size, '-font', font, *other_options, 'label:' + text, fname]
    subprocess.run(command, check=True)
    img = cv2.imread(fname, 0)
    # Threshold the image
    ret, img = cv2.threshold(img, 127, 255, cv2.THRESH_BINARY)
    if prev_img is None:
        # First letter: there is nothing to mask out
        masked_out = img
    else:
        # Pad the previous image on the right so both images have the same width
        # (the height is assumed to be fixed by the -size option)
        d_w = img.shape[1] - prev_img.shape[1]
        if d_w > 0:
            prev_img = cv2.copyMakeBorder(prev_img, 0, 0, 0, d_w, cv2.BORDER_CONSTANT, value=255)

        # Mask the previous letters by painting their ink pixels white
        masked_out = img.copy()
        masked_out[prev_img == 0] = 255

    # Bounding box of the remaining dark pixels, i.e. the new letter.
    # Note: findNonZero returns None if the new character adds no ink (e.g. a space).
    nonzero = cv2.findNonZero(255 - masked_out)
    x1, y1, w, h = cv2.boundingRect(nonzero)
    bboxes.append((x1, y1, x1 + w, y1 + h))

    # The current image becomes the previous image for the next letter
    prev_img = img

# Visualize results
colors = ((255, 0, 0), (0, 255, 0), (0, 0, 255))
img = cv2.imread(fname)
for i, b in enumerate(bboxes):
    x1, y1, x2, y2 = b
    cv2.rectangle(img, (x1, y1), (x2, y2), colors[i % len(colors)], 1)

cv2.imwrite('boxes.png', img)
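
The masking step can also be checked on its own, without ImageMagick or a font file. The following is only a minimal sketch using NumPy on two tiny synthetic arrays. The function name new_ink_bbox and the test arrays are hypothetical, and it assumes binary images (0 for ink, 255 for background) of equal height where the previous image is never wider than the current one.

```python
import numpy as np


def new_ink_bbox(prev_img, cur_img):
    """Return (x1, y1, x2, y2) of ink in cur_img that is not ink in prev_img.

    Both images are 2-D uint8 arrays with 0 for ink and 255 for background.
    Assumes equal heights and that prev_img is not wider than cur_img.
    Returns None when the new character adds no ink (for example a space).
    """
    h, w = cur_img.shape
    # Pad the previous image on the right with background
    padded = np.full((h, w), 255, dtype=np.uint8)
    padded[:, :prev_img.shape[1]] = prev_img
    # Keep only pixels that are ink now but were background before
    new_ink = (cur_img == 0) & (padded != 0)
    ys, xs = np.nonzero(new_ink)
    if xs.size == 0:
        return None
    # x2 and y2 are exclusive, like x1 + w and y1 + h from cv2.boundingRect
    return int(xs.min()), int(ys.min()), int(xs.max()) + 1, int(ys.max()) + 1


# Two synthetic "renderings": the second adds one blob to the first
first = np.full((5, 4), 255, dtype=np.uint8)
first[1:4, 1:3] = 0
second = np.full((5, 8), 255, dtype=np.uint8)
second[1:4, 1:3] = 0  # same ink as in the first rendering
second[2:5, 5:7] = 0  # ink of the newly added character

blank = np.full((5, 0), 255, dtype=np.uint8)  # stands in for "no previous image"
boxes = [new_ink_bbox(blank, first), new_ink_bbox(first, second)]

# Pair each (made-up) character with its box to form the annotation
annotations = list(zip("ab", boxes))
print(annotations)
```

Returning None for a character without ink keeps the list of boxes aligned with the characters of the text, so each entry can be paired with its character as in the last lines. Note that ink which overlaps earlier characters, as can happen with connected handwriting fonts, is masked out and therefore not counted toward the new box.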