I can't read long distance text with pytesseract

Question

I have this image and I want to read the text on it, but pytesseract returns blank output.

import os
os.environ["KMP_DUPLICATE_LIB_OK"]="TRUE"
import cv2
import numpy as np
import math
from scipy import ndimage
import easyocr
import pytesseract

img = cv2.imread('cikti.jpg')
scale_percent = 220 # percent of original size
width = int(img.shape[1] * scale_percent / 100)
height = int(img.shape[0] * scale_percent / 100)
dim = (width, height)
  
# resize image
img = cv2.resize(img, dim, interpolation = cv2.INTER_AREA)
cv2.imshow('img', img)
cv2.waitKey(0)
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
edges = cv2.Canny(gray, 50, 150, apertureSize=3)
cv2.imshow('edges', edges)
cv2.waitKey(0)
angles = []
lines = cv2.HoughLinesP(edges, 1, math.pi / 180.0, 90)
for [[x1, y1, x2, y2]] in lines:
    #cv2.line(img, (x1, y1), (x2, y2), (255, 0, 0), 3)
    angle = math.degrees(math.atan2(y2 - y1, x2 - x1))
    if(angle != 0):
        angles.append(angle)
    
    
    print(angles)
median_angle = np.median(angles)
img = ndimage.rotate(img, median_angle)
print(median_angle)
filiter = np.array([[-1,-1,-1],
                    [-1,9,-1],
                    [-1,-1,-1]])

cv2.imshow('filitird', img)
cv2.waitKey(0)
reader = easyocr.Reader(['tr'])
ocr_result = reader.readtext(img,)
print(ocr_result)


cv2.imshow('result', img)
k = cv2.waitKey(0)
cv2.destroyAllWindows()

This is the code I wrote.

It may be because of the distance; enlarging the image did not solve my problem. What should I do?

Answer 1

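I was able to read this image successfully with tesseract by doing the following:

Cropping off the pink border
Reducing to greyscale (binarising)
Running tesseract with --psm 8 (see here)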

I don't know whether the cropping was needed, but before binarising I could not get any output with any of the page segmentation modes.

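I did the processing by hand here, but you will probably want to automate it. A good trick for setting thresholds is to look at the standard deviation of the image in question and use it to scale the threshold, rather than picking some absolute value and having it fail.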
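As a rough illustration of the standard-deviation trick, here is a minimal sketch. The function name `binarise_by_std`, the formula `mean - k * std` and the factor `k = 0.5` are my own assumptions, not values from the processing described here; tune them for your images.

```python
import numpy as np


def binarise_by_std(gray, k=0.5):
    """Binarise a greyscale image with a threshold of mean - k * std.

    Both the formula and the default k are illustrative assumptions.
    """
    gray = np.asarray(gray, dtype=np.float64)
    # The threshold follows the image's own brightness and contrast
    # instead of being a fixed absolute value.
    threshold = gray.mean() - k * gray.std()
    # Pixels brighter than the threshold become white, the rest black.
    return np.where(gray > threshold, 255, 0).astype(np.uint8)
```

This assumes dark text on a lighter background; for light text on a dark background the comparison would need to be flipped.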

Here is the image I processed:

And the run:

$ tesseract img3.png img3 --psm 8 txt
Tesseract Open Source OCR Engine v4.1.1 with Leptonica
$ cat img3.txt
47 F02 43

I haven't tried it with pytesseract, but you should be able to set up the same thing.
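For what it's worth, an untested sketch of the same three steps from Python might look like the following. The helper names, the crop box and the fixed threshold of 128 are hypothetical; `pytesseract.image_to_string` with `config="--psm 8"` is the equivalent of the command line flag used above.

```python
import numpy as np


def prepare_for_ocr(bgr, crop_box, thresh=128):
    """Crop a BGR image, convert it to greyscale and binarise it.

    crop_box is (left, top, right, bottom) in pixels; both it and the
    default threshold are illustrative values, not taken from the answer.
    """
    left, top, right, bottom = crop_box
    region = bgr[top:bottom, left:right].astype(np.float64)
    # Standard luma weights; OpenCV stores channels in B, G, R order.
    gray = 0.114 * region[..., 0] + 0.587 * region[..., 1] + 0.299 * region[..., 2]
    return np.where(gray > thresh, 255, 0).astype(np.uint8)


def read_single_word(binary):
    """Run tesseract in page segmentation mode 8 (treat image as one word)."""
    import pytesseract  # imported here so the helper above works without it
    return pytesseract.image_to_string(binary, config="--psm 8").strip()


# Intended use (file name from the question, crop box made up):
#   import cv2
#   img = cv2.imread("cikti.jpg")
#   print(read_single_word(prepare_for_ocr(img, (10, 10, 300, 90))))
```

The crop box has to exclude the pink border, so in practice you would derive it from whatever code draws or detects that box.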

EasyOCR was able to read the image immediately when I tried it through the web service, although not accurately.

Update: greyscale

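This is a whole topic in itself. You might want to start with this tutorial in the OpenCV docs. There are basically two approaches: trying to binarise the image properly (converting it to pixels of two colours, on or off), and just greyscaling it. In between is 'posterising', where you reduce the number of tones (binarising is a special case of posterising in which the number of tones is 2). I normally handle greyscale with the built-in functions in PIL (Pillow); I have had good results with a quick-and-dirty sort-of binarising algorithm in which I first normalise the brightness and contrast of the image and then apply a skew function like this: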

def filter_point(point: int) -> int:
    if point < THRESH:
        return round(point/1.2)
    else:
        return round(point *2)

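This drives most pixels to full white or black but leaves some intermediate values. It is a bad solution because it depends on three magic numbers, but in my application (preparing scanned PDFs for reading by humans) I get better results than with automatic thresholding or posterising.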
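Applied to a whole image rather than one pixel at a time, the idea might be sketched with NumPy as below. The names `normalise` and `skew`, the min/max contrast stretch and the default threshold of 128 are assumptions for illustration; only the factors 1.2 and 2 come from the function above. Results are clipped to 0..255 so that doubled values cannot overflow.

```python
import numpy as np


def normalise(gray):
    """Stretch a greyscale image so its values span the full 0..255 range."""
    g = np.asarray(gray, dtype=np.float64)
    lo, hi = g.min(), g.max()
    if hi == lo:
        # A flat image has no contrast to stretch.
        return np.zeros(g.shape, dtype=np.uint8)
    return np.rint((g - lo) / (hi - lo) * 255.0).astype(np.uint8)


def skew(gray, thresh=128, darken=1.2, brighten=2.0):
    """Push dark pixels darker and light pixels lighter.

    thresh, darken and brighten are the three magic numbers.
    """
    g = np.asarray(gray, dtype=np.float64)
    out = np.where(g < thresh, g / darken, g * brighten)
    # Clip so that brightened pixels saturate at white instead of wrapping.
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)
```

A typical call would be `skew(normalise(gray))` on a 2-D `uint8` array.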

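So the answer, unfortunately, is going to be 'play with it'. I suggest you start in an image editor and see what you have to do to the image to get tesseract to work: maybe just greyscaling (which you do earlier in your code) is enough; do you need to crop it, and so on. Not drawing that pink box would help. I have provided a very crude example filter to show that pixels are just numbers and that you can do image processing this way, but you are much better off using built-in methods where possible.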
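In the same 'pixels are just numbers' spirit, posterising as described earlier could be sketched like this. The function name `posterise` and the even spacing of output tones are my own choices for illustration; with `levels=2` it reduces to a plain binarisation at the midpoint.

```python
import numpy as np


def posterise(gray, levels=4):
    """Reduce a 0..255 greyscale image to `levels` evenly spaced tones."""
    if levels < 2:
        raise ValueError("levels must be at least 2")
    g = np.asarray(gray, dtype=np.float64)
    # Assign every pixel to one of `levels` equal-width bins (0 .. levels-1).
    idx = np.minimum((g / 256.0 * levels).astype(np.int64), levels - 1)
    # Spread the bin indices back out evenly over 0..255.
    return np.rint(idx * 255.0 / (levels - 1)).astype(np.uint8)
```

Pillow also ships built-in operations for this kind of thing, which are likely a better choice than hand-rolled code in real use.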

Answer 2

import os
os.environ["KMP_DUPLICATE_LIB_OK"]="TRUE"
import cv2 as cv
import numpy as np
import easyocr

img = cv.imread('result.png',0)
th2 = cv.adaptiveThreshold(img, 255, cv.ADAPTIVE_THRESH_MEAN_C, cv.THRESH_BINARY, 29, 10);
cv.imshow("ADAPTIVE_THRESH_MEAN_C", th2)

cv.waitKey(0)
cv.destroyAllWindows()  

reader = easyocr.Reader(['tr'])
ocr_result = reader.readtext(th2,)
print(ocr_result)

It works like this. The image before OCR:

Result:
