How to convert and save a list of ints to a bitmap image?

Question

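I am trying to convert a list of numbers, which I believe represent the bytes that together make up a bitmap image, into that image file (saved to disk) and/or simply into a form that tesseract can use. I would prefer to be able to visualize the image, though, to make sure the conversion really works. I don't know the shape of the image, but I think it might be 4 wide x 8 high.

I have a JSON font character-map file (an image-based font, used for a Japanese dictionary) in which each character is represented as a bitmap image. For example, one character is: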

{ "bitmap": [0,0,26,0,17,252,17,36,89,100,81,84,81,132,209,252,144,0,19,254,42,84,46,84,38,84,66,86,79,255,0,0], "code": 46370 }

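I am trying to find out which actual characters these represent. My approach is to convert these lists of ints into bytes (or whatever byte array they represent), then turn them into bitmap image files (and possibly save them to disk, which is the step I am stuck at), and then run OCR on those images (with tesseract in Python, or possibly with Adobe's OCR if I can get them into a PDF) to determine their UTF-8 or Shift-JIS equivalents. If I am overcomplicating this, I would appreciate guidance on that too!

I have consulted the following Stack Overflow posts (among others) while trying to convert the list of ints into an actual image file: "How do I convert byte array to bitmap image in Python?", "Converting int to bytes in Python 3", "PIL: Convert Bytearray to Image", and "Convert Numpy array of ASCII codes to string".

I also tried this library. I think I managed to convert the list into a string representing the bits and then into this library's version of a bitmap, but I could not figure out how to save the resulting object. Looking at the source code, this particular library's BitMap class does not seem useful for what I want to do.

The numbers above should correspond to this image (it is not grayscale).

I have written code that converts the list of ints into bytes or a bytearray (I tried many different things, and I am not sure which format I actually need), but I get stuck when I try to save those bytes as a bmp file. Depending on what I try, I get errors such as: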

OSError: cannot identify image file 'out.bmp'
OSError: cannot identify image file <_io.BytesIO object at 0x000001F037F7C5C8>
AttributeError: 'BitMap' object has no attribute 'save'

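Or I simply end up saving a file that cannot be opened because it is in an unsupported format (for example, if I just open a file and write the bytes to it).

I guess part of the problem is that I am not saving the data with the headers a bitmap uses. Also, saving some bytes as an image seems to be far more complicated than I expected, and frankly I don't even know where to start.

I am also not sure whether the byte array I am producing is an array of individual bytes or some kind of representation of the whole list...

Can anyone help me save this list of numbers as an image? I don't know whether I actually need to save it as a bitmap.

Here is (one version of) my program: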

import io
from PIL import Image

test_image = "out.bmp"
test_bytes = [0,0,26,0,17,252,17,36,89,100,81,84,81,132,209,252,144,0,19,254,42,84,46,84,38,84,66,86,79,255,0,0]
actual_bytes = bytes(test_bytes)

def generate_output_image(input_image):
    image = Image.open(io.BytesIO(input_image))
    image.save(test_image)

generate_output_image(actual_bytes)

Answer 1

Since I don't know what I'm looking for, all I can say is that if you assume the data is an 8x4 or 4x8 image, this is what you get:

import numpy as np
from PIL import Image

# Make Numpy array from list
na = np.array(
    [
        0, 176, 16, 164, 27, 254, 10, 164,
        75, 252, 58, 164, 27, 252, 16, 4,
        23, 254, 19, 252, 98, 12, 35, 252,
        17, 16, 16, 160, 55, 254, 0, 0
    ], 
    dtype=np.uint8)

# Make PIL Image from numpy array
im = Image.fromarray(na.reshape(4,8))
# or im = Image.fromarray(na.reshape(8,4))

# Save
im.save('result.png')
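
The reshape above treats every value as one grayscale pixel. Another reading, offered only as a guess, is that each value packs 8 one-bit pixels: 32 values would then hold 256 pixels, which fits a 16x16 glyph. Here is a minimal sketch under that assumption, using the values from the question. The 16-pixel row width, the most-significant-bit-first order, and the helper names `unpack_glyph` and `bits_to_image` are all assumptions made for illustration.

```python
import numpy as np
from PIL import Image

# Sample values copied from the "bitmap" field in the question
bitmap = [0, 0, 26, 0, 17, 252, 17, 36, 89, 100, 81, 84, 81, 132, 209, 252,
          144, 0, 19, 254, 42, 84, 46, 84, 38, 84, 66, 86, 79, 255, 0, 0]

# Assumption: 1 bit per pixel, 16 pixels (2 bytes) per row
WIDTH = 16


def unpack_glyph(values, width=WIDTH):
    """Hypothetical helper: expand packed bytes into a 2-D array of 0/1 bits."""
    bits = np.unpackbits(np.array(values, dtype=np.uint8))  # MSB first
    return bits.reshape(-1, width)


def bits_to_image(bits):
    """Hypothetical helper: draw set bits black on white as a grayscale image."""
    return Image.fromarray(((1 - bits) * 255).astype(np.uint8))


if __name__ == "__main__":
    # Quick text preview: '#' for a set bit, '.' for a clear bit
    for row in unpack_glyph(bitmap):
        print("".join("#" if b else "." for b in row))
```

If the text preview looks like a character, `bits_to_image(unpack_glyph(bitmap)).save('result.png')` should write it out as an image. If it looks scrambled, the width assumption is probably wrong and other values (8, 24) are worth trying.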

Answer 2

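You appear to be converting the JSON output of https://github.com/FooSoft/zero-epwing.

I haven't been able to work out how to convert the wide glyphs, but I used this (absolutely hacky) Python script to export the narrow glyphs.

Change font.json to the path of your JSON file. It exports bmp files with the glyph code as the file name.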

import json

from PIL import Image

# Load the zero-epwing JSON dump
with open('font.json') as json_file:
    data = json.load(json_file)

for font in data['fonts']:
    for glyph in font['narrow']['glyphs']:
        bitmap = glyph['bitmap']
        # 1-bit image; pixels that are never written below stay black
        img = Image.new('1', (24, 24))
        pixels = img.load()
        # Each pair of bytes is read as one 16-pixel row, most significant bit first
        for row, (high, low) in enumerate(zip(bitmap[::2], bitmap[1::2])):
            bits = '{:08b}{:08b}'.format(high, low)
            for col, bit in enumerate(bits):
                # A set bit is drawn black, a clear bit white
                pixels[col, row] = bit == '0'
        # Use the glyph code as the file name
        img.save(f'{glyph["code"]}.bmp')

This is what the exported glyphs look like.
Hopefully this is enough to get you started.
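
The per-pixel loop in the script can probably be replaced by letting Pillow decode the packed bits directly. The sketch below assumes the same layout the script relies on (two bytes per 16-pixel row, most significant bit leftmost) and uses the values from the question as sample data. `glyph_to_image` is a hypothetical helper name, and the `scale` parameter is only there because a 16x16 image is likely too small to inspect by eye or to feed to OCR.

```python
from PIL import Image

# Sample values copied from the "bitmap" field in the question
bitmap = [0, 0, 26, 0, 17, 252, 17, 36, 89, 100, 81, 84, 81, 132, 209, 252,
          144, 0, 19, 254, 42, 84, 46, 84, 38, 84, 66, 86, 79, 255, 0, 0]


def glyph_to_image(values, width=16, scale=8):
    """Hypothetical helper: decode packed 1-bit rows into an enlarged image.

    Assumes `width` is a multiple of 8 and that rows are stored back to back.
    """
    bytes_per_row = width // 8
    height = len(values) // bytes_per_row
    # Raw mode "1;I" is the inverted 1-bit layout: a set bit becomes black,
    # a clear bit white, with the leftmost pixel in the most significant bit
    img = Image.frombytes("1", (width, height), bytes(values), "raw", "1;I")
    # Nearest-neighbour scaling keeps the pixel edges sharp
    return img.resize((width * scale, height * scale), resample=Image.NEAREST)
```

This also suggests why `Image.open` fails on these bytes: it expects a complete image file with a header, whereas `Image.frombytes` takes headerless pixel data along with the mode and size. Calling `glyph_to_image(bitmap).save('46370.png')` should then write a viewable file.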

python

arrays

ocr

bitmap

python-imaging-library