Convert PDF to Image using Python
I am trying to convert a PDF file to image files on my Ubuntu server, where I have installed:
- python2.7
- poppler-utils
- pdf2image==1.12.1
My code:
from pdf2image import convert_from_path, convert_from_bytes

images = convert_from_path("/home/user/pdf_file.pdf")
# OR
# Open in binary mode: convert_from_bytes expects the raw PDF bytes.
with open("/home/user/pdf_file.pdf", "rb") as pdf:
    images = convert_from_bytes(pdf.read())
Output
When I use the function "convert_from_path":
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/pdf2image/pdf2image.py", line 143, in convert_from_path
    thread_output_file = next(output_file)
TypeError: ThreadSafeGenerator object is not an iterator
When I use the function "convert_from_bytes":
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/usr/local/lib/python2.7/dist-packages/pdf2image/pdf2image.py", line 268, in convert_from_bytes
    paths_only=paths_only,
  File "/usr/local/lib/python2.7/dist-packages/pdf2image/pdf2image.py", line 143, in convert_from_path
    thread_output_file = next(output_file)
TypeError: ThreadSafeGenerator object is not an iterator
I have reinstalled all of the utilities, and I still get these errors.
If you want to convert a PDF to images, you can try the Python Ghostscript package:
pip install ghostscript
import ghostscript
import locale

def pdf2jpeg(pdf_input_path, jpeg_output_path):
    # Ghostscript arguments; the first entry stands in for the program name.
    args = ["pdf2jpeg",  # actual value doesn't matter
            "-dNOPAUSE",
            "-sDEVICE=jpeg",
            "-r144",
            "-sOutputFile=" + jpeg_output_path,
            pdf_input_path]
    # Encode the arguments as bytes using the locale's preferred encoding.
    encoding = locale.getpreferredencoding()
    args = [a.encode(encoding) for a in args]
    ghostscript.Ghostscript(*args)

pdf2jpeg(
    "...Fixate/ActiveState/pdf/a.pdf",
    "...Fixate/ActiveState/pdf/a.jpeg",
)
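The same Ghostscript options can also be passed to the command-line program directly. What follows is only a sketch, assuming the executable is installed and named gs (the usual name on Linux). The helper names build_gs_command and pdf_to_jpegs, and the file names input.pdf and page-%03d.jpeg, are made up for illustration. -dBATCH is added so that Ghostscript exits after the last page, and the %03d in the output name asks it to write one numbered file per page.

```python
import subprocess


def build_gs_command(pdf_input_path, jpeg_output_pattern, dpi=144):
    """Return the argument list for a non-interactive Ghostscript run."""
    return [
        "gs",                  # assumed executable name; adjust if yours differs
        "-dNOPAUSE",           # do not pause between pages
        "-dBATCH",             # exit once all input has been processed
        "-sDEVICE=jpeg",       # render to JPEG
        "-r{}".format(dpi),    # resolution in dots per inch
        "-sOutputFile=" + jpeg_output_pattern,
        pdf_input_path,
    ]


def pdf_to_jpegs(pdf_input_path, jpeg_output_pattern, dpi=144):
    """Run Ghostscript; raises CalledProcessError on a non-zero exit code."""
    subprocess.check_call(build_gs_command(pdf_input_path, jpeg_output_pattern, dpi))


# Building the command does not run anything, so it is safe to inspect first.
command = build_gs_command("input.pdf", "page-%03d.jpeg")
```

Calling pdf_to_jpegs("input.pdf", "page-%03d.jpeg") would then run the conversion. Keeping the command builder separate makes it easy to print or log the exact arguments before anything is executed.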
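It failed for me in python2 as well, but it worked in python3.
The same problem has come up with another library:
TypeError: 'threadsafe_iter' object is not an iterator
As they say there, this is a Python 2 vs. 3 issue caused by the next() function: Python 2 looks for a method named next() on the object, while Python 3 looks for __next__().
If you rename __next__() to next() in the file /home/***/.local/lib/python2.7/site-packages/pdf2image/generators.py, it runs successfully on py2.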
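To illustrate the mismatch, here is a minimal sketch of an iterator class that works under both interpreters. It is not pdf2image's actual code; the class name ThreadSafeCounter is hypothetical. It defines __next__() for Python 3 and aliases next to it for Python 2.

```python
import threading


class ThreadSafeCounter(object):
    """Hypothetical thread-safe iterator yielding 0, 1, ..., limit - 1."""

    def __init__(self, limit):
        self._limit = limit
        self._value = 0
        self._lock = threading.Lock()

    def __iter__(self):
        return self

    def __next__(self):
        # Python 3's built-in next() looks up this method.
        with self._lock:
            if self._value >= self._limit:
                raise StopIteration
            current = self._value
            self._value += 1
            return current

    # Python 2's built-in next() looks up a method named next() instead,
    # so expose the same function under that name as well.
    next = __next__


counter = ThreadSafeCounter(3)
first = next(counter)   # works on both Python 2 and Python 3
rest = list(counter)    # consumes the remaining values
```

Without the alias, Python 2 raises the "object is not an iterator" TypeError shown above, because the class only defines the Python 3 method name.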
By the way, I have opened a new issue for the pdf2image team:
TypeError: ThreadSafeGenerator object is not an iterator #133
Extra
The pdf2image README says it is a python (3.5+) module.
pdf2image v1.7.1 works with py27. Try it with pip install pdf2image==1.7.1