How to find all rectangles in a PDF using iText

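I have an MS Word document containing text boxes (rectangles), which I successfully converted to PDF using LibreOffice. How can I find all the text boxes (rectangles) in the PDF, and how should I interpret the coordinates of those rectangles?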

@Override
public void modifyPath(PathConstructionRenderInfo renderInfo) {
    // A rectangle ("re") operation carries four operands: x, y, width, height.
    if (renderInfo.getOperation() == PathConstructionRenderInfo.RECT) {
        float x = renderInfo.getSegmentData().get(0);
        float y = renderInfo.getSegmentData().get(1);
        float w = renderInfo.getSegmentData().get(2);
        float h = renderInfo.getSegmentData().get(3);
        // Map two opposite corners through the current transformation matrix.
        Vector a = new Vector(x, y, 1).cross(renderInfo.getCtm());
        Vector c = new Vector(x + w, y + h, 1).cross(renderInfo.getCtm());
        // (the rest of the method was not included in the question)
    }
}

Implementing ExtRenderListener this way only finds the page (A4) rectangle. It does not find the text box rectangles that enclose the content on the page.
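
On interpreting the coordinates: the four operands of a rectangle operation are x, y, width and height in whatever coordinate system is current at that point in the content stream. They only become page coordinates once they are multiplied by the current transformation matrix (CTM), which is what the two `cross` calls in the snippet above do. The following is a minimal, library-free sketch of that mapping. The class `RectToPageSpace` and its methods are hypothetical names used for illustration, not iText API, and the matrix is assumed to be supplied as the six PDF values `[a b c d e f]`.

```java
import java.util.Arrays;

/** Hypothetical helper, not part of iText: maps a rectangle's operands into page coordinates. */
class RectToPageSpace {

    /** Applies the PDF matrix [a b c d e f]: x' = a*x + c*y + e, y' = b*x + d*y + f. */
    static double[] transform(double[] ctm, double x, double y) {
        return new double[] {
            ctm[0] * x + ctm[2] * y + ctm[4],
            ctm[1] * x + ctm[3] * y + ctm[5]
        };
    }

    /** Returns {llx, lly, urx, ury}: the bounding box of the transformed rectangle. */
    static double[] toPageBox(double[] ctm, double x, double y, double w, double h) {
        double[][] corners = {
            transform(ctm, x, y), transform(ctm, x + w, y),
            transform(ctm, x, y + h), transform(ctm, x + w, y + h)
        };
        double llx = Double.POSITIVE_INFINITY, lly = Double.POSITIVE_INFINITY;
        double urx = Double.NEGATIVE_INFINITY, ury = Double.NEGATIVE_INFINITY;
        for (double[] c : corners) {
            llx = Math.min(llx, c[0]);
            lly = Math.min(lly, c[1]);
            urx = Math.max(urx, c[0]);
            ury = Math.max(ury, c[1]);
        }
        return new double[] {llx, lly, urx, ury};
    }

    public static void main(String[] args) {
        // Assumed example CTM: scale by 2, flip the y axis, then translate by (10, 100).
        double[] ctm = {2, 0, 0, -2, 10, 100};
        System.out.println(Arrays.toString(toPageBox(ctm, 1, 2, 3, 4)));
        // prints [12.0, 88.0, 18.0, 96.0]
    }
}
```

The result is in default user space, where one unit is 1/72 inch and, in the usual case, y grows upward from the bottom of the page. Taking the minimum and maximum over all four corners keeps the box valid even when the CTM flips or rotates the axes.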

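As Bruno pointed out, the problem is that you may encounter rectangles that are defined only by line-to and move-to operations rather than by a single rectangle operation.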

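You will need to keep track of every line-drawing operation and aggregate the lines as soon as they connect, that is, whenever the start or end point of a newly drawn line matches the start or end point of an already known line.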
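
The aggregation idea can be sketched independently of iText. The block below is a minimal illustration under stated assumptions, not the code from this answer: segments are plain `{x1, y1, x2, y2}` arrays, two endpoints count as equal when they agree after rounding to 0.01 pt, and clusters are merged with a small union-find. `SegmentClusters` and its method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: groups line segments that share an endpoint. */
class SegmentClusters {
    private final List<double[]> segments = new ArrayList<>(); // each {x1, y1, x2, y2}
    private final List<Integer> parent = new ArrayList<>();    // union-find parent per segment
    private final Map<String, Integer> endpointOwner = new HashMap<>();

    /** Assumed tolerance: points are considered identical after rounding to 0.01 pt. */
    private static String key(double x, double y) {
        return Math.round(x * 100) + ":" + Math.round(y * 100);
    }

    private int find(int i) {
        while (parent.get(i) != i) {
            i = parent.get(i);
        }
        return i;
    }

    void add(double x1, double y1, double x2, double y2) {
        int id = segments.size();
        segments.add(new double[] {x1, y1, x2, y2});
        parent.add(id);
        for (String k : new String[] {key(x1, y1), key(x2, y2)}) {
            // The first segment seen at an endpoint owns it; later ones merge with the owner.
            Integer other = endpointOwner.putIfAbsent(k, id);
            if (other != null) {
                parent.set(find(id), find(other));
            }
        }
    }

    int clusterCount() {
        int n = 0;
        for (int i = 0; i < parent.size(); i++) {
            if (find(i) == i) n++;
        }
        return n;
    }

    /** Bounding box {minX, minY, maxX, maxY} per cluster, keyed by the cluster's root id. */
    Map<Integer, double[]> boundingBoxes() {
        Map<Integer, double[]> boxes = new HashMap<>();
        for (int i = 0; i < segments.size(); i++) {
            double[] s = segments.get(i);
            double[] b = boxes.computeIfAbsent(find(i), r -> new double[] {
                Double.POSITIVE_INFINITY, Double.POSITIVE_INFINITY,
                Double.NEGATIVE_INFINITY, Double.NEGATIVE_INFINITY});
            for (int p = 0; p < 4; p += 2) {
                b[0] = Math.min(b[0], s[p]);
                b[1] = Math.min(b[1], s[p + 1]);
                b[2] = Math.max(b[2], s[p]);
                b[3] = Math.max(b[3], s[p + 1]);
            }
        }
        return boxes;
    }
}
```

Because each new segment is checked at both of its endpoints, a segment that bridges two existing clusters merges them into one.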

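import com.itextpdf.kernel.colors.Color;
import com.itextpdf.kernel.colors.DeviceGray;
import com.itextpdf.kernel.colors.IccBased;
import com.itextpdf.kernel.geom.IShape;
import com.itextpdf.kernel.geom.Line;
import com.itextpdf.kernel.geom.Path;
import com.itextpdf.kernel.geom.Point;
import com.itextpdf.kernel.geom.Rectangle;
import com.itextpdf.kernel.geom.Subpath;
import com.itextpdf.kernel.pdf.canvas.parser.EventType;
import com.itextpdf.kernel.pdf.canvas.parser.data.IEventData;
import com.itextpdf.kernel.pdf.canvas.parser.data.PathRenderInfo;
import com.itextpdf.kernel.pdf.canvas.parser.listener.IEventListener;

import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// iText 7 event listener that collects the lines of black filled paths
// and groups lines sharing an endpoint into clusters.
public class RectangleFinder implements IEventListener {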

    private Map<Line, Integer> knownLines = new HashMap<>();
    private Map<Integer, Integer> clusters = new HashMap<>();

    public void eventOccurred(IEventData data, EventType type) {
        if(data instanceof PathRenderInfo){
            PathRenderInfo pathRenderInfo = (PathRenderInfo) data;
            pathRenderInfo.preserveGraphicsState();
            Path path = pathRenderInfo.getPath();
            if(pathRenderInfo.getOperation() == PathRenderInfo.NO_OP)
                return;
            if(pathRenderInfo.getOperation() != PathRenderInfo.FILL)
                return;
            if(!isBlack(pathRenderInfo.getFillColor()))
                return;
            for(Subpath sPath : path.getSubpaths()){
                for(IShape segment : sPath.getSegments()) {
                    if(segment instanceof Line) {
                        lineOccurred((Line) segment);
                    }
                }
            }
        }
    }

    private boolean isBlack(Color c){
        if(c instanceof IccBased){
            IccBased col01 = (IccBased) c;
            return col01.getNumberOfComponents() == 1 && col01.getColorValue()[0] == 0.0f;
        }
        if(c instanceof DeviceGray){
            DeviceGray col02 = (DeviceGray) c;
            return col02.getNumberOfComponents() == 1 && col02.getColorValue()[0] == 0.0f;
        }
        return false;
    }

    private void lineOccurred(Line line){
        int ID = 0;
        if(!knownLines.containsKey(line)) {
            ID = knownLines.size();
            knownLines.put(line, ID);
        }else{
            ID = knownLines.get(line);
        }

        Point start = line.getBasePoints().get(0);
        Point end = line.getBasePoints().get(1);
        // Attach this line to the cluster of the first known line that shares an endpoint with it.
        for(Line line2 : knownLines.keySet()){
            if(line.equals(line2))
                continue;
            if(line2.getBasePoints().get(0).equals(start)
                    || line2.getBasePoints().get(1).equals(end)
                    || line2.getBasePoints().get(0).equals(end)
                    || line2.getBasePoints().get(1).equals(start)){
                int ID2 = find(knownLines.get(line2));
                clusters.put(ID, ID2);
                break;
            }
        }
    }

    private int find(int ID){
        int out = ID;
        while(clusters.containsKey(out))
            out = clusters.get(out);
        return out;
    }

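    public Set<EventType> getSupportedEvents() {
        // Returning null tells iText that this listener accepts every event type;
        // eventOccurred() then filters for PathRenderInfo itself.
        return null;
    }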

    public Collection<Set<Line>> getClusters(){
        Map<Integer, Set<Line>> out = new HashMap<>();
        for(Integer val : clusters.values())
            out.put(val, new HashSet<Line>());
        out.put(-1, new HashSet<Line>());
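        for(Line l : knownLines.keySet()){
            // Resolve every line to the root of its cluster so that the first line
            // of a cluster (the root itself) is kept as well.
            int clusterID = find(knownLines.get(l));
            if(!out.containsKey(clusterID))
                clusterID = -1; // this line never joined a cluster
            out.get(clusterID).add(l);
        }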
        out.remove(-1);
        return out.values();
    }

    public Collection<Rectangle> getBoundingBoxes(){
        Set<Rectangle> rectangles = new HashSet<>();
        for(Set<Line> cluster : getClusters()){
            double minX = Double.MAX_VALUE;
            double minY = Double.MAX_VALUE;
            double maxX = -Double.MAX_VALUE;
            double maxY = -Double.MAX_VALUE;
            for(Line l : cluster){
                for(Point p : l.getBasePoints()){
                    minX = Math.min(minX, p.x);
                    minY = Math.min(minY, p.y);
                    maxX = Math.max(maxX, p.x);
                    maxY = Math.max(maxY, p.y);
                }
            }
            double w = (maxX - minX);
            double h = (maxY - minY);
            rectangles.add(new Rectangle((float) minX, (float) minY, (float) w, (float) h));
        }
        return rectangles;
    }
}

The class above is one I wrote to find black (filled) rectangles on a page. With minor adjustments it can find other rectangles as well.
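
Note that `getBoundingBoxes()` returns the bounding box of every cluster, whether or not the clustered lines actually form a rectangle. One possible adjustment is to verify each cluster against its box. The following is a rough sketch under stated assumptions: `RectangleCheck` is a hypothetical helper, segments are plain `{x1, y1, x2, y2}` arrays rather than iText `Line` objects, only axis-aligned rectangles are recognized, and coordinates are compared with an assumed tolerance of 0.01 pt. Since a filled path is implicitly closed, three explicit sides that touch all four corners are accepted.

```java
import java.util.List;

/** Hypothetical helper, not part of iText. Box is {minX, minY, maxX, maxY}. */
class RectangleCheck {
    private static final double EPS = 0.01; // assumed tolerance, in points

    private static boolean near(double a, double b) {
        return Math.abs(a - b) <= EPS;
    }

    /** True if every segment lies on the box border and all four corners are touched. */
    static boolean looksLikeRectangle(List<double[]> segments, double[] box) {
        if (box[2] - box[0] <= EPS || box[3] - box[1] <= EPS) {
            return false; // degenerate box: a single line or a point
        }
        double[][] corners = {
            {box[0], box[1]}, {box[2], box[1]}, {box[2], box[3]}, {box[0], box[3]}
        };
        boolean[] cornerSeen = new boolean[4];
        for (double[] s : segments) {
            boolean onHorizontalEdge = near(s[1], s[3])
                    && (near(s[1], box[1]) || near(s[1], box[3]));
            boolean onVerticalEdge = near(s[0], s[2])
                    && (near(s[0], box[0]) || near(s[0], box[2]));
            if (!onHorizontalEdge && !onVerticalEdge) {
                return false; // diagonal or interior segment
            }
            for (int i = 0; i < 4; i++) {
                boolean startHits = near(s[0], corners[i][0]) && near(s[1], corners[i][1]);
                boolean endHits = near(s[2], corners[i][0]) && near(s[3], corners[i][1]);
                cornerSeen[i] |= startHits || endHits;
            }
        }
        return cornerSeen[0] && cornerSeen[1] && cornerSeen[2] && cornerSeen[3];
    }
}
```

A cluster that fails this check (for example an L-shaped pair of lines or a diagonal stroke) can then be dropped before its bounding box is reported.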