Apache Poi RowIterator only returning the last 100 (0 - 99) rows

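In my code, a workbook is created by one process and then read by another process, without the workbook ever being written to a file (the second process actually writes the data to a csv file).

When the workbook is read, only the last 100 rows come back. What do I need to do to get the iterator to return all of the rows? Sample code is shown below, and the full example is here: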

https://github.com/NACHC-CAD/poi-example-01

Code:

package org.nachc.examples.poi.iteratorexample;

import java.util.Iterator;

import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.xssf.streaming.SXSSFWorkbook;
import org.junit.Test;

import lombok.extern.slf4j.Slf4j;

@Slf4j
public class IteratorExampleIntegrationTest {

    public static final int MAX = 220;
    
    @Test
    public void shouldWriteCells() throws Exception {
        SXSSFWorkbook book = new SXSSFWorkbook();
        Sheet sheet = book.createSheet("sheet-001");
        // create the book
        log.info("* * * CREATING * * *");
        for(int r=0;r<MAX;r++) {
            Row row = sheet.createRow(r);
            for(int c=0;c<5;c++) {
                Cell cell = row.createCell(c);
                String str = "ROW " + r + " COL " + c;
                log.info("CREATING: " + str);
                cell.setCellValue(str);
            }
        }
        log.info("* * * ECHO * * *");
        Iterator<Row> rowIterator = sheet.rowIterator();
        while(rowIterator.hasNext()) {
            Row row = rowIterator.next();
            log.info("Got Row: " + row.getRowNum());
        }
        book.close();
        log.info("Done.");
    }

}

Output (truncated at "..." for brevity)

19:55:59.366 [main] INFO org.nachc.examples.poi.iteratorexample.IteratorExampleIntegrationTest - * * * CREATING * * *
19:55:59.378 [main] INFO org.nachc.examples.poi.iteratorexample.IteratorExampleIntegrationTest - CREATING: ROW 0 COL 0
19:55:59.379 [main] INFO org.nachc.examples.poi.iteratorexample.IteratorExampleIntegrationTest - CREATING: ROW 0 COL 1
19:55:59.380 [main] INFO org.nachc.examples.poi.iteratorexample.IteratorExampleIntegrationTest - CREATING: ROW 0 COL 2
19:55:59.380 [main] INFO org.nachc.examples.poi.iteratorexample.IteratorExampleIntegrationTest - CREATING: ROW 0 COL 3
19:55:59.380 [main] INFO org.nachc.examples.poi.iteratorexample.IteratorExampleIntegrationTest - CREATING: ROW 0 COL 4
...
...
19:55:59.448 [main] INFO org.nachc.examples.poi.iteratorexample.IteratorExampleIntegrationTest - CREATING: ROW 219 COL 1
19:55:59.448 [main] INFO org.nachc.examples.poi.iteratorexample.IteratorExampleIntegrationTest - CREATING: ROW 219 COL 2
19:55:59.448 [main] INFO org.nachc.examples.poi.iteratorexample.IteratorExampleIntegrationTest - CREATING: ROW 219 COL 3
19:55:59.448 [main] INFO org.nachc.examples.poi.iteratorexample.IteratorExampleIntegrationTest - CREATING: ROW 219 COL 4
19:55:59.448 [main] INFO org.nachc.examples.poi.iteratorexample.IteratorExampleIntegrationTest - * * * ECHO * * *
19:55:59.448 [main] INFO org.nachc.examples.poi.iteratorexample.IteratorExampleIntegrationTest - Got Row: 120
19:55:59.448 [main] INFO org.nachc.examples.poi.iteratorexample.IteratorExampleIntegrationTest - Got Row: 121
19:55:59.448 [main] INFO org.nachc.examples.poi.iteratorexample.IteratorExampleIntegrationTest - Got Row: 122
19:55:59.448 [main] INFO org.nachc.examples.poi.iteratorexample.IteratorExampleIntegrationTest - Got Row: 123
...
...
19:55:59.452 [main] INFO org.nachc.examples.poi.iteratorexample.IteratorExampleIntegrationTest - Got Row: 217
19:55:59.452 [main] INFO org.nachc.examples.poi.iteratorexample.IteratorExampleIntegrationTest - Got Row: 218
19:55:59.452 [main] INFO org.nachc.examples.poi.iteratorexample.IteratorExampleIntegrationTest - Got Row: 219
19:55:59.595 [main] INFO org.nachc.examples.poi.iteratorexample.IteratorExampleIntegrationTest - Done.

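--- EDIT ------------------

For clarity: the solution here is not to get all of the rows back from the streaming row iterator, but to implement streaming processing that works in line with the streaming row iterator (see the comments on the accepted answer).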

What do I need to do to get the iterator to return all of the rows?

Don't use SXSSFWorkbook. It is the streaming version: it keeps only the last 100 rows (configurable) in memory and flushes older rows to disk, which is what makes it streaming.
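
A minimal sketch of that suggestion, assuming the workbook is small enough to fit in memory: the same loop as in the question, with XSSFWorkbook in place of SXSSFWorkbook. The class and method names (XssfIteratorExample, countIteratedRows) are made up for illustration and are not part of the linked project.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Iterator;

import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

public class XssfIteratorExample {

    /** Creates rowCount rows in a non-streaming workbook and counts the rows the iterator returns. */
    public static int countIteratedRows(int rowCount) {
        // XSSFWorkbook keeps every row in memory, so nothing is flushed away.
        try (Workbook book = new XSSFWorkbook()) {
            Sheet sheet = book.createSheet("sheet-001");
            for (int r = 0; r < rowCount; r++) {
                Row row = sheet.createRow(r);
                row.createCell(0).setCellValue("ROW " + r + " COL 0");
            }
            int seen = 0;
            Iterator<Row> rowIterator = sheet.rowIterator();
            while (rowIterator.hasNext()) {
                rowIterator.next();
                seen++;
            }
            return seen;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // With MAX = 220 as in the question, all 220 rows should be counted.
        System.out.println("Rows iterated: " + countIteratedRows(220));
    }
}
```

The trade-off is memory: this approach gives up the low footprint that SXSSF exists to provide, so it only suits workbooks that comfortably fit in the heap.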

The documentation says so:

POI-HSSF and POI-XSSF/SXSSF - Java API To Access Microsoft Excel Format Files

Due to the streaming nature of the implementation, there are the following limitations when compared to XSSF:

  • Only a limited number of rows are accessible at a point in time.

Javadoc of SXSSFWorkbook

Streaming version of XSSFWorkbook implementing the "BigGridDemo" strategy. This allows to write very large files without running out of memory as only a configurable portion of the rows are kept in memory at any one time.


SXSSF (Streaming Usermodel API)

SXSSF (package: org.apache.poi.xssf.streaming) is an API-compatible streaming extension of XSSF to be used when very large spreadsheets have to be produced, and heap space is limited. SXSSF achieves its low memory footprint by limiting access to the rows that are within a sliding window, while XSSF gives access to all rows in the document. Older rows that are no longer in the window become inaccessible, as they are written to the disk.

You can specify the window size at workbook construction time via new SXSSFWorkbook(int windowSize) or you can set it per-sheet via SXSSFSheet#setRandomAccessWindowSize(int windowSize)
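
As a hedged illustration of the constructor mentioned above, the sketch below builds the same kind of sheet with different window sizes and counts what the iterator returns. It relies on the SXSSFWorkbook Javadoc statement that a window size of -1 means unlimited access; the class and method names (WindowSizeExample, countIteratedRows) are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.xssf.streaming.SXSSFWorkbook;

public class WindowSizeExample {

    /** Creates rowCount rows with the given window size and counts the rows still iterable. */
    public static int countIteratedRows(int rowCount, int windowSize) {
        // Note: depending on the POI version, temporary files may also
        // need an explicit book.dispose() call to be cleaned up.
        try (SXSSFWorkbook book = new SXSSFWorkbook(windowSize)) {
            Sheet sheet = book.createSheet("sheet-001");
            for (int r = 0; r < rowCount; r++) {
                sheet.createRow(r).createCell(0).setCellValue("ROW " + r);
            }
            int seen = 0;
            // Sheet is Iterable<Row>; only rows inside the window are visited.
            for (Row row : sheet) {
                seen++;
            }
            return seen;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("window 100: " + countIteratedRows(220, 100));
        System.out.println("window -1:  " + countIteratedRows(220, -1));
    }
}
```

Widening or disabling the window keeps more rows in memory, so it trades away the memory savings in the same way that switching to XSSFWorkbook does.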

When a new row is created via createRow() and the total number of unflushed records would exceed the specified window size, then the row with the lowest index value is flushed and cannot be accessed via getRow() anymore.

The default window size is 100 and defined by SXSSFWorkbook.DEFAULT_WINDOW_SIZE.
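
The flushing rule and the default of 100 explain the output in the question: with 220 rows created, only rows 120 to 219 are still in the window. The toy model below mimics that rule with a TreeMap. It is a simplified stand-in written for illustration, not POI's actual implementation, and the names (SlidingWindowModel, retainedRows) are invented.

```java
import java.util.TreeMap;

/** Toy model of a sliding row window; NOT the POI implementation. */
public class SlidingWindowModel {

    /** Returns the row indices still "in memory" after creating rowCount rows. */
    public static int[] retainedRows(int rowCount, int windowSize) {
        TreeMap<Integer, String> window = new TreeMap<>();
        for (int r = 0; r < rowCount; r++) {
            window.put(r, "ROW " + r);
            // Once the window is over capacity, drop the lowest index,
            // standing in for "flushed to disk and no longer accessible".
            if (window.size() > windowSize) {
                window.pollFirstEntry();
            }
        }
        return window.keySet().stream().mapToInt(Integer::intValue).toArray();
    }

    public static void main(String[] args) {
        int[] rows = retainedRows(220, 100);
        // prints "120 .. 219"
        System.out.println(rows[0] + " .. " + rows[rows.length - 1]);
    }
}
```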