Java - Filtering rows of an Excel file using POI

I have an Excel file with a lot of rows (more than 60,000), and I want to apply a filter to them so that I only read the rows I'm looking for.

I'm using the POI library in Java, but I haven't found out how to filter values.

For example, my Excel file contains the following data:

First name | Last name | Age
-----------+-----------+----
Jhon       | Doe       |  25
Foo        | Bar       |  20
Aaa        | Doe       |  22

How can I select every row whose last name equals Doe?

Here is my code so far:

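public void parseExcelFile(XSSFWorkbook myExcelFile) {
    XSSFSheet worksheet = myExcelFile.getSheetAt(1); // second sheet (indexes are 0-based)

    // Cell range to filter: the header row (0) is included, because Excel
    // puts the filter drop-downs on the first row of the range
    CellRangeAddress data = new CellRangeAddress(
            0,
            worksheet.getLastRowNum(),
            0,
            worksheet.getRow(0).getPhysicalNumberOfCells() - 1); // last column index, not the count

    worksheet.setAutoFilter(data);
}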

I tried using AutoFilter, but I don't understand how it works.

I'm looking for something like this:

Filter filter = new Filter();
filter.setRange(myRange);
filter.addFilter(
    1,    // The column index (Last name)
    "Doe" // The value that I'm searching for
);
filter.apply();

This is purely hypothetical code.
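To make the wished-for API concrete, here is a minimal sketch of what such a Filter class could look like. It is hypothetical: POI has no such class, the rows are plain String[] arrays rather than POI Row objects, and combining several addFilter() calls with AND is an assumption.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical class mirroring the wished-for API; it is NOT part of POI.
// Rows are plain String[] here instead of POI Row objects.
class Filter {
    private int firstRow;
    private int lastRow;
    // column index -> value that the column must equal
    private final Map<Integer, String> criteria = new LinkedHashMap<>();

    public void setRange(int firstRow, int lastRow) {
        this.firstRow = firstRow;
        this.lastRow = lastRow;
    }

    public void addFilter(int columnIndex, String value) {
        criteria.put(columnIndex, value);
    }

    // Returns the rows in [firstRow, lastRow] whose cells equal every criterion (AND)
    public List<String[]> apply(List<String[]> rows) {
        List<String[]> result = new ArrayList<>();
        for (int r = firstRow; r <= Math.min(lastRow, rows.size() - 1); r++) {
            String[] row = rows.get(r);
            boolean matches = true;
            for (Map.Entry<Integer, String> c : criteria.entrySet()) {
                int col = c.getKey();
                if (col >= row.length || !c.getValue().equals(row[col])) {
                    matches = false;
                    break;
                }
            }
            if (matches) result.add(row);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String[]> rows = List.of(
                new String[] {"First name", "Last name", "Age"},
                new String[] {"Jhon", "Doe", "25"},
                new String[] {"Foo", "Bar", "20"},
                new String[] {"Aaa", "Doe", "22"});
        Filter filter = new Filter();
        filter.setRange(1, rows.size() - 1); // skip the header row
        filter.addFilter(1, "Doe");          // column 1 = Last name
        for (String[] row : filter.apply(rows)) {
            System.out.println(String.join(" | ", row)); // Jhon | Doe | 25, then Aaa | Doe | 22
        }
    }
}
```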

Thanks for your help!

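If your question is how to set the AutoFilter criterion "Doe" for the last name, that can only be done with the underlying low-level ooxml-schemas classes. XSSFAutoFilter is of no use for this: so far it does not provide any methods at all.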
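The ooxml-schemas classes map directly onto SpreadsheetML elements: CTAutoFilter is `<autoFilter>`, CTFilterColumn is `<filterColumn>`, CTFilters is `<filters>`, and each `addNewFilter().setVal(...)` adds a `<filter val="..."/>`. The sketch below builds that fragment as a plain string, only to show the structure the example ends up writing; it is not how POI serializes it (the real sheet XML carries the spreadsheetml namespace and may differ in detail). AutoFilterXml and its methods are hypothetical names, and colId is assumed to be relative to the first column of the AutoFilter range.

```java
// Sketch: builds the SpreadsheetML <autoFilter> fragment that the CT* classes represent.
class AutoFilterXml {

    // 0-based column index -> Excel column letters (0 -> A, 25 -> Z, 26 -> AA)
    static String columnName(int index) {
        StringBuilder sb = new StringBuilder();
        int n = index + 1;
        while (n > 0) {
            int rem = (n - 1) % 26;
            sb.insert(0, (char) ('A' + rem));
            n = (n - 1) / 26;
        }
        return sb.toString();
    }

    // Minimal escaping for attribute values
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    // colId is relative to the first column of the range, as passed to CTFilterColumn.setColId()
    static String autoFilterXml(int firstRow, int lastRow, int firstCol, int lastCol,
                                int colId, String... values) {
        StringBuilder xml = new StringBuilder();
        xml.append("<autoFilter ref=\"")
           .append(columnName(firstCol)).append(firstRow + 1).append(':')
           .append(columnName(lastCol)).append(lastRow + 1).append("\">");
        xml.append("<filterColumn colId=\"").append(colId).append("\"><filters>");
        for (String v : values) {
            xml.append("<filter val=\"").append(escape(v)).append("\"/>");
        }
        xml.append("</filters></filterColumn></autoFilter>");
        return xml.toString();
    }

    public static void main(String[] args) {
        // Same range and criterion as the example below: A1:C7, column 1 ("Last name") = "Doe"
        System.out.println(autoFilterXml(0, 6, 0, 2, 1, "Doe"));
    }
}
```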

A complete example using your sample data:

import org.apache.poi.ss.usermodel.*;
import org.apache.poi.ss.util.*;
import org.apache.poi.xssf.usermodel.*;

import org.openxmlformats.schemas.spreadsheetml.x2006.main.CTAutoFilter;
import org.openxmlformats.schemas.spreadsheetml.x2006.main.CTFilterColumn;
import org.openxmlformats.schemas.spreadsheetml.x2006.main.CTFilters;
import org.openxmlformats.schemas.spreadsheetml.x2006.main.CTCustomFilters;
import org.openxmlformats.schemas.spreadsheetml.x2006.main.CTCustomFilter;
import org.openxmlformats.schemas.spreadsheetml.x2006.main.STFilterOperator;

import java.io.FileOutputStream;

class AutoFilterSetTest {

 private static void setCellData(Sheet sheet) {

  Object[][] data = new Object[][] {
   new Object[] {"First name", "Last name", "Age"},
   new Object[] {"John", "Doe", 25},
   new Object[] {"Foo", "Bar", 20},
   new Object[] {"Jane", "Doe", 22},
   new Object[] {"Ruth", "Moss", 42},
   new Object[] {"Manuel", "Doe", 32},
   new Object[] {"Axel", "Richter", 56},
  };

  Row row = null;
  Cell cell = null;
  int r = 0;
  int c = 0;
  for (Object[] dataRow : data) {
   row = sheet.createRow(r);
   c = 0;
   for (Object dataValue : dataRow) {
    cell = row.createCell(c);
    if (dataValue instanceof String) {
     cell.setCellValue((String)dataValue);
    } else if (dataValue instanceof Number) {
     cell.setCellValue(((Number)dataValue).doubleValue());
    }
    c++;
   }
   r++;
  }
 }

 private static void setCriteriaFilter(XSSFSheet sheet, int colId, int firstRow, int lastRow, String[] criteria) throws Exception {
  CTAutoFilter ctAutoFilter = sheet.getCTWorksheet().getAutoFilter();
  CTFilterColumn ctFilterColumn = null;
  for (CTFilterColumn filterColumn : ctAutoFilter.getFilterColumnList()) {
   if (filterColumn.getColId() == colId) ctFilterColumn = filterColumn;
  }
  if (ctFilterColumn == null) ctFilterColumn = ctAutoFilter.addNewFilterColumn();
  ctFilterColumn.setColId(colId);
  if (ctFilterColumn.isSetFilters()) ctFilterColumn.unsetFilters();

  CTFilters ctFilters = ctFilterColumn.addNewFilters();
  for (int i = 0; i < criteria.length; i++) {
   ctFilters.addNewFilter().setVal(criteria[i]);
  }

  //hide the rows that do not match the criteria
  DataFormatter dataformatter = new DataFormatter();
  for (int r = firstRow; r <= lastRow; r++) {
   XSSFRow row = sheet.getRow(r);
   boolean hidden = true;
   for (int i = 0; i < criteria.length; i++) {
    String cellValue = dataformatter.formatCellValue(row.getCell(colId));
    if (criteria[i].equals(cellValue)) hidden = false;
   }
   if (hidden) {
    row.getCTRow().setHidden(hidden);
   } else {
    if (row.getCTRow().getHidden()) row.getCTRow().unsetHidden();
   }
  }
 }

 public static void main(String[] args) throws Exception {

  XSSFWorkbook wb = new XSSFWorkbook();
  XSSFSheet sheet = wb.createSheet();

  //create rows of data
  setCellData(sheet);

  for (int c = 0; c < 2; c++) sheet.autoSizeColumn(c);

  int lastRow = sheet.getLastRowNum();
  XSSFAutoFilter autofilter = sheet.setAutoFilter(new CellRangeAddress(0, lastRow, 0, 2));
  //XSSFAutoFilter provides no methods for setting criteria (as of this writing)

  //set filter criteria 
  setCriteriaFilter(sheet, 1, 1, lastRow, new String[]{"Doe"});

  //get only visible rows after filtering
  XSSFRow row = null;
  for (int r = 1; r <= lastRow; r++) {
   row = sheet.getRow(r);
   if (row.getCTRow().getHidden()) continue;
   for (int c = 0; c < 3; c++) {
    System.out.print(row.getCell(c) + "\t");
   }
   System.out.println();
  }

  FileOutputStream out = new FileOutputStream("AutoFilterSetTest.xlsx");
  wb.write(out);
  out.close();
  wb.close();
 }
}

It prints:

John    Doe   25.0  
Jane    Doe   22.0  
Manuel  Doe   32.0  

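In the resulting AutoFilterSetTest.xlsx, the AutoFilter on the "Last name" column is set to "Doe" and only the John, Jane and Manuel rows are visible. [Screenshot: AutoFilterSetTest.xlsx with the filter applied]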

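Maybe this can help others, so here is the solution I came up with earlier.
Bear in mind that I'm not very good at Java, so the code below can certainly be optimized.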

I implemented a filter myself; to do so, I created 3 classes:

  • ExcelWorksheetFilter
  • FilterRule
  • FilterRuleOperation

ExcelWorksheetFilter

import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.util.CellRangeAddress;
import org.apache.poi.xssf.usermodel.XSSFSheet;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

import java.util.ArrayList;
import java.util.List;

public class ExcelWorksheetFilter {

    private List<FilterRule> ruleList = new ArrayList<>();
    private CellRangeAddress cellRange;
    private XSSFSheet worksheet;
    private XSSFWorkbook workbook;

    public ExcelWorksheetFilter(XSSFWorkbook workbook, int worksheetId) {
        this.workbook = workbook;
        this.worksheet = workbook.getSheetAt(worksheetId);
    }

    /**
     * Apply rules of ruleList to the worksheet.
     * The row is kept in the result if at least one rule matches.
     */
    public void apply(){

        for(int rowId = cellRange.getFirstRow(); rowId <= cellRange.getLastRow(); rowId++){
            worksheet.getRow(rowId).getCTRow().setHidden(true);
            for(FilterRule rule : ruleList){
                if(rule.match(worksheet.getRow(rowId))){
                    worksheet.getRow(rowId).getCTRow().setHidden(false);
                    break;
                }
            }
        }
    }

    /**
     * Apply rules of ruleList to the worksheet.
     * The row is kept in the result only if every rule matches.
     */
    public void applyStrict(){
        for(int rowId = cellRange.getFirstRow(); rowId <= cellRange.getLastRow(); rowId++){
            worksheet.getRow(rowId).getCTRow().setHidden(false);
            for(FilterRule rule : ruleList){
                if(!rule.match(worksheet.getRow(rowId))){
                    worksheet.getRow(rowId).getCTRow().setHidden(true);
                    break;
                }
            }
        }
    }

    public List<Row> getRowList(){
        List<Row> rowList = new ArrayList<>();

        for(int rowId = cellRange.getFirstRow(); rowId <= cellRange.getLastRow(); rowId++){
            if(!worksheet.getRow(rowId).getCTRow().getHidden()){
                rowList.add(worksheet.getRow(rowId));
            }
        }

        return rowList;
    }

    public void addRule(FilterRule rule) {
        this.ruleList.add(rule);
    }

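    public void setCellRange(CellRangeAddress cellRange) {
        this.cellRange = cellRange;
    }

    public CellRangeAddress getCellRange() {
        return cellRange;
    }

    public XSSFSheet getWorksheet() {
        return worksheet;
    }

    public XSSFWorkbook getWorkbook() {
        return workbook;
    }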
}

FilterRule

import org.apache.poi.ss.usermodel.DataFormatter;
import org.apache.poi.xssf.usermodel.XSSFRow;

public class FilterRule {

    private final static DataFormatter df = new DataFormatter();

    private Integer columnId;
    private String[] values;
    private FilterRuleOperation operator;

    public FilterRule(Integer columnId, FilterRuleOperation operator, String[] values){
        this.columnId = columnId;
        this.operator = operator;
        this.values = values;
    }

    /**
     * Returns true if the cell in columnId matches at least one of the values.
     * @param row The row to match
     * @return true if at least one value matches
     */
    public boolean match(XSSFRow row){
        for(String value : values){
            if(operator.match(df.formatCellValue(row.getCell(columnId)), value)){
                return true;
            }
        }
        return false;
    }
}

FilterRuleOperation

public enum FilterRuleOperation {

    DIFFERENT("!="){
        @Override public boolean match(String x, String y){
            return !x.equals(y);
        }
    },
    EQUAL("=="){
        @Override public boolean match(String x, String y){
            return x.equals(y);
        }
    };

    private final String text;

    private FilterRuleOperation(String text) {
        this.text = text;
    }

    public abstract boolean match(String x, String y);

    @Override public String toString() {
        return text;
    }
}

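You can then use it as described in the question.
For example, given an Excel file similar to the question's sample data (the output below shows it also contains a Jhon | Smith | 30 row) and this code: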

import org.apache.poi.ss.usermodel.DataFormatter;
import org.apache.poi.ss.util.CellRangeAddress;
import org.apache.poi.xssf.usermodel.XSSFSheet;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

import java.io.FileOutputStream;
import java.io.IOException;

public void parseExcelFile(XSSFWorkbook myExcelFile) throws IOException {
    // Use the same sheet for the range and for the filter
    int sheetIndex = 0;
    XSSFSheet worksheet = myExcelFile.getSheetAt(sheetIndex);

    // Create the filter
    ExcelWorksheetFilter excelWorksheetFilter = new ExcelWorksheetFilter(myExcelFile, sheetIndex);
    excelWorksheetFilter.setCellRange(new CellRangeAddress(
        1, // Exclude the row with column titles
        worksheet.getLastRowNum(),
        0,
        worksheet.getRow(0).getPhysicalNumberOfCells() - 1
    ));

    // Create rules for filtering
    excelWorksheetFilter.addRule(new FilterRule(
            1, // Last name column
            FilterRuleOperation.EQUAL,
            new String[]{"Doe"}
    ));

    excelWorksheetFilter.addRule(new FilterRule(
            0, // First name column
            FilterRuleOperation.EQUAL,
            new String[]{"Jhon"}
    ));

    // applyStrict() combines the rules with AND
    excelWorksheetFilter.applyStrict();
    // apply() combines the rules with OR instead
    // excelWorksheetFilter.apply();

    DataFormatter df = new DataFormatter();
    excelWorksheetFilter.getRowList().forEach(row -> {
        for (int i = 0; i < 3; i++) {
            System.out.print(df.formatCellValue(row.getCell(i)) + '\t');
        }
        System.out.println();
    });

    // Save the file
    try (FileOutputStream out = new FileOutputStream("filter_test.xlsx")) {
        excelWorksheetFilter.getWorkbook().write(out);
    }
    excelWorksheetFilter.getWorkbook().close();
}

This prints:

Jhon    Doe 25

If you use excelWorksheetFilter.apply() instead, it prints:

Jhon    Doe    25   
Aaa     Doe    22   
Jhon    Smith  30
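The AND/OR behaviour of applyStrict() and apply() can also be expressed with java.util.function.Predicate, whose and()/or() methods combine rules. This is a swapped-in technique, not the answer's code: rows are plain String[] arrays, PredicateRules and equalsAny() are hypothetical names, and the four data rows are assumed from the outputs above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Sketch: the FilterRule idea with java.util.function.Predicate instead of a custom enum.
// Rows are plain String[] here, not POI rows.
class PredicateRules {

    // Hypothetical helper: "the cell in column col equals one of the values"
    static Predicate<String[]> equalsAny(int col, String... values) {
        return row -> col < row.length && Arrays.asList(values).contains(row[col]);
    }

    // Keeps the rows accepted by the rule and joins each one into a string
    static List<String> filter(List<String[]> rows, Predicate<String[]> rule) {
        return rows.stream()
                .filter(rule)
                .map(row -> String.join(" ", row))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Assumed data rows (header excluded), based on the outputs above
        List<String[]> rows = List.of(
                new String[] {"Jhon", "Doe", "25"},
                new String[] {"Foo", "Bar", "20"},
                new String[] {"Aaa", "Doe", "22"},
                new String[] {"Jhon", "Smith", "30"});
        Predicate<String[]> lastNameDoe = equalsAny(1, "Doe");
        Predicate<String[]> firstNameJhon = equalsAny(0, "Jhon");

        // AND, like applyStrict(): [Jhon Doe 25]
        System.out.println(filter(rows, lastNameDoe.and(firstNameJhon)));
        // OR, like apply(): [Jhon Doe 25, Aaa Doe 22, Jhon Smith 30]
        System.out.println(filter(rows, lastNameDoe.or(firstNameJhon)));
    }
}
```

The design choice here is that each rule is just a function from row to boolean, so new operators need no new enum constant, only a new lambda.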

The two main drawbacks are:

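  • It doesn't use Excel's own filter (no AutoFilter is set; rows are only hidden), so the file is harder to work with in Excel afterwards.
  • It isn't memory-efficient, because ExcelWorksheetFilter.getRowList() returns a list instead of an iterator.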
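For the second drawback, a lazy iterator avoids building the intermediate list: it pulls elements from a source iterator one at a time and yields only those that pass a predicate. This is a sketch under assumptions: FilteringIterator is a hypothetical class, and with POI the source would presumably be worksheet.rowIterator() with a predicate such as `row -> !row.getZeroHeight()`. An XSSFWorkbook still holds the whole sheet in memory, so this only saves the extra list.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Predicate;

// Sketch: an Iterator that yields matching elements lazily instead of building a List.
// Shown with plain strings; with POI, T would be Row.
class FilteringIterator<T> implements Iterator<T> {
    private final Iterator<T> source;
    private final Predicate<? super T> predicate;
    private T next;          // next matching element, if one was found
    private boolean hasNext; // whether 'next' holds a valid element

    FilteringIterator(Iterator<T> source, Predicate<? super T> predicate) {
        this.source = source;
        this.predicate = predicate;
        advance();
    }

    // Move 'next' to the following matching element, if any
    private void advance() {
        hasNext = false;
        while (source.hasNext()) {
            T candidate = source.next();
            if (predicate.test(candidate)) {
                next = candidate;
                hasNext = true;
                return;
            }
        }
        next = null;
    }

    @Override
    public boolean hasNext() {
        return hasNext;
    }

    @Override
    public T next() {
        if (!hasNext) throw new NoSuchElementException();
        T result = next;
        advance();
        return result;
    }

    public static void main(String[] args) {
        Iterator<String> it = new FilteringIterator<>(
                List.of("Jhon Doe", "Foo Bar", "Aaa Doe").iterator(),
                s -> s.endsWith(" Doe"));
        while (it.hasNext()) System.out.println(it.next()); // Jhon Doe, then Aaa Doe
    }
}
```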

It also only compares values as strings (the formatted cell text), but I suppose it could be adapted to other data types.
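Adapting the rules to numbers could follow the same enum pattern as FilterRuleOperation: parse the formatted cell text and compare as double. A sketch with a hypothetical NumericRuleOperation enum; non-numeric text (such as a header) never matches, and values formatted with grouping separators (e.g. "1,234") would fail to parse with Double.parseDouble. To plug it into FilterRule, the operator field would need a common type, for example an interface that both enums implement.

```java
// Sketch: extending the FilterRuleOperation idea to numbers by parsing the formatted
// cell text. NumericRuleOperation and its constants are hypothetical names.
enum NumericRuleOperation {
    GREATER_THAN(">") {
        @Override boolean compare(double x, double y) { return x > y; }
    },
    LESS_THAN("<") {
        @Override boolean compare(double x, double y) { return x < y; }
    };

    private final String text;

    NumericRuleOperation(String text) {
        this.text = text;
    }

    abstract boolean compare(double x, double y);

    // Same signature as FilterRuleOperation.match(); non-numeric text never matches
    public boolean match(String x, String y) {
        try {
            return compare(Double.parseDouble(x.trim()), Double.parseDouble(y.trim()));
        } catch (NumberFormatException e) {
            return false;
        }
    }

    @Override public String toString() {
        return text;
    }

    public static void main(String[] args) {
        System.out.println(GREATER_THAN.match("25", "21"));  // true
        System.out.println(GREATER_THAN.match("20", "21"));  // false
        System.out.println(LESS_THAN.match("Age", "21"));    // false (header text)
    }
}
```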