Excel to text conversion: properly handling formulas and empty cells

I am trying to convert an Excel file to a tab-delimited text file with Apache POI. Some cells in the Excel file contain formulas, and some cells are empty.

Here is a sample of the original Excel file:

Here is an excerpt of the final output:

'US'    'USORACLEAP'    SYSTEMREFERENCE SUPPLIERID  SUPPLIERNAME    CLASSIFICATION  VENDOR_SITE_CODE    SUPPLIERADDRESS1    SUPPLIERADDRESS2    STATE   ZIPCODE COUNTRY SOURCE  INVOICENUM  INVOICEDATE PAYMENTDATE LINE_DESC   GL_COMPANY  GL_CODE GL_DESCR    COSTCENTER  CC_DESCR    CURRENCY_CODE   CHECK_NUMBER    NUM_DOCS    SPEND   TERM    PAYMENT_METHOD  SYSTEM_APPROVED PO_DISTRIBUTION_ID  WALKER_COST_CENTER  RGL_LEDGER_ENTITY   
US  US Oracle AP        RANDBETWEEN(3000,100000)    "TEXT "&D2  VENDOR  "TEXT "&D3  "TEXT "&D3  "TEXT "&D3  ONTARIO RIGHT(D2,5) US  "TEXT "&D3  "TEXT "&D3  RANDBETWEEN(43831, 44150)   RANDBETWEEN(44105,44135)    "TEXT "&D3  RIGHT("000"&RANDBETWEEN(1,999),3)   RANDBETWEEN(55000, 60000)   "TEXT "&D3  "TEXT "&D3  "TEXT "&D3  USD RANDBETWEEN(2000000,2100000)    RANDBETWEEN(1,4)    RANDBETWEEN(1,100000)/100   IMMEDIATE   Check           "TEXT"&D2   X2  
US  US Oracle AP        31836   "TEXT "&D3  1099    "TEXT "&D4  "TEXT "&D4  "TEXT "&D4  NY  RIGHT(D3,5) US  "TEXT "&D4  "TEXT "&D4  RANDBETWEEN(43831,44150)    RANDBETWEEN(44105,44135)    "TEXT "&D4  RIGHT("000"&RANDBETWEEN(1,999),3)   RANDBETWEEN(55000,60000)    "TEXT "&D4  "TEXT "&D4  "TEXT "&D4  USD RANDBETWEEN(2000000,2100000)    RANDBETWEEN(1,4)    RANDBETWEEN(1,100000)/100   IMMEDIATE   Check           GSUEDCM03   AF2 
US  US Oracle AP        3504    "TEXT "&D4  VENDOR  "TEXT "&D5  "TEXT "&D5  "TEXT "&D5  NY  RIGHT(D4,5) US  "TEXT "&D5  "TEXT "&D5  RANDBETWEEN(43831,44150)    RANDBETWEEN(44105,44135)    "TEXT "&D5  RIGHT("000"&RANDBETWEEN(1,999),3)   RANDBETWEEN(55000,60000)    "TEXT "&D5  "TEXT "&D5  "TEXT "&D5  USD RANDBETWEEN(2000000,2100000)    RANDBETWEEN(1,4)    RANDBETWEEN(1,100000)/100   IMMEDIATE   ACH         GSUEIT001   AF3 
US  US Oracle AP        3504    "TEXT "&D5  VENDOR  "TEXT "&D6  "TEXT "&D6  "TEXT "&D6  NY  RIGHT(D5,5) US  "TEXT "&D6  "TEXT "&D6  RANDBETWEEN(43831,44150)    RANDBETWEEN(44105,44135)    "TEXT "&D6  RIGHT("000"&RANDBETWEEN(1,999),3)   RANDBETWEEN(55000,60000)    "TEXT "&D6  "TEXT "&D6  "TEXT "&D6  USD RANDBETWEEN(2000000,2100000)    RANDBETWEEN(1,4)    RANDBETWEEN(1,100000)/100   IMMEDIATE   ACH         GSUEIT001   AF4 
US  US Oracle AP        3504    "TEXT "&D6  VENDOR  "TEXT "&D7  "TEXT "&D7  "TEXT "&D7  NY  RIGHT(D6,5) US  "TEXT "&D7  "TEXT "&D7  RANDBETWEEN(43831,44150)    RANDBETWEEN(44105,44135)    "TEXT "&D7  RIGHT("000"&RANDBETWEEN(1,999),3)   RANDBETWEEN(55000,60000)    "TEXT "&D7  "TEXT "&D7  "TEXT "&D7  USD RANDBETWEEN(2000000,2100000)    RANDBETWEEN(1,4)    RANDBETWEEN(1,100000)/100   IMMEDIATE   ACH         GSUEIT001   AF5 

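As you can see, the first row contains the column headers. Some cells (D1) were written out as the formula itself instead of its value. The third column has no values, so everything after it is shifted to the left in the text file.

Here is the code: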

    private void convertXlsToText(InputStream inputStream, String delimiter, File targetFile) throws IOException {
        StringBuilder sb = new StringBuilder();
        setMinInflateRatio(0);
        try (Workbook wb = create(inputStream)) {
            Sheet firstSheet = wb.getSheetAt(0);

            for (Row nextRow : firstSheet) {
                Iterator<Cell> cellIterator = nextRow.cellIterator();
                while (cellIterator.hasNext()) {
                    Cell cell = cellIterator.next();
                    switch (cell.getCellType()) {
                        case STRING:
                            sb.append(cell.getStringCellValue()).append(delimiter);
                            break;
                        case BOOLEAN:
                            sb.append(cell.getBooleanCellValue()).append(delimiter);
                            break;
                        case NUMERIC:
                            sb.append(cell.getNumericCellValue()).append(delimiter);
                            break;
                        case FORMULA:
                            sb.append(cell.getCellFormula()).append(delimiter);
                            break;
                        default:
                            sb.append(EMPTY).append(delimiter);
                    }
                }
                sb.append(DEFAULT_LINE_END);
            }
        }

        dumpStringBuilderToFile(sb, targetFile);
    }

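Can someone point out what I should change in my code to fix the alignment and formula issues? PS: I am using TAB (\t) as the delimiter.

Update: here is the code after applying the suggestions.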

    private void convertXlsToText(InputStream inputStream, String delimiter, File targetFile) throws IOException {
        StringBuilder sb = new StringBuilder();
        setMinInflateRatio(0);
        try (Workbook wb = create(inputStream)) {
            Sheet firstSheet = wb.getSheetAt(0);
            FormulaEvaluator evaluator = wb.getCreationHelper().createFormulaEvaluator();
            DataFormatter formatter = new DataFormatter();
            for (Row nextRow : firstSheet) {
                Iterator<Cell> cellIterator = nextRow.cellIterator();
                while (cellIterator.hasNext()) {
                    Cell cell = cellIterator.next();
                    if (cell != null) {
                        sb.append(format("%-20s", formatter.formatCellValue(cell, evaluator))).append(delimiter);
                    } else {
                        sb.append(format("%-20s", EMPTY)).append(delimiter);
                    }
                }
                sb.append(DEFAULT_LINE_END);
            }
        }

        dumpStringBuilderToFile(sb, targetFile);
    }

To get the value of a formula cell rather than the formula itself, try the following:

    FormulaEvaluator evaluator = myWorkbook.getCreationHelper().createFormulaEvaluator();

    CellValue cellValue = evaluator.evaluate(cell); // cell is your formula cell

    // Assumes POI 4.0 or later (as the enum cases in the question suggest),
    // where CellValue.getCellType() returns a CellType enum.
    switch (cellValue.getCellType()) {
        case BOOLEAN:
            System.out.println(cellValue.getBooleanValue());
            break;
        case NUMERIC:
            System.out.println(cellValue.getNumberValue());
            break;
        case STRING:
            System.out.println(cellValue.getStringValue());
            break;
        case BLANK:
        case ERROR:
        default:
            // nothing to print for blank or error results
            break;
    }

Edit:

Regarding the alignment issue, see: How can I pad a String in Java?
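
As a minimal sketch of the kind of padding that linked question covers, the JDK's `String.format` can pad a value to a fixed width. The class name `PadDemo`, the helper names, and the width of 10 are made up for illustration and are not part of Apache POI.

```java
// Hypothetical helper class for illustration; uses only the JDK.
class PadDemo {

    // Left-justifies s in a field of the given width by appending spaces.
    static String padRight(String s, int width) {
        return String.format("%-" + width + "s", s);
    }

    // Right-justifies s in a field of the given width by prepending spaces.
    static String padLeft(String s, int width) {
        return String.format("%" + width + "s", s);
    }

    public static void main(String[] args) {
        // Brackets make the added spaces visible in the console.
        System.out.println("[" + padRight("NY", 10) + "]");
        System.out.println("[" + padLeft("NY", 10) + "]");
    }
}
```

Note that a value longer than the requested width is not truncated by these format strings; it is emitted unchanged.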

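If the requirement is to write Excel data to a text file, all cell values need to be obtained as String. A convenient way to do that is Apache POI's DataFormatter. With DataFormatter you get the cell values as they are displayed in the Excel sheet, for example with number formats and date formats applied. If you use DataFormatter together with a FormulaEvaluator, formulas are evaluated and the computed values are converted to String.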

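To avoid losing empty cells, the number of cells per row needs to be determined first, because the cell iterator skips empty cells. For example, the number of cells in the header row can also serve as the number of cells in every subsequent row.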
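
The effect can be sketched without POI by modelling a row as a sparse map from column index to value. `SparseRowDemo` and its methods are hypothetical stand-ins: iterating only the stored entries mimics a cell iterator that skips missing cells, while looping over a fixed column count mimics fetching each cell by index with a blank fallback.

```java
import java.util.Map;
import java.util.StringJoiner;
import java.util.TreeMap;

// Hypothetical model of a spreadsheet row; not the Apache POI API.
class SparseRowDemo {

    // Joins only the cells that exist, so a missing column silently disappears.
    static String joinExisting(Map<Integer, String> row, String delimiter) {
        StringJoiner joiner = new StringJoiner(delimiter);
        for (String value : row.values()) {
            joiner.add(value);
        }
        return joiner.toString();
    }

    // Walks columns 0..columnCount-1 and writes an empty field for a missing cell.
    static String joinByIndex(Map<Integer, String> row, int columnCount, String delimiter) {
        StringJoiner joiner = new StringJoiner(delimiter);
        for (int c = 0; c < columnCount; c++) {
            joiner.add(row.getOrDefault(c, ""));
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        Map<Integer, String> row = new TreeMap<>();
        row.put(0, "US");
        row.put(1, "US Oracle AP");
        row.put(3, "3504"); // column 2 was never stored, like an empty Excel cell

        System.out.println(joinExisting(row, "|"));   // three fields, shifted left
        System.out.println(joinByIndex(row, 4, "|")); // four fields, empty third one
    }
}
```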

So the whole code is as simple as this:

import org.apache.poi.ss.usermodel.*;
import java.io.*;

class ExcelToText {
 static final String DEFAULT_LINE_END = System.getProperty("line.separator");

 static void convertXlsToText(InputStream inputStream, String delimiter, OutputStream outputStream) throws Exception {
  StringBuilder sb = new StringBuilder();
  // DataFormatter renders each cell the way Excel displays it.
  DataFormatter dataFormatter = new DataFormatter(java.util.Locale.US);
  // try-with-resources closes the workbook even if an exception is thrown.
  try (Workbook workbook = WorkbookFactory.create(inputStream)) {
   // The evaluator lets DataFormatter format formula results instead of formula text.
   FormulaEvaluator formulaEvaluator = workbook.getCreationHelper().createFormulaEvaluator();
   Sheet sheet = workbook.getSheetAt(0);
   // The header row determines how many columns are written for every row.
   Row headerRow = sheet.getRow(0);
   int cellCount = 0;
   if (headerRow != null) {
    cellCount = headerRow.getLastCellNum();
   }
   if (cellCount > 0) {
    for (Row row : sheet) {
     for (int c = 0; c < cellCount; c++) {
      // Missing cells are returned as blank cells, so empty columns are kept.
      Cell cell = row.getCell(c, Row.MissingCellPolicy.CREATE_NULL_AS_BLANK);
      sb.append(dataFormatter.formatCellValue(cell, formulaEvaluator));
      if (c < cellCount - 1) sb.append(delimiter);
     }
     sb.append(DEFAULT_LINE_END);
    }
   }
  }
  // Closing the writer flushes it and closes the underlying stream.
  try (BufferedWriter bw = new BufferedWriter(new OutputStreamWriter(outputStream, java.nio.charset.StandardCharsets.UTF_8))) {
   bw.append(sb);
  }
 }

 public static void main(String[] args) throws Exception {
  convertXlsToText(new FileInputStream("./Excel.xlsx"), "\t", new FileOutputStream("./Data.txt"));
 }
}

No CellType check and no separate formula evaluation step are needed.

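As for your other requirement: a delimited text file should contain only the real content, separated by the delimiter. The content should not be manipulated. So, in my opinion, prepending spaces or padding values to a fixed width is not a good idea. For example, if the delimiter is a tab, only the tab stops configured in the text viewer should affect how the file looks. Extra spaces only get in the way.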
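
A small sketch of why padding gets in the way: when a consumer splits a padded line on the tab, the padding becomes part of each value. The class `PaddedFieldsDemo`, the width of 20, and the sample values are assumptions chosen for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical consumer of a delimited line; uses only the JDK.
class PaddedFieldsDemo {

    // Splits a line on the delimiter, keeping trailing empty fields (limit -1).
    static String[] parse(String line, String delimiter) {
        return line.split(Pattern.quote(delimiter), -1);
    }

    public static void main(String[] args) {
        String clean = "NY" + "\t" + "10001";
        String padded = String.format("%-20s", "NY") + "\t" + String.format("%-20s", "10001");

        // The clean line round-trips exactly; the padded one carries trailing spaces.
        System.out.println(parse(clean, "\t")[0].equals("NY"));
        System.out.println(parse(padded, "\t")[0].equals("NY"));
        System.out.println(parse(padded, "\t")[0].length());
    }
}
```

A reader of the padded file would have to trim every field, and could no longer tell padding apart from spaces that were really in the cell.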