How to extract PDF data into Excel?
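I want to convert PDF data into Excel data. I have converted the PDF to a text file and removed the unnecessary text from the .txt file, but the values are now laid out in rows, and I want them arranged in columns.
PDF file: chemistry-chemists.com/chemister/Spravochniki/handbook-of-aqueous-solubility-data-2010.pdf
Current state of the Excel file:
Desired state of the Excel file: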
PDFtables.com is good at extracting tables from PDFs into Excel. It should cover what you need :)
In ASP.NET you can use the following code:
<div>
Upload PDF File :<asp:FileUpload ID="fuPdfUpload" runat="server" />
<asp:Button ID="btnExportToExcel" Text="Export To Excel" OnClick="ExportToExcel" runat="server" />
</div>
Note: you must install iTextSharp from NuGet first!
// At the top of the code-behind file:
using System;
using System.Text;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

protected void ExportToExcel(object sender, EventArgs e)
{
    if (this.fuPdfUpload.HasFile)
    {
        // PostedFile.FileName is the client-side name, not a path on the server,
        // so pass the uploaded bytes instead of resolving a file path.
        this.ExportPDFToExcel(this.fuPdfUpload.FileBytes);
    }
}

private void ExportPDFToExcel(byte[] pdfBytes)
{
    StringBuilder text = new StringBuilder();
    PdfReader pdfReader = new PdfReader(pdfBytes);
    try
    {
        for (int page = 1; page <= pdfReader.NumberOfPages; page++)
        {
            // LocationTextExtractionStrategy orders text chunks by their position on the page.
            ITextExtractionStrategy strategy = new LocationTextExtractionStrategy();
            string currentText = PdfTextExtractor.GetTextFromPage(pdfReader, page, strategy);
            // The extracted string is already Unicode, so no re-encoding step is needed.
            text.AppendLine(currentText);
        }
    }
    finally
    {
        pdfReader.Close();
    }

    // The payload is plain text served with an Excel content type: Excel opens it
    // as text and may warn that the format does not match the .xls extension.
    Response.Clear();
    Response.Buffer = true;
    Response.AddHeader("content-disposition", "attachment;filename=ReceiptExport.xls");
    Response.Charset = "";
    Response.ContentType = "application/vnd.ms-excel";
    Response.Write(text.ToString());
    Response.Flush();
    Response.End();
}
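The handler above sends the extracted text as-is, so every line still lands in a single cell. To get real columns, each line has to be split into cells first. Here is a minimal sketch of that step, written in Python rather than C# to keep it short. It assumes the columns in the extracted text are separated by runs of two or more spaces, which may not hold for every PDF. The function name `text_to_csv` and the sample text are made up for illustration.

```python
import csv
import io
import re

# Assumption: columns in the extracted text are separated by 2+ whitespace characters.
COLUMN_GAP = re.compile(r"\s{2,}")


def text_to_csv(text: str) -> str:
    """Split each non-empty line into cells and return the result as CSV text."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for line in text.splitlines():
        line = line.strip()
        if line:
            # Single spaces stay inside a cell; wider gaps start a new cell.
            writer.writerow(COLUMN_GAP.split(line))
    return out.getvalue()


# Hypothetical sample text, not taken from the handbook.
sample = "Name  Formula  Solubility\nexample compound  X1  0.5\n"
print(text_to_csv(sample))
```

Excel opens the resulting CSV with one value per cell. If the gaps between columns in your PDF are not consistent, a table-aware tool is likely the safer route.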
Take a look at Tabula, a very effective tool for extracting tables from PDFs: https://github.com/tabulapdf/tabula
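If you already have the cleaned .txt file from the question, a small script may be enough to turn the rows into columns. This is only a sketch under a strong assumption: every record occupies the same number of consecutive lines, one value per line. The function names, the file names `cleaned.txt` and `output.csv`, and the header labels are all hypothetical.

```python
import csv
import io


def lines_to_rows(lines, fields_per_record):
    """Group consecutive non-empty lines into records of a fixed size."""
    values = [ln.strip() for ln in lines if ln.strip()]
    if len(values) % fields_per_record != 0:
        raise ValueError(
            f"{len(values)} values do not divide into records of {fields_per_record}"
        )
    return [
        values[i:i + fields_per_record]
        for i in range(0, len(values), fields_per_record)
    ]


def rows_to_csv(rows, header, out):
    """Write a header plus the rows to an open text file object as CSV."""
    writer = csv.writer(out)
    writer.writerow(header)
    writer.writerows(rows)


def convert(txt_path, csv_path, header):
    """Read the one-value-per-line text file and write a CSV that Excel can open."""
    with open(txt_path, encoding="utf-8") as src:
        rows = lines_to_rows(src, len(header))
    # "utf-8-sig" adds a byte order mark, which helps Excel detect UTF-8.
    with open(csv_path, "w", newline="", encoding="utf-8-sig") as dst:
        rows_to_csv(rows, header, dst)


# In-memory demo with made-up values; for real use call something like
# convert("cleaned.txt", "output.csv", ["name", "value"]).
demo_rows = lines_to_rows(["a", "1", "", "b", "2"], 2)
buf = io.StringIO()
rows_to_csv(demo_rows, ["name", "value"], buf)
print(buf.getvalue())
```

The `ValueError` is deliberate: if the line count does not divide evenly into records, the text file probably still contains stray lines, and failing loudly is better than silently shifting columns.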