Extract embedded table from doc(x) file in .NET

As far as I understand, the shortest way is to convert the file to XML. Then the table can be found by its tag.

using System.IO;
using System.Xml.Linq;

var fileinfo = new FileInfo(@"c:\Users\a1oleg\Desktop\myFile.docx");

XDocument xml = null;
using (StreamReader oReader = new StreamReader(fileinfo.FullName))
{
    xml = XDocument.Load(oReader);
}

The error is:

System.Xml.XmlException: 'Data at the root level is invalid. Line 1, position 1.'
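
The exception is expected: a .docx file is a ZIP package, not a plain XML file, so the XML parser fails on the very first byte. The XML that holds the tables lives inside the package. The following is a minimal sketch of the "find the table by its tag" idea, not a definitive implementation. It assumes the main document part is stored at word/document.xml (where Word writes it), and the names DocxTables and ReadTables are hypothetical helpers introduced only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

public static class DocxTables
{
    // WordprocessingML main namespace (w:tbl, w:tr, w:tc, w:p, w:t).
    static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns tables -> rows -> cell texts for every w:tbl in the document body.
    // Nested tables also appear as separate entries in the result.
    public static List<List<List<string>>> ReadTables(Stream docx)
    {
        using (var zip = new ZipArchive(docx, ZipArchiveMode.Read, leaveOpen: true))
        {
            // Assumption: the main part sits at the conventional path.
            ZipArchiveEntry entry = zip.GetEntry("word/document.xml")
                ?? throw new InvalidDataException("word/document.xml not found");

            using (Stream part = entry.Open())
            {
                XDocument xml = XDocument.Load(part);
                return xml.Descendants(W + "tbl")
                    .Select(tbl => tbl.Elements(W + "tr")
                        .Select(tr => tr.Elements(W + "tc")
                            .Select(CellText)
                            .ToList())
                        .ToList())
                    .ToList();
            }
        }
    }

    // Joins the paragraphs that are direct children of a cell with newlines;
    // each paragraph's text is the concatenation of its w:t runs.
    static string CellText(XElement tc)
    {
        return string.Join("\n", tc.Elements(W + "p")
            .Select(p => string.Concat(p.Descendants(W + "t").Select(t => t.Value))));
    }
}
```

Called as DocxTables.ReadTables(File.OpenRead(path)), it returns one list of rows per table, each row being a list of cell strings. This only applies to .docx; the legacy binary .doc format is not a ZIP package. On .NET Framework the project needs a reference to the System.IO.Compression assembly.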

You can use Microsoft.Office.Interop.Word:

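using Microsoft.Office.Interop.Word;

// Requires a local Word installation and a reference to the Word interop assembly.
Microsoft.Office.Interop.Word.Application app = new Microsoft.Office.Interop.Word.Application();
Documents docs = app.Documents;
// Verbatim string (@) so the backslashes are not treated as escape sequences.
Document doc = docs.Open(@"C:\users\Test.docx", ReadOnly: true);
// Word collections are 1-based, so this is the first table in the document.
Table tbl = doc.Tables[1];
Range rg = tbl.Range;
Cells cells = rg.Cells;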