How to extract images from a .DOCX using DocumentFormat.OpenXml.Paragraph?
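I need to extract the text and images from a DOCX file into a text file (the images, of course, saved as separate graphics files).
Using the code below, how can I get each image and save it together with a reference to it in the text file?
If I use: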
List<ImagePart> imgPart = wordProcessingDoc.MainDocumentPart.ImageParts.ToList();
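I can get all the images, but sometimes one image is used in several places, and I can't find anything that tells me which image in the list belongs at a given position.
Here is the sample code, taken from (Extract table from DOCX):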
public static string ReadAllTextFromDocx(FileInfo fileInfo)
{
    StringBuilder stringBuilder;
    using (WordprocessingDocument wordprocessingDocument = WordprocessingDocument.Open(fileInfo.FullName, false))
    {
        NameTable nameTable = new NameTable();
        XmlNamespaceManager xmlNamespaceManager = new XmlNamespaceManager(nameTable);
        xmlNamespaceManager.AddNamespace("w", "http://schemas.openxmlformats.org/wordprocessingml/2006/main");

        // Read the raw XML of the main document part.
        string wordprocessingDocumentText;
        using (StreamReader streamReader = new StreamReader(wordprocessingDocument.MainDocumentPart.GetStream()))
        {
            wordprocessingDocumentText = streamReader.ReadToEnd();
        }

        stringBuilder = new StringBuilder(wordprocessingDocumentText.Length);
        XmlDocument xmlDocument = new XmlDocument(nameTable);
        xmlDocument.LoadXml(wordprocessingDocumentText);

        // Walk every paragraph and emit its text, tabs, breaks and drawings in document order.
        XmlNodeList paragraphNodes = xmlDocument.SelectNodes("//w:p", xmlNamespaceManager);
        foreach (XmlNode paragraphNode in paragraphNodes)
        {
            XmlNodeList textNodes = paragraphNode.SelectNodes(".//w:t | .//w:tab | .//w:br | .//w:drawing", xmlNamespaceManager);
            foreach (XmlNode textNode in textNodes)
            {
                switch (textNode.Name)
                {
                    case "w:t":
                        stringBuilder.Append(textNode.InnerText);
                        break;
                    case "w:tab":
                        stringBuilder.Append("\t");
                        break;
                    case "w:br":
                        stringBuilder.Append("\v");
                        break;
                    case "w:drawing":
                        // Placeholder only: this is where the image reference should go.
                        stringBuilder.Append("----------------IMAGE HERE-------------");
                        break;
                }
            }
            stringBuilder.Append(Environment.NewLine);
        }
    }
    return stringBuilder.ToString();
}
I found the answer in this post: Replace image in word doc using OpenXML
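// xr and xpic are not defined in the quoted snippet; the values below are assumed.
// xr is the relationships namespace of r:embed; the name attribute is unqualified.
string xr = "http://schemas.openxmlformats.org/officeDocument/2006/relationships";
string xpic = "";

var imageParts = from graphic in par.Descendants<DocumentFormat.OpenXml.Drawing.Graphic>()
                 let graphicData = graphic.Descendants<DocumentFormat.OpenXml.Drawing.GraphicData>().FirstOrDefault()
                 let pic = graphicData.ElementAt(0)
                 let nvPicPrt = pic.ElementAt(0).FirstOrDefault()
                 let blip = pic.Descendants<DocumentFormat.OpenXml.Drawing.Blip>().FirstOrDefault()
                 select new
                 {
                     Id = blip.GetAttribute("embed", xr).Value,
                     Filename = nvPicPrt.GetAttribute("name", xpic).Value
                 };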
You can then get the image stream from the matching ImagePart:
// wordprocessingDocument is the open WordprocessingDocument instance, not the type name.
var images = from graphic in paragraph.Descendants<DocumentFormat.OpenXml.Drawing.Graphic>()
             let graphicData = graphic.Descendants<DocumentFormat.OpenXml.Drawing.GraphicData>().FirstOrDefault()
             let pic = graphicData.ElementAt(0)
             let nvPicPrt = pic.ElementAt(0).FirstOrDefault()
             let blip = pic.Descendants<DocumentFormat.OpenXml.Drawing.Blip>().FirstOrDefault()
             // Match the blip's relationship id against the parts of the main document.
             join part in wordprocessingDocument.MainDocumentPart.Parts
                 on blip.Embed.Value equals part.RelationshipId
             let image = part.OpenXmlPart as ImagePart
             select new
             {
                 Id = blip.Embed,
                 fileStream = image.GetStream()
             };
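To cover the "save it with a reference in the text file" part of the question, the stream still has to be written to disk and its file name put into the text. The following is only a rough sketch of one way to do that, not part of the linked answer. `DocxImageExporter`, `SaveParagraphImages`, `outputDir`, `savedByRelId` and the `[IMAGE: ...]` placeholder format are all made-up names for illustration. It assumes the Open XML SDK is referenced, that `outputDir` already exists, and that the images are embedded in the main document part (headers and footers are separate parts with their own relationships).

```csharp
using System.Collections.Generic;
using System.IO;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;
using A = DocumentFormat.OpenXml.Drawing;

public static class DocxImageExporter
{
    // Hypothetical helper: saves each image referenced in the paragraph to outputDir
    // and returns one placeholder string per reference, in document order.
    // savedByRelId maps relationship id -> file name, so a shared image is written once.
    public static List<string> SaveParagraphImages(
        MainDocumentPart mainPart,
        Paragraph paragraph,
        string outputDir,
        Dictionary<string, string> savedByRelId)
    {
        var placeholders = new List<string>();

        foreach (A.Blip blip in paragraph.Descendants<A.Blip>())
        {
            string relId = blip.Embed?.Value;
            if (string.IsNullOrEmpty(relId))
                continue; // linked (non-embedded) pictures carry no embed id

            if (!savedByRelId.TryGetValue(relId, out string fileName))
            {
                // Resolve the relationship id to its part; skip anything that is not an image.
                if (!(mainPart.GetPartById(relId) is ImagePart imagePart))
                    continue;

                // The part URI looks like /word/media/image1.png; reuse its last segment.
                fileName = Path.GetFileName(imagePart.Uri.OriginalString);

                using (Stream source = imagePart.GetStream())
                using (FileStream target = File.Create(Path.Combine(outputDir, fileName)))
                {
                    source.CopyTo(target);
                }
                savedByRelId[relId] = fileName;
            }

            // Every use of the image gets a reference, even when the file was saved earlier.
            placeholders.Add("[IMAGE: " + fileName + "]");
        }
        return placeholders;
    }
}
```

Calling this for each `Paragraph` in `mainPart.Document.Body.Descendants<Paragraph>()` with one shared dictionary should write an image that appears several times only once, while every occurrence still gets its own reference in the text output.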