Split a nested XML string to extract its text using a parser

I have this string:

'<Section xml:space="preserve" HasTrailingParagraphBreakOnPaste="False" xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"><Paragraph FontSize="11" FontFamily="Portable User Interface" Foreground="#FF000000" FontWeight="Normal" FontStyle="Normal" FontStretch="Normal" CharacterSpacing="0" Typography.AnnotationAlternates="0" Typography.EastAsianExpertForms="False" Typography.EastAsianLanguage="Normal" Typography.EastAsianWidths="Normal" Typography.StandardLigatures="True" Typography.ContextualLigatures="True" Typography.DiscretionaryLigatures="False" Typography.HistoricalLigatures="False" Typography.StandardSwashes="0" Typography.ContextualSwashes="0" Typography.ContextualAlternates="True" Typography.StylisticAlternates="0" Typography.StylisticSet1="False" Typography.StylisticSet2="False" Typography.StylisticSet3="False" Typography.StylisticSet4="False" Typography.StylisticSet5="False" Typography.StylisticSet6="False" Typography.StylisticSet7="False" Typography.StylisticSet8="False" Typography.StylisticSet9="False" Typography.StylisticSet10="False" Typography.StylisticSet11="False" Typography.StylisticSet12="False" Typography.StylisticSet13="False" Typography.StylisticSet14="False" Typography.StylisticSet15="False" Typography.StylisticSet16="False" Typography.StylisticSet17="False" Typography.StylisticSet18="False" Typography.StylisticSet19="False" Typography.StylisticSet20="False" Typography.Capitals="Normal" Typography.CapitalSpacing="False" Typography.Kerning="True" Typography.CaseSensitiveForms="False" Typography.HistoricalForms="False" Typography.Fraction="Normal" Typography.NumeralStyle="Normal" Typography.NumeralAlignment="Normal" Typography.SlashedZero="False" Typography.MathematicalGreek="False" Typography.Variants="Normal" TextOptions.TextHintingMode="Fixed" TextOptions.TextFormattingMode="Ideal" TextOptions.TextRenderingMode="Auto" TextAlignment="Left" LineHeight="0" LineStackingStrategy="MaxHeight"><Run>Panneau de polystyrène expansé (PSE) 
ignifugé réalisant une fonction d’isolation thermique pour un\nm² de surface en assurant la résistance thermique de R = 3.55 K.m².W-1.</Run></Paragraph></Section>'

My goal is to extract "Panneau de polystyrène expansé (PSE) ignifugé réalisant une fonction d’isolation thermique pour un\nm² de surface en assurant la résistance thermique de R = 3.55 K.m².W-1.", that is, the text between <Run> and </Run>.

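I managed to do it with a regular expression, but that does not work for some XML strings. I then tried xml.etree.ElementTree, but I could not get at the string nested inside the <Run> element.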

How can I extract this text using an XML parser?

Here is a simple way to get the data:

xmlstr = '<Section xml:space="preserve" HasTrailingParagraphBreakOnPaste="False" xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"><Paragraph FontSize="11" FontFamily="Portable User Interface" Foreground="#FF000000" FontWeight="Normal" FontStyle="Normal" FontStretch="Normal" CharacterSpacing="0" Typography.AnnotationAlternates="0" Typography.EastAsianExpertForms="False" Typography.EastAsianLanguage="Normal" Typography.EastAsianWidths="Normal" Typography.StandardLigatures="True" Typography.ContextualLigatures="True" Typography.DiscretionaryLigatures="False" Typography.HistoricalLigatures="False" Typography.StandardSwashes="0" Typography.ContextualSwashes="0" Typography.ContextualAlternates="True" Typography.StylisticAlternates="0" Typography.StylisticSet1="False" Typography.StylisticSet2="False" Typography.StylisticSet3="False" Typography.StylisticSet4="False" Typography.StylisticSet5="False" Typography.StylisticSet6="False" Typography.StylisticSet7="False" Typography.StylisticSet8="False" Typography.StylisticSet9="False" Typography.StylisticSet10="False" Typography.StylisticSet11="False" Typography.StylisticSet12="False" Typography.StylisticSet13="False" Typography.StylisticSet14="False" Typography.StylisticSet15="False" Typography.StylisticSet16="False" Typography.StylisticSet17="False" Typography.StylisticSet18="False" Typography.StylisticSet19="False" Typography.StylisticSet20="False" Typography.Capitals="Normal" Typography.CapitalSpacing="False" Typography.Kerning="True" Typography.CaseSensitiveForms="False" Typography.HistoricalForms="False" Typography.Fraction="Normal" Typography.NumeralStyle="Normal" Typography.NumeralAlignment="Normal" Typography.SlashedZero="False" Typography.MathematicalGreek="False" Typography.Variants="Normal" TextOptions.TextHintingMode="Fixed" TextOptions.TextFormattingMode="Ideal" TextOptions.TextRenderingMode="Auto" TextAlignment="Left" LineHeight="0" LineStackingStrategy="MaxHeight"><Run>Panneau de polystyrène expansé (PSE) ignifugé réalisant une fonction d’isolation thermique pour un\nm² de surface en assurant la résistance thermique de R = 3.55 K.m².W-1.</Run></Paragraph></Section>'
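from xml.etree import ElementTree as ET  # cElementTree was removed in Python 3.9

results = []
root = ET.fromstring(xmlstr)
# root is <Section>; its children are <Paragraph> elements,
# and each <Paragraph> contains the <Run> elements holding the text.
for p in root:
    for r in p:
        print(r.text)
        results.append(r.text)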

Result:

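Panneau de polystyrène expansé (PSE) ignifugé réalisant une fonction d’isolation thermique pour un
m² de surface en assurant la résistance thermique de R = 3.55 K.m².W-1.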

If you run the code in the Python interactive prompt, you can use the results afterwards:

>>> results
['Panneau de polystyrène expansé (PSE) ignifugé réalisant une fonction d’isolation thermique pour un\nm² de surface en assurant la résistance thermique de R = 3.55 K.m².W-1.']
>>> results[0]
'Panneau de polystyrène expansé (PSE) ignifugé réalisant une fonction d’isolation thermique pour un\nm² de surface en assurant la résistance thermique de R = 3.55 K.m².W-1.'
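
The nested loops above depend on <Run> sitting exactly two levels below the root. A likely reason a direct search such as root.find('Run') comes back empty is the default namespace declared by xmlns on <Section>: ElementTree stores every tag as "{namespace}LocalName". As a minimal sketch, assuming you want every <Run> regardless of depth, you could search by the qualified tag name. The helper name extract_run_texts and the shortened sample string are hypothetical, made up for this illustration.

```python
from xml.etree import ElementTree as ET

# Namespace declared by xmlns="..." on the <Section> root element.
XAML_NS = "http://schemas.microsoft.com/winfx/2006/xaml/presentation"


def extract_run_texts(xml_string):
    """Return the text of every <Run> element, at any nesting depth."""
    root = ET.fromstring(xml_string)
    # A bare "Run" would not match, because the parsed tag is "{namespace}Run".
    qualified_tag = "{%s}Run" % XAML_NS
    # r.text is None for an empty element, so fall back to an empty string.
    return [r.text or "" for r in root.iter(qualified_tag)]


# Shortened, hypothetical sample with the same structure as the real string.
sample = (
    '<Section xml:space="preserve" xmlns="' + XAML_NS + '">'
    '<Paragraph FontSize="11"><Run>first run</Run><Run>second run</Run></Paragraph>'
    '</Section>'
)

texts = extract_run_texts(sample)
print(texts)
```

Calling extract_run_texts(xmlstr) on the full string should give the same single-item list as results above. Unlike the nested loops, it keeps working if a <Run> is wrapped in an extra element.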