Capture a string at one indexed position and move it to another indexed position in an XML file

Question

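I'm a Python beginner and want to convert an existing XML file into a LaTeX document. The XML contains many footnotes that are sometimes split, because they did not fit on a single page of the original document (an old book) and the creator of the XML file wanted the layout to stay as close to the original as possible. Between the parts of a split footnote there is normal text as well as other footnotes. The snippet below should make the relationship between the footnote parts clear: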


> normal text <note place="foot" n="(a)" xml:id="seg2pn_8_1"
> next="#seg2pn_8_2">aaa aaa aaa</note> normal text <note place="foot"
> n="(b)">footnote text</note>. normal text. <note place="foot" n="(a)"
> xml:id="seg2pn_8_2" prev="#seg2pn_8_1">bbb bbb bbb</note>

The desired output would be:

normal text \footnote{aaa aaa aaa bbb bbb bbb} normal text \footnote{footnote text}. normal text.

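Anything can come between the two parts of a note: normal text, other notes, and so on. Using regex lookbehind and lookahead together with Python's zip function, I was able to print the wanted result, but I can't manage to do the actual replacement and write the result to a second file: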

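#!/usr/bin/env python3
import re
import sys

inFile = sys.argv[1]

with open(inFile, 'r') as f:
    fin = f.read()

# First parts: their start tag ends with next="#seg2pn_N_2">
strings_first = re.findall(r'(?<=seg2pn_\d_2">).*?(?=</note>)', fin, flags=re.DOTALL)
# Second parts: their start tag ends with prev="#seg2pn_N_1">
strings_second = re.findall(r'(?<=seg2pn_\d_1">).*?(?=</note>)', fin, flags=re.DOTALL)

# Pair each first part with its continuation and print them joined
for t, y in zip(strings_first, strings_second):
    print(t + y)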

Answer 1

If you're interested in an XSLT solution, it's quite simple. Just use an identity transform with these added rules:

<xsl:template match="note[@place='foot'][@next]">
  <xsl:copy>
    <xsl:value-of select="."/>
    <xsl:value-of select="id(substring(@next, 2))"/>
  </xsl:copy>
</xsl:template>

<xsl:template match="note[@place='foot'][@prev]"/>
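
For readers who would rather stay in Python, the two rules above can be approximated with the standard library's `xml.etree.ElementTree`. This is only a sketch under some assumptions: the function name `merge_split_notes` is made up for illustration, the `note` elements are assumed to have no namespace and to contain plain text only, and the two parts are joined with a space to match the desired output in the question. Unlike the stylesheet, it keeps the attributes of the first part (apart from `next`).

```python
import xml.etree.ElementTree as ET

# ElementTree stores xml:id under its expanded namespace name.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def merge_split_notes(root):
    """Append each continuation note's text to its first part, then drop it.

    Hypothetical helper; assumes un-namespaced <note> elements with text only.
    """
    by_id = {el.get(XML_ID): el for el in root.iter("note") if el.get(XML_ID)}
    parent_of = {child: parent for parent in root.iter() for child in parent}

    for note in list(root.iter("note")):
        if note.get("place") != "foot" or note.get("next") is None:
            continue
        cont = by_id.get(note.get("next").lstrip("#"))
        if cont is None:
            continue  # dangling pointer: leave the note untouched

        # Join the two parts with a single space.
        note.text = " ".join(filter(None, [note.text, cont.text]))
        del note.attrib["next"]

        # Removing an element also removes its tail, so move the tail
        # (the normal text after the continuation) to the preceding node.
        parent = parent_of[cont]
        siblings = list(parent)
        idx = siblings.index(cont)
        tail = cont.tail or ""
        if idx > 0:
            siblings[idx - 1].tail = (siblings[idx - 1].tail or "") + tail
        else:
            parent.text = (parent.text or "") + tail
        parent.remove(cont)
    return root
```

Real TEI files usually declare a default namespace, in which case the tag to look for would be the namespace-qualified name rather than the bare `note` used here.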

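This splits the footnote cleanup into a separate processing phase, which is always a good idea for keeping the logic of this kind of application simple.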
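
Once the split notes have been merged in their own phase, the conversion to LaTeX becomes a plain substitution. A minimal sketch of that second phase follows. The name `notes_to_latex` is hypothetical, the input is assumed to be text in which every remaining `note` element is a complete footnote containing no nested markup, and no escaping of LaTeX special characters is attempted.

```python
import re

# Matches a <note> start tag with any attributes, its content, and the end tag.
NOTE_RE = re.compile(r"<note\b[^>]*>(.*?)</note>", flags=re.DOTALL)


def notes_to_latex(cleaned_xml: str) -> str:
    """Replace every <note>...</note> with a LaTeX footnote command."""
    def as_footnote(match):
        # Collapse line breaks and runs of spaces inside the note.
        body = " ".join(match.group(1).split())
        return "\\footnote{" + body + "}"

    return NOTE_RE.sub(as_footnote, cleaned_xml)
```

A replacement function is used instead of a replacement string so that the backslash in `\footnote` does not have to be escaped for `re.sub`'s template syntax.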

I'm assuming that footnotes are never split into more than two parts.
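
That assumption can be checked against the actual file before relying on it. The sketch below (with a hypothetical helper name, `middle_parts`, and again assuming un-namespaced `note` elements) lists notes that carry both a `prev` and a `next` attribute, which would be the middle piece of a footnote split into three or more parts.

```python
import xml.etree.ElementTree as ET


def middle_parts(root):
    """Return notes that point both backwards and forwards.

    An empty result means no footnote is split into more than two parts.
    """
    return [
        note
        for note in root.iter("note")
        if note.get("prev") is not None and note.get("next") is not None
    ]
```

If the list is not empty, the two-part rules would lose the later pieces, and the merge step would need to follow the `next` pointers repeatedly.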

python

regex

xml

latex