Poorly developed XML to CSV via XSLT with unbounded child elements

Question

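New to Stack Overflow, with a question about XML to CSV. I am a data manager with an SPSS background, so XML is not always my strong suit. For a number of reasons, I am trying to convert a dataset that was exported from a hierarchical database and stored as XML into CSV format. The original database was not well structured, which is causing problems for my XSLT.

Here is the XML I have to work with. It is a 700 MB file: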

  <ABC_Data>
    <UID>1</UID>
    <DocumentNumber>000000001</DocumentNumber>
    <Surname>Smith</Surname>
    <GivenName>John</GivenName>
    <BirthDateList>
        <BirthDate>19/06/19888</BirthDate>
    </BirthDateList>
    <StationNumberList>
        <StationNumber>2009981</StationNumber>
    </StationNumberList>
    <Reference>
        <ReferenceEn>RG 150, Volume 01 - 1</ReferenceEn>
        <ReferenceFr>RG 150, Volume 01 - 1</ReferenceFr>
    </Reference>
    <DigitizeList>
        <Image>http://data.foo.bar.com/733a.gif</Image>
        <Image>http://data2.for.bar.com/733b.gif</Image>
    </DigitizeList>
    <UID>2</UID>
    <DocumentNumber>000000002</DocumentNumber>
    <Surname>Kootz</Surname>
    <GivenName>Ernst</GivenName>
    <BirthDateList>
        <BirthDate>24/12/1984</BirthDate>
    </BirthDateList>
    <StationNumberList>
        <StationNumber>2000023</StationNumber>
    </StationNumberList>
    <Reference>
        <ReferenceEn>RG 150, Volume 01 - 1</ReferenceEn>
        <ReferenceFr>RG 150, Volume 01 - 1</ReferenceFr>
    </Reference>
    <DigitizeList>
        <Image>http://data.foo.bar.com/744a.gif</Image>
        <Image>http://data2.for.bar.com/755b.gif</Image>
    </DigitizeList>
  </ABC_Data>

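Here is the basic XSLT I am using (from this thread) to convert it to CSV. The problem is that the records are not properly nested, so I cannot get output that distinguishes one record in the file from the next. In addition, multiple <Image> fields were being run together in the output with no separator between them, i.e. they turn one field into 2, 3, or 4 different fields, since the number of <Image> elements can vary throughout the file [Edit: now resolved].

Here is the XSLT: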

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="text" encoding="iso-8859-1"/>

    <xsl:strip-space elements="*" />

    <xsl:template match="/*/child::*">
        <xsl:for-each select="child::*">
            <xsl:if test="position() != last()"><xsl:value-of select="normalize-space(.)"/>;</xsl:if>
            <xsl:if test="position() = last()"><xsl:value-of select="normalize-space(.)"/>;</xsl:if>
        </xsl:for-each>
    </xsl:template>

</xsl:stylesheet>

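Here is a model of the output I would like to get. It meets the need to tell records apart and to separate multiple identically named "Image" fields: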

1;000000001;Smith;John;19/06/19888;2009981;RG 150, Volume 01 - 1;RG 150, Volume 01 - 1;http://data.foo.bar.com/733a.gif;http://data2.for.bar.com/733b.gif
2;000000002;Kootz;Ernst;24/12/1984;2000023;RG 150, Volume 01 - 1;RG 150, Volume 01 - 1;http://data.foo.bar.com/744a.gif;http://data2.for.bar.com/755b.gif

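Can anyone suggest a way forward? I would like to clean this up so that:

Everything in the individual Image fields is separated by a semicolon on output. [Edit: solved, thanks hivemind!]
I can tell record 1 apart from record 2, record 3, and so on.

My XSLT knowledge is almost 10 years old, so I could really use the community's help with this.

Thanks.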

Answer 1

Try this:

<xsl:template match="/">
    <xsl:for-each select="descendant::*[not(child::*)]">
        <xsl:value-of select="normalize-space(.)"/><xsl:text>;</xsl:text>
    </xsl:for-each>
</xsl:template>
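
Note that, on its own, the template above writes every value on a single line, so records still run together. A minimal sketch of one way to also start a new line at each record, assuming (as in the sample) that every record begins with a UID element; the xsl:if test is an addition for illustration, not part of the original suggestion:

```xslt
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="text"/>

    <xsl:template match="/">
        <!-- Visit every leaf element (one without element children) in document order -->
        <xsl:for-each select="descendant::*[not(child::*)]">
            <!-- Assumption: each record starts with a UID, so every UID
                 after the first one marks the beginning of a new row -->
            <xsl:if test="self::UID and preceding-sibling::UID">
                <xsl:text>&#10;</xsl:text>
            </xsl:if>
            <xsl:value-of select="normalize-space(.)"/>
            <xsl:text>;</xsl:text>
        </xsl:for-each>
    </xsl:template>
</xsl:stylesheet>
```

Each row still ends with a trailing semicolon, and the last row is not terminated by a line break.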

Answer 2

AFAICT, the following stylesheet will produce nearly the same result as the expected output:

XSLT 1.0

<xsl:stylesheet version="1.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>
<xsl:strip-space elements="*" />

<xsl:key name="cells" match="ABC_Data/*[not(self::UID)]" use="generate-id(preceding-sibling::UID[1])" />

<xsl:template match="/ABC_Data">
    <xsl:for-each select="UID">
        <xsl:apply-templates select=". | key('cells', generate-id())"/>
        <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
</xsl:template>

<xsl:template match="*[not(*)]">
    <xsl:value-of select="." />
    <xsl:text>;</xsl:text>
</xsl:template>

</xsl:stylesheet>

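The only difference is that each row retains a trailing ; character. This is because we do not know which element is the last cell in its row, nor whether it contains multiple child elements.

If you do know this, you can add a template that matches it by name. Otherwise, you have to put each row into a variable first, then output the variable without its last character: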

<xsl:template match="/ABC_Data">
    <xsl:for-each select="UID">
        <xsl:variable name="row">
            <xsl:apply-templates select=". | key('cells', generate-id())"/>
        </xsl:variable>
        <xsl:value-of select="substring($row, 1, string-length($row) - 1)" />
        <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
</xsl:template>
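
For the first option (a template that matches the last cell by name), a minimal sketch could look like the following. It assumes that the last Image inside DigitizeList is always the final cell of a row, which holds for the sample but is not guaranteed by anything in the question:

```xslt
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>
<xsl:strip-space elements="*" />

<xsl:key name="cells" match="ABC_Data/*[not(self::UID)]" use="generate-id(preceding-sibling::UID[1])" />

<xsl:template match="/ABC_Data">
    <xsl:for-each select="UID">
        <xsl:apply-templates select=". | key('cells', generate-id())"/>
        <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
</xsl:template>

<!-- Every leaf element becomes a cell followed by a separator -->
<xsl:template match="*[not(*)]">
    <xsl:value-of select="." />
    <xsl:text>;</xsl:text>
</xsl:template>

<!-- Assumption: the last Image of a DigitizeList is the last cell of its row.
     The explicit priority lets this template win over the generic leaf
     template; both patterns have the same default priority (0.5). -->
<xsl:template match="DigitizeList/Image[last()]" priority="1">
    <xsl:value-of select="." />
</xsl:template>

</xsl:stylesheet>
```

If a record has no Image at all, the empty DigitizeList is itself a leaf and still gets a trailing separator, so the variable-based version above is the safer choice when the data is irregular.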

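As an aside, I have my doubts about the usefulness of this result. As the recipient of a CSV file, I would expect every column to hold data from the same domain (in fact, I would expect every column to have a label). At least in theory, your input could contain records with differing numbers of BirthDates, StationNumbers, References, etc., resulting in rows with differing numbers of cells in misaligned columns.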
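
One way to keep the columns aligned is to emit one cell per top-level element of a record and join repeated values inside that cell with a second delimiter. The following is only a sketch under stated assumptions: every record is assumed to contain the same top-level elements in the same order as the first record, and the "|" delimiter and the "cell" mode name are arbitrary choices made for illustration:

```xslt
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="text"/>
    <xsl:strip-space elements="*"/>

    <!-- Same grouping as above: each non-UID element belongs to the nearest preceding UID -->
    <xsl:key name="cells" match="ABC_Data/*[not(self::UID)]"
             use="generate-id(preceding-sibling::UID[1])"/>

    <xsl:template match="/ABC_Data">
        <!-- Header row: element names taken from the first record
             (assumption: all records share the same top-level elements) -->
        <xsl:for-each select="UID[1] | key('cells', generate-id(UID[1]))">
            <xsl:value-of select="name()"/>
            <xsl:if test="position() != last()">;</xsl:if>
        </xsl:for-each>
        <xsl:text>&#10;</xsl:text>
        <!-- One row per record, one cell per top-level element -->
        <xsl:for-each select="UID">
            <xsl:for-each select=". | key('cells', generate-id())">
                <xsl:apply-templates select="." mode="cell"/>
                <xsl:if test="position() != last()">;</xsl:if>
            </xsl:for-each>
            <xsl:text>&#10;</xsl:text>
        </xsl:for-each>
    </xsl:template>

    <!-- A cell holds all leaf values of its element, joined with '|' -->
    <xsl:template match="*" mode="cell">
        <xsl:for-each select="descendant-or-self::*[not(*)]">
            <xsl:value-of select="."/>
            <xsl:if test="position() != last()">|</xsl:if>
        </xsl:for-each>
    </xsl:template>
</xsl:stylesheet>
```

With this layout, a record with three Images still produces a single DigitizeList cell, and the Reference cell carries both ReferenceEn and ReferenceFr. A record that omits a top-level element entirely would still shift its columns, so this does not remove the need to check the data.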


xml

xslt

export-to-csv