C# SaxonApi Evaluate taking too long to run XQuery on a 500 MB XML file with 13 million lines

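The application is compiled as 64-bit and runs on a machine with 4 CPUs and 16 GB of RAM. Three SaxonApi Evaluate calls against a 500 MB XML file with 13 million lines took 47 minutes of the total run time (60 minutes). Each Evaluate call runs an XQuery that returns 80,000 items, each with 20 nodes.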

What do we need to do to improve the performance of the SaxonApi Evaluate method?

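Some hints that you might find useful:

  • Measure how query performance varies with source document size. Is it linear or quadratic? If it is quadratic, that is probably because you are doing some kind of join. If it is a simple join, the Saxon-EE optimizer may well give a substantial boost: download an evaluation copy and try it.
  • With performance, the devil is in the detail. To explain the performance you are getting, we need to know every detail of what you are doing, to the point where we can reproduce the results ourselves. Telling us that you have a query that takes a long time, without even showing the query, is a waste of everyone's time.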
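
One way to carry out the scaling measurement suggested above is to cut smaller test documents out of the large one and time the same query against each. The following is only a sketch: the parameter $n is a hypothetical name, and p.xml is the file name that appears later in this thread, so adjust both to your setup.

```xquery
xquery version "3.0";

(: Hypothetical external parameter: how many Sch3K1 records to keep.
   The default applies when the caller supplies no value. :)
declare variable $n as xs:integer external := 10000;

(: p.xml is assumed to be the 500 MB source document. :)
<Top>
  <level1>{
    subsequence(doc("p.xml")/Top/level1/Sch3K1, 1, $n)
  }</level1>
</Top>
```

Running it with $n set to, say, 10000, 20000 and 40000 gives three inputs. If the query time roughly doubles at each step the behaviour is linear; if it roughly quadruples it is quadratic.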

I am pasting the sample XML and the XQuery I am using. The large XML has 80K /Top/level1/Sch3K1 elements.

XML

<?xml version="1.0" encoding="UTF-8"?>
<Top>
    <level1>        
        <Sch3K1>
            <PartnershipInformation>
                <PartnershipName>Partner1</PartnershipName>
                <PartnershipFEIN>XXXXXXX</PartnershipFEIN>
                <PartnerAddress>
                    <USAddress>
                        <AddressLine1Txt>xxxx</AddressLine1Txt>
                        <CityNm>City</CityNm>
                        <StateAbbreviationCd>MO</StateAbbreviationCd>
                        <ZIPCd>1111</ZIPCd>
                    </USAddress>
                </PartnerAddress>
            </PartnershipInformation>
            <PartnerInformation>
                <Individual>
                    <PartnerName>
                        <FirstName>Partner1 FName</FirstName>
                        <MiddleInitial>P</MiddleInitial>
                        <LastName>Partner1 LName</LastName>
                    </PartnerName>
                    <PartnerSSN>XXXXXX</PartnerSSN>
                </Individual>
                <PartnerAddress>
                    <USAddress>
                        <AddressLine1Txt>318 Some STREET</AddressLine1Txt>
                        <CityNm>City2</CityNm>
                        <StateAbbreviationCd>WY</StateAbbreviationCd>
                        <ZIPCd>2222</ZIPCd>
                    </USAddress>
                </PartnerAddress>
                <LimitedPartner>X</LimitedPartner>
                <DomesticPartner>X</DomesticPartner>
                <PartnersProfitBOY>0.00003779</PartnersProfitBOY>
                <PartnersProfitEOY>0.0000319</PartnersProfitEOY>
                <PartnersLossBOY>0.00003779</PartnersLossBOY>
                <PartnersLossEOY>0.0000319</PartnersLossEOY>
                <PartnersCapitalBOY>0.00003779</PartnersCapitalBOY>
                <PartnersCapitalEOY>0.0000319</PartnersCapitalEOY>
                <PartnersLiabilitiesNonrecourse>0</PartnersLiabilitiesNonrecourse>
                <PartnersLiabilitiesQNF>0</PartnersLiabilitiesQNF>
                <PartnersLiabilitiesRecourse>0</PartnersLiabilitiesRecourse>
                <CapitalAccountBeginning>1858311</CapitalAccountBeginning>
                <CapitalAccountIncrease>137711</CapitalAccountIncrease>
                <CapitalAccountWithdrawls>646011</CapitalAccountWithdrawls>
                <CapitalAccountEnding>1350011</CapitalAccountEnding>
                <CapitalAccountMethod>
                    <TaxBasis>X</TaxBasis>
                </CapitalAccountMethod>
                <PartnerStateRes>WY</PartnerStateRes>
                <ByApportionment>X</ByApportionment>
                <ApportionmentPercentage>0.0360504</ApportionmentPercentage>
            </PartnerInformation>
            <PartnersShare>
                <OrdinaryIncome>
                    <FederalAmount>111</FederalAmount>
                    <PerStateLaw>111</PerStateLaw>
                    <StateSourceNonRes>29</StateSourceNonRes>
                </OrdinaryIncome>
                <NetIncomeRentalRE/>
                <NetIncomeRentalNonRE>
                    <FederalAmount>700</FederalAmount>
                    <PerStateLaw>700</PerStateLaw>
                    <StateSourceNonRes>25</StateSourceNonRes>
                </NetIncomeRentalNonRE>
                <GuaranteedPymts/>
                <InterestIncome>
                    <FederalAmount>12</FederalAmount>
                    <PerStateLaw>12</PerStateLaw>
                </InterestIncome>
                <OrdinaryDividends/>
                <RoyaltyIncome/>
                <ShortTermCapGain>
                    <FederalAmount>3</FederalAmount>
                    <PerStateLaw>3</PerStateLaw>
                </ShortTermCapGain>
                <LongTermCapGain>
                    <FederalAmount>15</FederalAmount>
                    <PerStateLaw>15</PerStateLaw>
                    <StateSourceNonRes>1</StateSourceNonRes>
                </LongTermCapGain>
                <NetSection1231Gain>
                    <FederalAmount>475</FederalAmount>
                    <PerStateLaw>475</PerStateLaw>
                    <StateSourceNonRes>17</StateSourceNonRes>
                </NetSection1231Gain>
                <AttributableToSaleFarmAssets/>
                <OtherIncome>
                    <FederalAmount>-596</FederalAmount>
                    <PerStateLaw>-596</PerStateLaw>
                    <StateSourceNonRes>-21</StateSourceNonRes>
                    <Explanation>Other income</Explanation>
                </OtherIncome>
                <Sec179Deduction/>
                <OtherDeductions>
                    <FederalAmount>12</FederalAmount>
                    <PerStateLaw>12</PerStateLaw>
                    <StateSourceNonRes>0</StateSourceNonRes>
                    <Explanation>Total Other Deductions</Explanation>
                </OtherDeductions>
                <ForeignTransactions>
                    <FederalAmount>64338</FederalAmount>
                    <PerStateLaw>64338</PerStateLaw>
                    <StateSourceNonRes>0</StateSourceNonRes>
                    <Explanation>GrossIncomeFromAllSources</Explanation>
                </ForeignTransactions>
                <ForeignTransactions>
                    <FederalAmount>170</FederalAmount>
                    <PerStateLaw>170</PerStateLaw>
                    <StateSourceNonRes>0</StateSourceNonRes>
                    <Explanation>GeneralCategorySourcedAtPartnershipLevel</Explanation>
                </ForeignTransactions>
                <ForeignTransactions>
                    <FederalAmount>151</FederalAmount>
                    <PerStateLaw>151</PerStateLaw>
                    <StateSourceNonRes>0</StateSourceNonRes>
                    <Explanation>GeneralCategoryApportionedAtPartnerLevel</Explanation>
                </ForeignTransactions>
                <ForeignTransactions>
                    <FederalAmount>5</FederalAmount>
                    <PerStateLaw>5</PerStateLaw>
                    <StateSourceNonRes>0</StateSourceNonRes>
                    <Explanation>TotalForeignTaxes</Explanation>
                </ForeignTransactions>
                <AltMinTax>
                    <FederalAmount>480</FederalAmount>
                    <PerStateLaw>480</PerStateLaw>
                    <StateSourceNonRes>17</StateSourceNonRes>
                    <Explanation>Post 1986 depreciation adjustment</Explanation>
                </AltMinTax>
                <AltMinTax>
                    <FederalAmount>-636</FederalAmount>
                    <PerStateLaw>-636</PerStateLaw>
                    <StateSourceNonRes>-23</StateSourceNonRes>
                    <Explanation>Adjusted gain or loss</Explanation>
                </AltMinTax>
                <NondeductibleExpenses>
                    <FederalAmount>31</FederalAmount>
                    <PerStateLaw>31</PerStateLaw>
                </NondeductibleExpenses>
                <Distributions>
                    <DistSecurities>
                        <FederalAmount>6460</FederalAmount>
                        <PerStateLaw>6460</PerStateLaw>
                    </DistSecurities>
                </Distributions>
                <OtherInformation>
                    <InvestmentIncome>
                        <FederalAmount>12</FederalAmount>
                        <Adjustment>12</Adjustment>
                        <StateSourceNonRes>0</StateSourceNonRes>
                        <Explanation>Investment Income</Explanation>
                    </InvestmentIncome>
                </OtherInformation>
                <IncomeLossReconciliation>
                    <PerStateLaw>1413</PerStateLaw>
                    <StateSourceNonRes>51</StateSourceNonRes>
                </IncomeLossReconciliation>
                <GrossIncomeAllActivities/>
            </PartnersShare>
            <PartnersApportionmentFactors>
                <FirstFactor>
                    <FactorUsed>Property</FactorUsed>
                    <Wisconsin>0</Wisconsin>
                    <TotalCompany>0</TotalCompany>
                </FirstFactor>
                <SecondFactor>
                    <FactorUsed>Payroll</FactorUsed>
                    <Wisconsin>0</Wisconsin>
                    <TotalCompany>0</TotalCompany>
                </SecondFactor>
                <ThirdFactor>
                    <FactorUsed>Sales</FactorUsed>
                    <Wisconsin>0</Wisconsin>
                    <TotalCompany>0</TotalCompany>
                </ThirdFactor>
            </PartnersApportionmentFactors>
            <PartnersShareAddSub>
                <Additions>
                    <TotalAdditions>0</TotalAdditions>
                </Additions>
                <Subtractions>
                    <TotalSubtractions>0</TotalSubtractions>
                </Subtractions>
                <TotalAdjustment>0</TotalAdjustment>
            </PartnersShareAddSub>
        </Sch3K1>   
    </level1>   
</Top>

XQuery

for 
    $level1 at $currentlevel1Pos in if(exists(./x:top/x:level1)) then ./x:top/x:level1 else element{'level1'} {''},
    $Sch3K1 at $currentSch3K1Pos in if(exists(./x:top/x:level1/x:Sch3K1)) then ./x:top/x:level1/x:Sch3K1 else element{'Sch3K1'} {''},
    $PartnerInformation at $currentPartnerInformationPos in if(exists($Sch3K1/x:PartnerInformation)) then $Sch3K1/x:PartnerInformation else element{'PartnerInformation'} {''},
    $PartnersProfitBOY at $currentPartnersProfitBOYPos in if(exists($Sch3K1/x:PartnerInformation/x:PartnersProfitBOY)) then $Sch3K1/x:PartnerInformation/x:PartnersProfitBOY else element{'PartnersProfitBOY'} {''},
    $PartnersProfitEOY at $currentPartnersProfitEOYPos in if(exists($Sch3K1/x:PartnerInformation/x:PartnersProfitEOY)) then $Sch3K1/x:PartnerInformation/x:PartnersProfitEOY else element{'PartnersProfitEOY'} {''},
    $PartnersLossBOY at $currentPartnersLossBOYPos in if(exists($Sch3K1/x:PartnerInformation/x:PartnersLossBOY)) then $Sch3K1/x:PartnerInformation/x:PartnersLossBOY else element{'PartnersLossBOY'} {''},
    $PartnersLossEOY at $currentPartnersLossEOYPos in if(exists($Sch3K1/x:PartnerInformation/x:PartnersLossEOY)) then $Sch3K1/x:PartnerInformation/x:PartnersLossEOY else element{'PartnersLossEOY'} {''},
    $PartnersCapitalBOY at $currentPartnersCapitalBOYPos in if(exists($Sch3K1/x:PartnerInformation/x:PartnersCapitalBOY)) then $Sch3K1/x:PartnerInformation/x:PartnersCapitalBOY else element{'PartnersCapitalBOY'} {''},
    $PartnersCapitalEOY at $currentPartnersCapitalEOYPos in if(exists($Sch3K1/x:PartnerInformation/x:PartnersCapitalEOY)) then $Sch3K1/x:PartnerInformation/x:PartnersCapitalEOY else element{'PartnersCapitalEOY'} {''}

let $genlevel1 := false
let $genSch3K1 := false
let $prevSch3K1 := ./x:top/x:level1/x:Sch3K1[$currentSch3K1Pos+-1]
let $nextSch3K1 := ./x:top/x:level1/x:Sch3K1[$currentSch3K1Pos+1]
let $Sch3K1Count := count(./x:top/x:level1/x:Sch3K1)
let $genPartnerInformation := false
let $genPartnersProfitBOY := exists($Sch3K1/x:PartnerInformation/x:PartnersProfitBOY)
let $genPartnersProfitEOY := exists($Sch3K1/x:PartnerInformation/x:PartnersProfitEOY)
let $genPartnersLossBOY := exists($Sch3K1/x:PartnerInformation/x:PartnersLossBOY)
let $genPartnersLossEOY := exists($Sch3K1/x:PartnerInformation/x:PartnersLossEOY)
let $genPartnersCapitalBOY := exists($Sch3K1/x:PartnerInformation/x:PartnersCapitalBOY)
let $genPartnersCapitalEOY := exists($Sch3K1/x:PartnerInformation/x:PartnersCapitalEOY)
return 
<Evaluation>
    <FieldEntry>
            <Name>x:PartnersProfitBOY</Name>
            <Xpath>x:top/x:level1/x:Sch3K1/x:PartnerInformation/x:PartnersProfitBOY</Xpath>
            <Value>{$Sch3K1/x:PartnerInformation/x:PartnersProfitBOY/data()}</Value>
            <NextValue>{$nextSch3K1/x:PartnerInformation/x:PartnersProfitBOY/data()}</NextValue>
            <PrevValue>{$prevSch3K1/x:PartnerInformation/x:PartnersProfitBOY/data()}</PrevValue>
            <Index>{$currentSch3K1Pos}</Index>
            <Count>{$Sch3K1Count}</Count>
            <FieldKey>{$Sch3K1/x:PartnerInformation/x:PartnersProfitBOY/@FieldKey/data()}</FieldKey>
            <NodeIsPresent></NodeIsPresent>
            <HasChildNodes></HasChildNodes>
            </FieldEntry>
    <FieldEntry>
            <Name>x:PartnersProfitEOY</Name>
            <Xpath>x:top/x:level1/x:Sch3K1/x:PartnerInformation/x:PartnersProfitEOY</Xpath>
            <Value>{$Sch3K1/x:PartnerInformation/x:PartnersProfitEOY/data()}</Value>
            <NextValue>{$nextSch3K1/x:PartnerInformation/x:PartnersProfitEOY/data()}</NextValue>
            <PrevValue>{$prevSch3K1/x:PartnerInformation/x:PartnersProfitEOY/data()}</PrevValue>
            <Index>{$currentSch3K1Pos}</Index>
            <Count>{$Sch3K1Count}</Count>
            <FieldKey>{$Sch3K1/x:PartnerInformation/x:PartnersProfitEOY/@FieldKey/data()}</FieldKey>
            <NodeIsPresent></NodeIsPresent>
            <HasChildNodes></HasChildNodes>
            </FieldEntry>
    <FieldEntry>
            <Name>x:PartnersLossBOY</Name>
            <Xpath>x:top/x:level1/x:Sch3K1/x:PartnerInformation/x:PartnersLossBOY</Xpath>
            <Value>{$Sch3K1/x:PartnerInformation/x:PartnersLossBOY/data()}</Value>
            <NextValue>{$nextSch3K1/x:PartnerInformation/x:PartnersLossBOY/data()}</NextValue>
            <PrevValue>{$prevSch3K1/x:PartnerInformation/x:PartnersLossBOY/data()}</PrevValue>
            <Index>{$currentSch3K1Pos}</Index>
            <Count>{$Sch3K1Count}</Count>
            <FieldKey>{$Sch3K1/x:PartnerInformation/x:PartnersLossBOY/@FieldKey/data()}</FieldKey>
            <NodeIsPresent></NodeIsPresent>
            <HasChildNodes></HasChildNodes>
            </FieldEntry>
    <FieldEntry>
            <Name>x:PartnersLossEOY</Name>
            <Xpath>x:top/x:level1/x:Sch3K1/x:PartnerInformation/x:PartnersLossEOY</Xpath>
            <Value>{$Sch3K1/x:PartnerInformation/x:PartnersLossEOY/data()}</Value>
            <NextValue>{$nextSch3K1/x:PartnerInformation/x:PartnersLossEOY/data()}</NextValue>
            <PrevValue>{$prevSch3K1/x:PartnerInformation/x:PartnersLossEOY/data()}</PrevValue>
            <Index>{$currentSch3K1Pos}</Index>
            <Count>{$Sch3K1Count}</Count>
            <FieldKey>{$Sch3K1/x:PartnerInformation/x:PartnersLossEOY/@FieldKey/data()}</FieldKey>
            <NodeIsPresent></NodeIsPresent>
            <HasChildNodes></HasChildNodes>
            </FieldEntry>
    <FieldEntry>
            <Name>x:PartnersCapitalBOY</Name>
            <Xpath>x:top/x:level1/x:Sch3K1/x:PartnerInformation/x:PartnersCapitalBOY</Xpath>
            <Value>{$Sch3K1/x:PartnerInformation/x:PartnersCapitalBOY/data()}</Value>
            <NextValue>{$nextSch3K1/x:PartnerInformation/x:PartnersCapitalBOY/data()}</NextValue>
            <PrevValue>{$prevSch3K1/x:PartnerInformation/x:PartnersCapitalBOY/data()}</PrevValue>
            <Index>{$currentSch3K1Pos}</Index>
            <Count>{$Sch3K1Count}</Count>
            <FieldKey>{$Sch3K1/x:PartnerInformation/x:PartnersCapitalBOY/@FieldKey/data()}</FieldKey>
            <NodeIsPresent></NodeIsPresent>
            <HasChildNodes></HasChildNodes>
            </FieldEntry>
    <FieldEntry>
            <Name>x:PartnersCapitalEOY</Name>
            <Xpath>x:top/x:level1/x:Sch3K1/x:PartnerInformation/x:PartnersCapitalEOY</Xpath>
            <Value>{$Sch3K1/x:PartnerInformation/x:PartnersCapitalEOY/data()}</Value>
            <NextValue>{$nextSch3K1/x:PartnerInformation/x:PartnersCapitalEOY/data()}</NextValue>
            <PrevValue>{$prevSch3K1/x:PartnerInformation/x:PartnersCapitalEOY/data()}</PrevValue>
            <Index>{$currentSch3K1Pos}</Index>
            <Count>{$Sch3K1Count}</Count>
            <FieldKey>{$Sch3K1/x:PartnerInformation/x:PartnersCapitalEOY/@FieldKey/data()}</FieldKey>
            <NodeIsPresent></NodeIsPresent>
            <HasChildNodes></HasChildNodes>
            </FieldEntry>
</Evaluation>

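Firstly, there are some anomalies here.

  • The query does not compile, because it uses the namespace prefix "x", which has not been declared. (But the source document does not appear to use a namespace.)
  • The query refers to the top-level element as x:top, but in the source document it is Top.
  • Some variables are bound to false, where false() was surely intended (Saxon gives a warning about this).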
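
As an illustration of the first and third points, a minimal prolog might look like the sketch below. The namespace URI is a made-up placeholder; since the sample document uses no namespace, the simpler fix there is to drop the x: prefix from every path and omit the declaration.

```xquery
xquery version "1.0";

(: Hypothetical URI, for illustration only. It is needed only if the
   real documents put their elements in a namespace. :)
declare namespace x = "http://example.com/hypothetical";

(: false() is the boolean constant. A bare `false` is a path expression
   that selects child elements named "false" of the context item. :)
let $genlevel1 := false()
return $genlevel1
```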

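Secondly, many variables are declared but never used, for example $PartnersCapitalBOY and $genPartnersCapitalBOY. In principle it is easy for the optimizer to ignore unused variables, but giving the optimizer unnecessary work is not always a good idea, because it distracts it from finding the patterns where optimization can make a real difference.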

Thirdly, I am suspicious about the repeated use of this construct:

(for) $PartnerInformation at $currentPartnerInformationPos 
 in if(exists($Sch3K1/x:PartnerInformation)) 
    then $Sch3K1/x:PartnerInformation 
    else element{'PartnerInformation'} {''},

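The problem here is that the construct that creates a new element cannot be moved out of the loop, because XQuery is very picky about the fact that such a construct must create a distinct element every time it is executed. So (without actually checking what the optimizer does in detail) I suspect this construct inhibits possible optimizations.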
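
The node-identity rule behind this can be seen in a small standalone query. This is a sketch for illustration, not part of the original query:

```xquery
let $a := element {'PartnerInformation'} {''}
let $b := element {'PartnerInformation'} {''}
return (
  $a is $b,           (: each evaluation of the constructor creates a new node :)
  $a is $a,           (: a node is always identical to itself :)
  deep-equal($a, $b)  (: the two nodes still have the same name and content :)
)
```

Under the XQuery rules for node identity this should return false, true, true. Because the two elements are observably different nodes, a processor cannot in general replace repeated evaluations of a constructor with a single shared one.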

Fourthly, the clauses:

let $prevSch3K1 := ./x:top/x:level1/x:Sch3K1[$currentSch3K1Pos+-1]
let $nextSch3K1 := ./x:top/x:level1/x:Sch3K1[$currentSch3K1Pos+1]

would probably be more efficient if ./x:top/x:level1/x:Sch3K1 were bound to a global variable.
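
A sketch of what that could look like, assuming the source document has no namespace and is supplied as the context item. The variable names $allSch3K1 and $pos and the <Neighbours> wrapper element are invented for this example:

```xquery
xquery version "1.0";

(: Evaluated once for the whole query rather than once per iteration. :)
declare variable $allSch3K1 := /Top/level1/Sch3K1;
declare variable $Sch3K1Count := count($allSch3K1);

for $Sch3K1 at $pos in $allSch3K1
(: Positional lookups into the sequence that is already bound;
   position 0 and position count+1 simply yield the empty sequence. :)
let $prevSch3K1 := $allSch3K1[$pos - 1]
let $nextSch3K1 := $allSch3K1[$pos + 1]
return
  <Neighbours index="{$pos}" count="{$Sch3K1Count}">
    <Prev>{data($prevSch3K1/PartnerInformation/PartnersProfitBOY)}</Prev>
    <Next>{data($nextSch3K1/PartnerInformation/PartnersProfitBOY)}</Next>
  </Neighbours>
```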

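At first sight your query looks pretty horrendous, with 9 nested loops each iterating over 80K elements: a naive implementation would execute the innermost code about 10^45 times, so if the innermost code takes one nanosecond to execute, the whole query would take 10^36 seconds, which, given that the age of the universe is less than 10^18 seconds, is quite a long time. So if it is running in an hour, the optimizer is doing a pretty good job.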

The only reason it is able to do so well is that so much of the query is clearly pointless.

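Looking at the optimizer trace (-explain), I am actually surprised at how little optimization is performed, and I suspect the main reason for this is the element constructors in the middle of the "for" clauses.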

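I would start by simplifying the query:

  1. Eliminate all the unused variables.
  2. If you really need to create dummy elements to achieve an outer join, create those dummy elements once as global variables rather than creating them repeatedly within a loop.

With these changes, the logic might start to become clearer. I suspect that, in essence, it is actually a very simple query.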
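
To give an idea of where such a simplification might end up, here is a rough sketch, not a drop-in replacement. It assumes the source has no namespace, that the document is the context item, and that each Sch3K1 has at most one PartnerInformation with at most one of each field (otherwise the original's nested "for" clauses would produce more rows than this does). It also drops the outer-join dummies, so an input with no Sch3K1 elements yields no output, and it writes the field names without the x: prefix. The helper local:field and the variables $allSch3K1 and $fieldNames are invented names.

```xquery
xquery version "1.0";

declare variable $allSch3K1 := /Top/level1/Sch3K1;
declare variable $Sch3K1Count := count($allSch3K1);
declare variable $fieldNames := ('PartnersProfitBOY', 'PartnersProfitEOY',
                                 'PartnersLossBOY', 'PartnersLossEOY',
                                 'PartnersCapitalBOY', 'PartnersCapitalEOY');

(: Hypothetical helper: the named child of PartnerInformation, if any. :)
declare function local:field($sch as element()?, $name as xs:string) as element()* {
  $sch/PartnerInformation/*[local-name() = $name]
};

(: One pass over the 80K Sch3K1 elements instead of nine nested loops. :)
for $Sch3K1 at $pos in $allSch3K1
let $prev := $allSch3K1[$pos - 1]
let $next := $allSch3K1[$pos + 1]
return
  <Evaluation>{
    for $name in $fieldNames
    let $cur := local:field($Sch3K1, $name)
    return
      <FieldEntry>
        <Name>{$name}</Name>
        <Xpath>{concat('Top/level1/Sch3K1/PartnerInformation/', $name)}</Xpath>
        <Value>{data($cur)}</Value>
        <NextValue>{data(local:field($next, $name))}</NextValue>
        <PrevValue>{data(local:field($prev, $name))}</PrevValue>
        <Index>{$pos}</Index>
        <Count>{$Sch3K1Count}</Count>
        <FieldKey>{data($cur/@FieldKey)}</FieldKey>
        <NodeIsPresent/>
        <HasChildNodes/>
      </FieldEntry>
  }</Evaluation>
```

Written this way, the work is proportional to the number of Sch3K1 elements times the six fields, which makes it easier to check whether the remaining time is spent in the query or in parsing and serializing the large documents.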

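Following Michael Kay's suggestions, I changed the FLWOR statement to use global variables for the constructed elements and for some of the variable assignments. The return statement is unchanged and is not included below. When I run it with Query.exe, it takes 21 minutes with the changes and 24 minutes to return the results. That is a slight improvement. Saved to a file, the results come to 150 MB ... so am I missing something? Thanks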

let $docxml := doc("p.xml")
let $gSch3K1 := $docxml/Top/level1/Sch3K1
let $glevel1Element := element{'level1'} {''}
let $gSch3K1Element := element{'Sch3K1'} {''}
let $gPartnerInformationElement := element{'PartnerInformation'} {''}


for     
    $level1 at $currentlevel1Pos in if(exists($docxml/Top/level1)) then $docxml/Top/level1 else $glevel1Element,
    $Sch3K1 at $currentSch3K1Pos in if(exists($docxml/Top/level1/Sch3K1)) then $docxml/Top/level1/Sch3K1 else $gSch3K1Element,
    $PartnerInformation at $currentPartnerInformationPos in if(exists($Sch3K1/PartnerInformation)) then $Sch3K1/PartnerInformation else $gPartnerInformationElement

let $prevSch3K1 := $gSch3K1[$currentSch3K1Pos+-1]
let $nextSch3K1 := $gSch3K1[$currentSch3K1Pos+1]
let $Sch3K1Count := count($docxml/Top/level1/Sch3K1)

return
---