LibXML: "xmlns" attribute being reported but not in XML input file
I have the following XML file, sheetX.xml (taken from an Excel XML sheet file):
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main"
xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships"
xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
xmlns:x14ac="http://schemas.microsoft.com/office/spreadsheetml/2009/9/ac"
xmlns:xr="http://schemas.microsoft.com/office/spreadsheetml/2014/revision"
xmlns:xr2="http://schemas.microsoft.com/office/spreadsheetml/2015/revision2"
xmlns:xr3="http://schemas.microsoft.com/office/spreadsheetml/2016/revision3"
mc:Ignorable="x14ac xr xr2 xr3"
xr:uid="{109BF357-4A9A-4969-B57D-8A2B0130DC3F}">
<dimension ref="A1"/>
<sheetViews>
<sheetView tabSelected="1" topLeftCell="M1" workbookViewId="0">
<selection activeCell="A1" sqref="A1"/>
</sheetView>
</sheetViews>
<sheetFormatPr defaultRowHeight="15" x14ac:dyDescent="0.25"/>
<sheetData/>
<pageMargins left="0.7" right="0.7" top="0.75" bottom="0.75" header="0.3" footer="0.3"/>
</worksheet>
I am reading the file with the XML::LibXML Perl module:
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::Reader;
my $reader = XML::LibXML::Reader->new( location => 'sheetX.xml' );
$reader->read();
while($NERROR1==0){
    my $doc = $reader->copyCurrentNode(1);
    if(!defined $doc){
        $NERROR1=-1;
    } else {
        if($reader->attributeCount()>0){
            print "tag name:" . $reader->name() . "\n";
            my @attributelist = $doc->attributes();
            for my $iAtt (0 .. scalar @attributelist-1){
                print "Att name:" . $attributelist[$iAtt]->nodeName() . "\n";
                print "Att value:" . $attributelist[$iAtt]->value . "\n";
            }
        }
        $reader->nextElement();
    }
}
$reader->close();
The output of the Perl program for some of the tags is:
tag name:worksheet
Att name:mc:Ignorable
Att value:x14ac xr xr2 xr3
Att name:xr:uid
Att value:{00000000-0001-0000-0400-000000000000}
Att name:xmlns
Att value:http://schemas.openxmlformats.org/spreadsheetml/2006/main
Att name:xmlns:mc
Att value:http://schemas.openxmlformats.org/markup-compatibility/2006
Att name:xmlns:r
Att value:http://schemas.openxmlformats.org/officeDocument/2006/relationships
Att name:xmlns:x14ac
Att value:http://schemas.microsoft.com/office/spreadsheetml/2009/9/ac
Att name:xmlns:xr
Att value:http://schemas.microsoft.com/office/spreadsheetml/2014/revision
Att name:xmlns:xr2
Att value:http://schemas.microsoft.com/office/spreadsheetml/2015/revision2
Att name:xmlns:xr3
Att value:http://schemas.microsoft.com/office/spreadsheetml/2016/revision3
and
tag name:sheetView
Att name:tabSelected
Att value:1
Att name:topLeftCell
Att value:M1
Att name:workbookViewId
Att value:0
Att name:xmlns
Att value:http://schemas.openxmlformats.org/spreadsheetml/2006/main
and
tag name:sheetFormatPr
Att name:defaultRowHeight
Att value:15
Att name:x14ac:dyDescent
Att value:0.25
Att name:xmlns
Att value:http://schemas.openxmlformats.org/spreadsheetml/2006/main
Att name:xmlns:x14ac
Att value:http://schemas.microsoft.com/office/spreadsheetml/2009/9/ac
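So, basically, the code prints xmlns attributes for the sheetView and sheetFormatPr tags that do not appear in the XML file, whereas the worksheet tag has exactly the attributes shown in the file, with no extras.
At some stage I need to rebuild the XML file from the data produced by my Perl program (which also prints tags, values, and so on). So my question is: is there any way to make my Perl program print the tags as they are shown in the XML file, without the attributes that are not shown there?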
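Here is the minimal set of changes I know of that excludes the xmlns attributes. Note the changed lines, which are marked with ###. I am not sure what your other code might have done with $NERROR1, so for simplicity I removed it here. Most of this is adapted from the docs.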
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::Reader;
my $reader = XML::LibXML::Reader->new( location => 'foo.xml' );
$reader->read();    # Positions the reader on <worksheet>; the loop's own read() then moves past it, so <worksheet> itself is not printed.
###my $NERROR1; # Needed to add this because of `use strict`
###while($NERROR1==0){
while($reader->read) { ### Per the docs.
    my $node = $reader->copyCurrentNode(1); ### Might not be a document, so $node instead of $doc
    ### if(!defined $doc){
    ### $NERROR1=-1;
    ### } else {
    if($reader->attributeCount>0){
        print "tag name:" . $reader->name . "\n";
        ### my @attributelist = $doc->attributes();
        ### for my $iAtt (0 .. scalar @attributelist-1){
        for my $att ($node->attributes) { ### Simpler form of the loop: the indices are not needed.
            next if $att->nodeName =~ /^xmlns\b/; ### <== The key: skip to the next attribute if this one starts with "xmlns"
            print "Att name:" . $att->nodeName . "\n";
            print "Att value:" . $att->value . "\n";
        }
    }
    ### $reader->nextElement();
    ### }
}
$reader->close();
Output
tag name:dimension
Att name:ref
Att value:A1
tag name:sheetView
Att name:tabSelected
Att value:1
Att name:topLeftCell
Att value:M1
Att name:workbookViewId
Att value:0
tag name:selection
Att name:activeCell
Att value:A1
Att name:sqref
Att value:A1
tag name:sheetFormatPr
Att name:defaultRowHeight
Att value:15
Att name:x14ac:dyDescent
Att value:0.25
tag name:pageMargins
Att name:left
Att value:0.7
Att name:right
Att value:0.7
Att name:top
Att value:0.75
Att name:bottom
Att value:0.75
Att name:header
Att value:0.3
Att name:footer
Att value:0.3
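As a variation on the name-based regex, the filter can also test the class of each object, since `attributes` on an XML::LibXML node returns namespace declarations as `XML::LibXML::Namespace` objects alongside the ordinary attribute objects. The following is only a sketch: `plain_attributes` is a helper name made up here, and the inline XML with its `urn:example:*` namespaces is a stand-in so the snippet runs without a file.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::Reader;

# Hypothetical helper: keep ordinary attributes, drop namespace declarations.
sub plain_attributes {
    my ($node) = @_;
    return grep { !$_->isa('XML::LibXML::Namespace') } $node->attributes;
}

# Small made-up document so the sketch does not depend on a file on disk.
my $xml = <<'XML';
<worksheet xmlns="urn:example:main" xmlns:x="urn:example:ext">
  <sheetFormatPr defaultRowHeight="15" x:dyDescent="0.25"/>
</worksheet>
XML

my $reader = XML::LibXML::Reader->new( string => $xml );
my @printed;
while ( $reader->read ) {
    # Only start-of-element nodes can carry attributes.
    next unless $reader->nodeType == XML_READER_TYPE_ELEMENT;
    my $node = $reader->copyCurrentNode(1);
    for my $att ( plain_attributes($node) ) {
        push @printed, $reader->name . ' ' . $att->nodeName . '=' . $att->value;
    }
}
$reader->close;
print "$_\n" for @printed;
```

Like the regex version, this drops every namespace declaration, including the ones that really are written on <worksheet>, so it suits printing more than rebuilding the file.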
Explanation
I found a PerlMonks thread that links to RFC 4918, p. 40, which states:
Since the "xmlns" attribute does not contain a prefix, the namespace applies by default to all enclosed elements.
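In this case, the <worksheet> tag declares the default namespace xmlns="http://schemas...2006/main". That declaration applies to the enclosed elements, so the <sheetView> and <sheetFormatPr> tags inside <worksheet> also have that default namespace. XML::LibXML::Reader gives you access to that information by reporting xmlns attributes on those nodes.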
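If the goal is to rebuild the file, it may help to skip `copyCurrentNode` altogether and walk the attributes with the reader's own cursor (`moveToFirstAttribute`, `moveToNextAttribute`, `moveToElement`). My understanding, which I have not checked against every libxml2 version, is that the extra xmlns entries appear because the copied node is detached from its ancestors and has to carry the declarations it uses, whereas the reader cursor should report only what is written on each element. A minimal sketch under that assumption, where `attributes_as_written` and the inline sample XML are made up for illustration:

```perl
use strict;
use warnings;
use XML::LibXML::Reader;

# Hypothetical helper: return [element, attribute name, attribute value]
# triples using the reader's attribute cursor instead of a copied node.
sub attributes_as_written {
    my ($xml) = @_;
    my $reader = XML::LibXML::Reader->new( string => $xml )
        or die "cannot create reader\n";
    my @seen;
    while ( $reader->read ) {
        next unless $reader->nodeType == XML_READER_TYPE_ELEMENT;
        my $element = $reader->name;
        # moveToFirstAttribute returns 1 only when the element has attributes.
        next unless $reader->moveToFirstAttribute == 1;
        do {
            push @seen, [ $element, $reader->name, $reader->value ];
        } while ( $reader->moveToNextAttribute == 1 );
        $reader->moveToElement;    # put the cursor back on the owning element
    }
    $reader->close;
    return @seen;
}

# Made-up sample with a default and a prefixed namespace on the root only.
my $xml = <<'XML';
<worksheet xmlns="urn:example:main" xmlns:x="urn:example:ext">
  <sheetFormatPr defaultRowHeight="15" x:dyDescent="0.25"/>
</worksheet>
XML

printf "%s %s=%s\n", @$_ for attributes_as_written($xml);
```

With this approach the declarations on <worksheet> are kept, which is what a faithful reconstruction needs, while the child elements should show only their own attributes.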