XML CharacterDataHandler callback unexpectedly called multiple times
I am learning libexpat. I put together this example to get basic familiarity with the API:
Code:
#include <stdio.h>
#include <expat.h>
#include <string.h>
#include <iostream>

void start(void* userData, const char* name, const char* argv[])
{
    std::cout << "name: " << name << std::endl;
    int i = 0;
    while (argv[i])
    {
        std::cout << "argv[" << i << "] == " << argv[i++] << std::endl;
    }
}

void end(void* userData, const char* name)
{
}

void value(void* userData, const char* val, int len)
{
    char str[len+1];
    strncpy(str, val, len);
    str[len] = '\0';
    std::cout << "value: " << str << std::endl;
}

int main(int argc, char* argv[], char* envz[])
{
    XML_Parser parser = XML_ParserCreate(NULL);
    XML_SetElementHandler(parser, start, end);
    XML_SetCharacterDataHandler(parser, value);
    int bytesRead = 0;
    char val[1024] = {};
    FILE* fp = fopen("./catalog.xml", "r");
    std::cout << "fp == 0x" << (void*)fp << std::endl;
    do
    {
        bytesRead = fread(val, 1, sizeof(val), fp);
        std::cout << "In while loop bytesRead==" << bytesRead << std::endl;
        if (0 == XML_Parse(parser, val, bytesRead, (bytesRead < sizeof(val))))
        {
            break;
        }
    }
    while (1);
    XML_ParserFree(parser);
    std::cout << __FUNCTION__ << " end" << std::endl;
    return 0;
}
catalog.xml:
<CATALOG>
<CD key1="value1" key2="value2">
<TITLE>Empire Burlesque</TITLE>
<ARTIST>Bob Dylan</ARTIST>
<YEAR>1995</YEAR>
</CD>
</CATALOG>
Makefile:
xml: xml.o
	g++ xml.o -lexpat -o xml
xml.o: main.cpp Makefile
	g++ -g -c main.cpp -o xml.o
Output:
fp == 0x0x22beb50
In while loop bytesRead==148
name: CATALOG
value:
value:
name: CD
argv[1] == key1
argv[2] == value1
argv[3] == key2
argv[4] == value2
value:
value:
name: TITLE
value: Empire Burlesque
value:
value:
name: ARTIST
value: Bob Dylan
value:
value:
name: YEAR
value: 1995
value:
value:
value:
In while loop bytesRead==0
main end
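Question:
From the output, the callback I installed with XML_SetCharacterDataHandler() appears to be called twice for the CATALOG, CD, TITLE and ARTIST XML tags, and then several more times for the YEAR tag. Can someone explain this behavior? From the catalog.xml shown above, it is not clear to me why there are (or ever would be) multiple values associated with any XML tag.
Thanks.
Credit:
Thanks to this site for the basis of the sample code above.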
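The expat parser may split a text node into multiple calls to the character data handler. To handle text nodes correctly, you must accumulate the text across those calls and process it when you receive the "end" event for the enclosing tag.
Note also that the line breaks and any indentation between elements are character data too, which is where the empty "value:" lines in your output come from.
This holds in general, across different parsers and different languages; the same is true in Java, for example.
See, for example, http://marcomaggi.github.io/docs/expat.html#using-comm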
A common first–time mistake with any of the event–oriented interfaces to an XML parser is to expect all the text contained in an element to be reported by a single call to the character data handler. Expat, like many other XML parsers, reports such data as a sequence of calls; there's no way to know when the end of the sequence is reached until a different callback is made.
A single block of contiguous text free of markup may still result in a sequence of calls to this handler. In other words, if you're searching for a pattern in the text, it may be split across calls to this handler.