.getTextContent returns text from child elements too
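I'm trying to get the hang of XML parsing (yes, I know there are easier ways to parse/validate, like XStream), but I can't seem to get the text content of just one element. For example: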
<container>
    <element0>textThatIWant</element0>                <!-- only returned by .getTextContent -->
    <element1>
        <subelement0>textThatIDontWant</subelement0>  <!-- but also returned by -->
        <subelement1>textThatIDontWant</subelement1>  <!-- .getTextContent -->
    </element1>
</container>
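I'm printing the results to the console and getting most of what I'm looking for, but the only ways I can seem to get a text string are .getTextContent(), which also returns the text of all child elements with no spaces between them (otherwise I would split on the spaces), and .getNodeValue().toString(), which throws NullPointerExceptions. @Jihar mentioned something like .getTextValue(), but Eclipse doesn't recognize it (maybe I could implement/inherit/whatever to add the functionality). Any help?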
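For reference, here is what the standard org.w3c.dom API does in this situation, as a minimal sketch (the class `NodeValueDemo` and its `parse` helper are hypothetical names). `getNodeValue()` is specified to return `null` for an `Element`, which is why `.getNodeValue().toString()` throws; the element's text sits in a child `Text` node instead. As far as I know, `org.w3c.dom` has no `getTextValue()` method, which would explain why Eclipse doesn't recognize it.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class NodeValueDemo {
    // Parses an XML string, wrapping checked exceptions so callers stay simple.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<branch0><name>business</name></branch0>");
        Element name = (Element) doc.getElementsByTagName("name").item(0);

        // An Element has no node value of its own, so this prints "null".
        System.out.println(name.getNodeValue());

        // The text is stored in the element's Text child node.
        Node text = name.getFirstChild();
        System.out.println(text.getNodeType() == Node.TEXT_NODE); // true
        System.out.println(text.getNodeValue());                  // business
    }
}
```

Calling `getNodeValue()` on the first child only works for a leaf element like `<name>`; for mixed content, every text child has to be visited.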
Here is the code I'm using:
import javax.xml.parsers.*;
import org.w3c.dom.*;
import org.xml.sax.SAXException;
import java.io.*;

public class Test {
    public static void main(String[] args) throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        StringBuilder xmlStringBuilder = new StringBuilder();
        String appendage = "..."; // This string holds the XML-formatted data I'll be
                                  // using in one long, annoying line; I include it
                                  // separately for clarity
        xmlStringBuilder.append(appendage);
        ByteArrayInputStream input = new ByteArrayInputStream(xmlStringBuilder.toString().getBytes("UTF-8"));
        System.out.println("Test Results:");
        System.out.println();
        Document doc = builder.parse(input);
        Element root = doc.getDocumentElement();
        NodeList children = root.getChildNodes();
        System.out.println(root.getTagName());
        System.out.println();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child instanceof Element) {
                Element childElement = (Element) child;
                System.out.println(childElement.getTagName() + " " + childElement);
                NodeList grandChildren = child.getChildNodes();
                for (int x = 0; x < grandChildren.getLength(); x++) {
                    Node grandChild = grandChildren.item(x);
                    if (grandChild instanceof Element) {
                        Element grandChildElement = (Element) grandChild;
                        System.out.print("\t" + grandChildElement.getTagName() + ":\t");
                        NodeList greatGrandChildren = grandChild.getChildNodes();
                        for (int y = 0; y < greatGrandChildren.getLength(); y++) {
                            Node greatGrandChild = greatGrandChildren.item(y);
                            if (greatGrandChild instanceof Element) {
                                Element greatGrandChildElement = (Element) greatGrandChild;
                                System.out.print(" " + greatGrandChildElement.getTextContent());
                                // Comma after every element except the last child
                                if (y < greatGrandChildren.getLength() - 1) {
                                    System.out.print(",");
                                }
                            }
                        }
                        System.out.println();
                    }
                }
            }
        }
    }
}
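The three nested loops only reach a fixed depth (the root's children, grandchildren and great-grandchildren). A recursive walk covers any depth. This is a sketch rather than a drop-in replacement: `TreeWalk`, `walk`, `isLeaf` and the output format (two spaces of indentation per level, with text shown only for elements that have no child elements) are my own assumptions.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class TreeWalk {
    // Parses an XML string, wrapping checked exceptions so callers stay simple.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // True if the element has no Element children.
    static boolean isLeaf(Element e) {
        for (Node n = e.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                return false;
            }
        }
        return true;
    }

    // Depth-first walk: one line per element, indented by depth;
    // leaf elements also show their text content.
    static void walk(Element e, int depth, StringBuilder out) {
        for (int i = 0; i < depth; i++) {
            out.append("  ");
        }
        out.append(e.getTagName());
        if (isLeaf(e)) {
            out.append(": ").append(e.getTextContent());
        }
        out.append('\n');
        for (Node n = e.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                walk((Element) n, depth + 1, out);
            }
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<branch0><name>business</name><personnel><executives>"
                + "<name>Billy Bob</name></executives></personnel></branch0>");
        StringBuilder out = new StringBuilder();
        walk(doc.getDocumentElement(), 0, out);
        System.out.print(out);
    }
}
```

Building the output in a `StringBuilder` instead of printing directly keeps the walk easy to test and lets the caller decide where the text goes.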
Here is the full appendage variable:
String appendage = "<?xml version=\"1.0\"?><branch0><name>business</name><taxINFO/><personnel><executives><name>Billy Bob</name><name>Colonel Jessup</name></executives><managerial/><operations><name>sabrina</name><name>lisa</name></operations><services><name>jamie</name><name>justin</name><name>forest</name></services></personnel><regions><ebay><area>OK</area><area>BE</area><area>EV</area><area>WC</area></ebay><sbay><area>SJ</area><area>MP</area><area>SV</area><area>MV</area></sbay><S.F.><area>SF</area></S.F.><N.Y.><area>NY</area></N.Y.><S.CA><area>SD</area><area>LA</area></S.CA></regions><products/><services/></branch0>";
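This one-line string has no whitespace between tags, and the loops above quietly rely on that. If a pretty-printed version were parsed instead, the line breaks and indentation between tags would become whitespace-only `Text` nodes in `getChildNodes()`. The `instanceof Element` checks skip them, but the comma check `y < greatGrandChildren.getLength() - 1` would then print a trailing comma after the last name, because the last child would be a whitespace node. A small sketch that counts node types (`WhitespaceNodes` and `countChildren` are hypothetical names):

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class WhitespaceNodes {
    // Parses an XML string, wrapping checked exceptions so callers stay simple.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Returns {element children, text children} of the root element.
    static int[] countChildren(String xml) {
        NodeList kids = parse(xml).getDocumentElement().getChildNodes();
        int elements = 0;
        int texts = 0;
        for (int i = 0; i < kids.getLength(); i++) {
            short type = kids.item(i).getNodeType();
            if (type == Node.ELEMENT_NODE) {
                elements++;
            } else if (type == Node.TEXT_NODE) {
                texts++;
            }
        }
        return new int[] {elements, texts};
    }

    public static void main(String[] args) {
        String compact = "<executives><name>Billy Bob</name><name>Colonel Jessup</name></executives>";
        String pretty = "<executives>\n  <name>Billy Bob</name>\n  <name>Colonel Jessup</name>\n</executives>";
        System.out.println(Arrays.toString(countChildren(compact))); // [2, 0]
        System.out.println(Arrays.toString(countChildren(pretty)));  // [2, 3]
    }
}
```

One way to avoid the trailing comma is to collect the element children into a list first and join them, rather than comparing the index against `getLength()`.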
Or, formatted for readability:
<?xml version="1.0"?>
<branch0>
    <name>business</name>
    <taxINFO/>
    <personnel>
        <executives>
            <name>Billy Bob</name>
            <name>Colonel Jessup</name>
        </executives>
        <managerial/>
        <operations>
            <name>sabrina</name>
            <name>lisa</name>
        </operations>
        <services>
            <name>jamie</name>
            <name>justin</name>
            <name>forest</name>
        </services>
    </personnel>
    <regions>
        <ebay>
            <area>OK</area>
            <area>BE</area>
            <area>EV</area>
            <area>WC</area>
        </ebay>
        <sbay>
            <area>SJ</area>
            <area>MP</area>
            <area>SV</area>
            <area>MV</area>
        </sbay>
        <S.F.>
            <area>SF</area>
        </S.F.>
        <N.Y.>
            <area>NY</area>
        </N.Y.>
        <S.CA>
            <area>SD</area>
            <area>LA</area>
        </S.CA>
    </regions>
    <products/>
    <services/>
</branch0>
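Finally, here is my console output. You'll see it says [name: null] where I'd like it to say [name: business], or even just business, but not something that includes the child elements' data without spaces: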
Test Results:
branch0
name [name: null]
taxINFO [taxINFO: null]
personnel [personnel: null]
    executives: Billy Bob, Colonel Jessup
    managerial:
    operations: sabrina, lisa
    services: jamie, justin, forest
regions [regions: null]
    ebay: OK, BE, EV, WC
    sbay: SJ, MP, SV, MV
    S.F.: SF
    N.Y.: NY
    S.CA: SD, LA
products [products: null]
services [services: null]
And here is my console output using .getTextContent:
Test Results:
business
branch0
name business
taxINFO
personnel Billy BobColonel Jessupsabrinalisajamiejustinforest
    executives: Billy Bob, Colonel Jessup
    managerial:
    operations: sabrina, lisa
    services: jamie, justin, forest
regions OKBEEVWCSJMPSVMVSFNYSDLA
    ebay: OK, BE, EV, WC
    sbay: SJ, MP, SV, MV
    S.F.: SF
    N.Y.: NY
    S.CA: SD, LA
products
services
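This output shows that `getTextContent()` on an element returns the concatenation of all its descendant text with no separator, so `personnel` becomes one run-together string. If a separated list is what's wanted, one option is `getElementsByTagName`, which returns matching descendants in document order. The sketch below joins their text with a separator (`JoinDescendants` and `joinDescendantText` are hypothetical names):

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class JoinDescendants {
    // Parses an XML string, wrapping checked exceptions so callers stay simple.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Joins the text of every descendant element named `tag`, in document
    // order, with `sep` between them (getTextContent() uses no separator).
    static String joinDescendantText(Element parent, String tag, String sep) {
        NodeList nodes = parent.getElementsByTagName(tag);
        List<String> parts = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            parts.add(nodes.item(i).getTextContent());
        }
        return String.join(sep, parts);
    }

    public static void main(String[] args) {
        Document doc = parse("<branch0><personnel>"
                + "<executives><name>Billy Bob</name><name>Colonel Jessup</name></executives>"
                + "<operations><name>sabrina</name><name>lisa</name></operations>"
                + "</personnel></branch0>");
        Element personnel = (Element) doc.getElementsByTagName("personnel").item(0);
        System.out.println(personnel.getTextContent());                  // Billy BobColonel Jessupsabrinalisa
        System.out.println(joinDescendantText(personnel, "name", ", ")); // Billy Bob, Colonel Jessup, sabrina, lisa
    }
}
```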
The line
System.out.println(childElement.getTagName() + " " + childElement);
should of course be (as you know!):
System.out.println(childElement.getTagName() + " "
        + childElement.getTextContent());
So, for my purposes, I was able to get the individual elements I was looking for using XPath:
import javax.xml.xpath.*;

XPathFactory xpfactory = XPathFactory.newInstance();
XPath path = xpfactory.newXPath();
try {
    // Returns the string value of the first node matching the expression
    String aString = path.evaluate("/branch0/name", doc);
    System.out.println(aString);
} catch (XPathExpressionException e) {
    e.printStackTrace();
}
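`XPath.evaluate(String, Object)` returns the string value of the first matching node only. To get several values at once, or only an element's own text, a `NODESET` evaluation with a `text()` step may help. A sketch, where `XPathList` and `values` are hypothetical names and the expressions assume the `branch0` layout above:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class XPathList {
    // Parses an XML string, wrapping checked exceptions so callers stay simple.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Evaluates an XPath expression as a node-set and returns each node's text.
    static List<String> values(Document doc, String expr) {
        try {
            XPath path = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) path.evaluate(expr, doc, XPathConstants.NODESET);
            List<String> out = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                out.add(nodes.item(i).getTextContent());
            }
            return out;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath: " + expr, e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<branch0><name>business</name>"
                + "<personnel><executives><name>Billy Bob</name><name>Colonel Jessup</name></executives></personnel>"
                + "<regions><ebay><area>OK</area><area>BE</area></ebay></regions></branch0>");
        System.out.println(values(doc, "/branch0/name/text()"));      // [business]
        System.out.println(values(doc, "/branch0/personnel/text()")); // [] (no direct text)
        System.out.println(values(doc, "/branch0/regions/ebay/area")); // [OK, BE]
        System.out.println(values(doc, "//executives/name"));         // [Billy Bob, Colonel Jessup]
    }
}
```

The `text()` step selects only the direct text children of an element, which is the XPath counterpart of skipping nested elements' text.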
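Of course, this requires knowing the structure in advance, but since I can validate against an XML Schema, and my documents aren't too complicated or heavily nested, I don't think that will be a problem for me. When I finish my current project, I'll try to find the post about iterating over child nodes and checking for text nodes (as @Ian Roberts suggested), but I'm not familiar enough with XML yet to do that right now.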
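For the approach Ian Roberts suggested, here is a minimal sketch of iterating over an element's child nodes and keeping only the text nodes, so text that belongs to nested elements is excluded. `OwnText` and `ownText` are hypothetical names; treating CDATA sections as text and trimming the result are my own choices, not something the DOM API does for you:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class OwnText {
    // Parses an XML string, wrapping checked exceptions so callers stay simple.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Concatenates only the element's direct Text and CDATA children,
    // ignoring text that belongs to nested elements, then trims the result.
    static String ownText(Element e) {
        StringBuilder sb = new StringBuilder();
        NodeList kids = e.getChildNodes();
        for (int i = 0; i < kids.getLength(); i++) {
            Node n = kids.item(i);
            short type = n.getNodeType();
            if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
                sb.append(n.getNodeValue());
            }
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        Document doc = parse("<container><element0>textThatIWant</element0>"
                + "<element1><subelement0>textThatIDontWant</subelement0></element1></container>");
        Element e0 = (Element) doc.getElementsByTagName("element0").item(0);
        Element e1 = (Element) doc.getElementsByTagName("element1").item(0);
        System.out.println(ownText(e0));                 // textThatIWant
        System.out.println("[" + ownText(e1) + "]");     // [] (element1 has no direct text)
    }
}
```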