.getTextContent returns text from child elements too

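I'm trying to get XML parsing down (yes, I know there are easier ways to parse/validate, such as xstream), but I can't seem to get the text content of just a single element. For example: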

<container>
   <element0>textThatIWant</element0> //only returned by .getTextContent
   <element1>
      <subelement0>textThatIDontWant</subelement0> //but also returned by
      <subelement1>textThatIDontWant</subelement1> //.getTextContent
   </element1>
</container>

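I'm printing the results to the console and getting most of what I'm looking for, but the only ways I seem to have of getting a text string are .getTextContent(), which also returns the text of all child elements with no whitespace between them (otherwise I would split on whitespace), or .getNodeValue().toString(), which throws NullPointerExceptions. @Jihar mentioned something like .getTextValue(), but Eclipse doesn't recognize it (maybe there is something I can implement/inherit/whatever to add that functionality). Any help?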

Here is the code I'm using:

import javax.xml.parsers.*;
import org.w3c.dom.*;
import org.xml.sax.SAXException;
import java.io.*;

public class Test {
   public static void main(String[] args) throws  ParserConfigurationException, SAXException, IOException {
      DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
      DocumentBuilder builder = factory.newDocumentBuilder();
      StringBuilder xmlStringBuilder = new StringBuilder();
      String appendage = "..."; //This string holds the xml formatted data I'll be 
                               //using in a long annoying line, I'll include it 
                               //separately for clarity
      xmlStringBuilder.append(appendage);   
      ByteArrayInputStream input = new ByteArrayInputStream(xmlStringBuilder.toString().getBytes("UTF-8"));

      System.out.println("Test Results:");
      System.out.println();

      Document doc = builder.parse(input);
      Element root = doc.getDocumentElement();
      NodeList children = root.getChildNodes();

      System.out.println(root.getTagName());
      System.out.println();

      for (int i = 0; i < children.getLength(); i++) {
         Node child = children.item(i);
         if (child instanceof Element) { 
            Element childElement = (Element) child; 
            System.out.println(childElement.getTagName() + " " + childElement);

            NodeList grandChildren = child.getChildNodes();
            for (int x = 0; x < grandChildren.getLength(); x++) {
               Node grandChild = grandChildren.item(x);
               if (grandChild instanceof Element) {
                  Element grandChildElement = (Element) grandChild;
                  System.out.print("\t" + grandChildElement.getTagName() + ":\t");

                  NodeList greatGrandChildren = grandChild.getChildNodes();
                  for (int y = 0; y < greatGrandChildren.getLength(); y++) {
                     Node greatGrandChild = greatGrandChildren.item(y);
                     if (greatGrandChild instanceof Element) {
                        Element greatGrandChildElement = (Element) greatGrandChild;
                        System.out.print(" " + greatGrandChildElement.getTextContent());
                        if (y < greatGrandChildren.getLength() - 1) {
                           System.out.print(",");
                        }
                     }
                  }
                  System.out.println();
               }
            }
         }
      }
   }
}

Here is the full appendage variable:

String appendage = "<?xml version=\"1.0\"?><branch0><name>business</name><taxINFO/><personnel><executives><name>Billy Bob</name><name>Colonel Jessup</name></executives><managerial/><operations><name>sabrina</name><name>lisa</name></operations><services><name>jamie</name><name>justin</name><name>forest</name></services></personnel><regions><ebay><area>OK</area><area>BE</area><area>EV</area><area>WC</area></ebay><sbay><area>SJ</area><area>MP</area><area>SV</area><area>MV</area></sbay><S.F.><area>SF</area></S.F.><N.Y.><area>NY</area></N.Y.><S.CA><area>SD</area><area>LA</area></S.CA></regions><products/><services/></branch0>";

Or, formatted for readability:

String appendage = "
<?xml version=\"1.0\"?>
<branch0>
   <name>business</name>
   <taxINFO/>
   <personnel>
      <executives>
         <name>Billy Bob</name>
         <name>Colonel Jessup</name>
      </executives>
      <managerial/>
      <operations>
         <name>sabrina</name>
         <name>lisa</name>
      </operations>
      <services>
         <name>jamie</name>
         <name>justin</name>
         <name>forest</name>
      </services>
   </personnel>
   <regions>
      <ebay>
         <area>OK</area>
         <area>BE</area>
         <area>EV</area>
         <area>WC</area>
      </ebay>
      <sbay>
         <area>SJ</area>
         <area>MP</area>
         <area>SV</area>
         <area>MV</area>
      </sbay>
      <S.F.>
         <area>SF</area>
      </S.F.>
      <N.Y.>
         <area>NY</area>
      </N.Y.>
      <S.CA>
         <area>SD</area>
         <area>LA</area>
      </S.CA>
   </regions>
   <products/>
   <services/>
</branch0>";

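Finally, my console output (you'll see it says [name: null] where I want it to say [name: business], or even just business, but not the child elements' data run together without whitespace):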

Test Results:

branch0

name [name: null]
taxINFO [taxINFO: null]
personnel [personnel: null]
    executives:  Billy Bob, Colonel Jessup
    managerial: 
    operations:  sabrina, lisa
    services:    jamie, justin, forest
regions [regions: null]
    ebay:    OK, BE, EV, WC
    sbay:    SJ, MP, SV, MV
    S.F.:    SF
    N.Y.:    NY
    S.CA:    SD, LA
products [products: null]
services [services: null]

Here is my console output using .getTextContent:

Test Results:
business
branch0

name business
taxINFO 
personnel Billy BobColonel Jessupsabrinalisajamiejustinforest
 executives:     Billy Bob, Colonel Jessup
 managerial:    
 operations:     sabrina, lisa
 services:   jamie, justin, forest
regions OKBEEVWCSJMPSVMVSFNYSDLA
 ebay:   OK, BE, EV, WC
 sbay:   SJ, MP, SV, MV
 S.F.:   SF
 N.Y.:   NY
 S.CA:   SD, LA
products 
services 

This line:

System.out.println(childElement.getTagName() + " " + childElement);

should be (as you know!):

System.out.println(childElement.getTagName() + " "
    + childElement.getTextContent());
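
To illustrate why printing the element itself showed null while getTextContent() ran everything together, here is a minimal sketch. The class name NodeValueDemo and the helper parse are made up for this example, and the XML is a whitespace-free variant of the container sample from the question. It relies only on behavior defined by the DOM API: an Element's node value is null, the text sits in a child Text node, and getTextContent() concatenates the text of all descendants.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

class NodeValueDemo {
    // Hypothetical helper: parses an XML string into a DOM Document.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) throws Exception {
        // No whitespace between tags, so the first and last children are elements.
        Document doc = parse("<container><element0>textThatIWant</element0>"
                + "<element1><subelement0>a</subelement0><subelement1>b</subelement1></element1>"
                + "</container>");
        Element root = doc.getDocumentElement();
        Element element0 = (Element) root.getFirstChild();
        Element element1 = (Element) root.getLastChild();

        // The node value of an Element is null by definition.
        System.out.println(element0.getNodeValue());   // prints "null"

        // The text itself lives in a child Text node.
        Node text = element0.getFirstChild();
        System.out.println(text.getNodeValue());       // prints "textThatIWant"

        // getTextContent() concatenates the text of every descendant.
        System.out.println(element1.getTextContent()); // prints "ab"
    }
}
```

This is also why calling .getNodeValue().toString() on an element throws a NullPointerException: the call is made on a null value.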

So, for my purposes, I was able to use XPath to get the individual elements I was looking for:

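// Requires: import javax.xml.xpath.*;
XPathFactory xpfactory = XPathFactory.newInstance();
XPath path = xpfactory.newXPath();
try {
    // With no return type given, evaluate() returns the string value
    // of the first node that matches the expression.
    String aString = path.evaluate("/branch0/name", doc);
    System.out.println(aString);
} catch (XPathExpressionException e) {
    e.printStackTrace();
}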
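
When an expression matches several elements, the same XPath object can return all of them as a node list instead of a single string. The following is only a sketch under that assumption: the class XPathListDemo and the method textOfAll are hypothetical names, and the XML is a cut-down piece of the branch0 document above.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

class XPathListDemo {
    // Hypothetical helper: returns the text content of every node
    // matched by the given XPath expression.
    static List<String> textOfAll(Document doc, String expression) throws Exception {
        XPath path = XPathFactory.newInstance().newXPath();
        // Asking for NODESET yields all matches rather than just the first.
        NodeList nodes = (NodeList) path.evaluate(expression, doc, XPathConstants.NODESET);
        List<String> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            result.add(nodes.item(i).getTextContent());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<branch0><name>business</name><personnel><executives>"
                + "<name>Billy Bob</name><name>Colonel Jessup</name>"
                + "</executives></personnel></branch0>";
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

        // Prints [Billy Bob, Colonel Jessup]
        System.out.println(textOfAll(doc, "/branch0/personnel/executives/name"));
    }
}
```

Because each matched name element contains only text, getTextContent() on it returns exactly that text, so the values stay separate.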

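Of course, this requires knowing the structure in advance, but since I can validate against an XML Schema and my documents aren't too complicated or heavily nested, I don't think that will be a problem for me. Once I finish work on my current project, I'll try to look up the post about iterating over the child nodes and checking for text nodes (as suggested by @Ian Roberts), but as far as XML goes I'm good to go for now.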
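
For reference, the approach of iterating over the child nodes and keeping only the text nodes might look roughly like the sketch below. The class OwnTextDemo and the method ownText are hypothetical names, not part of the DOM API, and trimming the result is an assumption made here so that indentation-only whitespace does not show up.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.w3c.dom.Text;

class OwnTextDemo {
    // Hypothetical helper: returns only the text directly inside the element,
    // ignoring any text that belongs to child elements.
    static String ownText(Element element) {
        StringBuilder sb = new StringBuilder();
        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            // Text also covers CDATA sections, which extend Text.
            if (child instanceof Text) {
                sb.append(child.getNodeValue());
            }
        }
        // Assumption: surrounding whitespace (indentation, newlines) is not wanted.
        return sb.toString().trim();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<container>\n"
                + "   <element0>textThatIWant</element0>\n"
                + "   <element1>\n"
                + "      <subelement0>textThatIDontWant</subelement0>\n"
                + "      <subelement1>textThatIDontWant</subelement1>\n"
                + "   </element1>\n"
                + "</container>";
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

        Element element0 = (Element) doc.getElementsByTagName("element0").item(0);
        Element element1 = (Element) doc.getElementsByTagName("element1").item(0);

        System.out.println("[" + ownText(element0) + "]"); // prints [textThatIWant]
        System.out.println("[" + ownText(element1) + "]"); // prints []
    }
}
```

Unlike the XPath version, this does not need the document structure to be known up front, so it can be dropped into a generic tree walk like the nested loops in the question.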