Parsing XML Documents in Java

By Paulus, 4 September, 2010

I've decided to write an application in Java to manage my Rackspace Cloud Servers. To send requests and receive responses, I opted to use XML. Parsing XML took a little getting used to because I didn't fully understand how Nodes, Elements, and Attributes worked.

The following is the limits response:

 
<limits xmlns="http://docs.rackspacecloud.com/servers/api/v1.0">
	<rate>
		<limit verb="POST" URI="*" regex=".*" value="10" remaining="2" unit="MINUTE" resetTime="1244425439" />
		<limit verb="POST" URI="*/servers" regex="^/servers" value="50" remaining="49" unit="DAY" resetTime="1244511839" />
		<limit verb="PUT" URI="*" regex=".*" value="10" remaining="2" unit="MINUTE" resetTime="1244425439" />
		<limit verb="GET" URI="*changes-since*" regex="changes-since" value="3" remaining="3" unit="MINUTE" resetTime="1244425439" />
		<limit verb="DELETE" URI="*" regex=".*" value="100" remaining="100" unit="MINUTE" resetTime="1244425439" />
	</rate>
	<absolute>
		<limit name="maxTotalRAMSize" value="51200" />
		<limit name="maxIPGroups" value="25" />
		<limit name="maxIPGroupMembers" value="25" />
	</absolute>
</limits>

I thought that limits, rate, absolute, and limit were all Nodes or Elements, which is true because Element inherits from Node. However, getting the attributes (e.g. verb, URI, etc.) was a little different. I was trying to work with the XML file in this way:

 

limits
|----- rate
|        |----- limit (attributes: verb, URI, regex, etc)
|        |----- limit (attributes: verb, URI, regex, etc)
|        |----- limit (attributes: verb, URI, regex, etc)
|        |----- limit (attributes: verb, URI, regex, etc)
|----- absolute
|        |----- limit (attributes: name, value)
|        |----- limit (attributes: name, value)

 

Again, this is not completely wrong, as I could have used the getAttribute(String) function if I knew the name of the attribute. I wanted to get them all without having to explicitly retrieve each attribute, so I had to think of the file in this way:

 

limits
|----- rate
|        |----- limit
|        |        |----- verb
|        |        |----- URI
|        |        |----- regex
|        |        |----- value
|        |----- limit
|        |        |----- verb
|        |        |----- URI
|        |        |----- regex
|        |        |----- value
|        |----- limit
|        |        |----- verb
|        |        |----- URI
|        |        |----- regex
|        |        |----- value
|        |----- limit
|        |        |----- verb
|        |        |----- URI
|        |        |----- regex
|        |        |----- value
|----- absolute
|        |----- limit
|        |        |----- name
|        |        |----- value
|        |----- limit
|        |        |----- name
|        |        |----- value
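
The two ways of looking at a limit can be put side by side in code. The following is only a minimal sketch: the class name AttributeViews and the helpers parse, verbOf, and attributeNames are made up for illustration, and the sample limit is a shortened version of one from the response above.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.xml.sax.InputSource;

class AttributeViews {

    // Hypothetical helper: parses an XML string into a DOM Document.
    // Checked exceptions are wrapped to keep the sketch short.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // First view: the attribute name is known, so ask for it directly.
    static String verbOf(Element limit) {
        return limit.getAttribute("verb");
    }

    // Second view: treat each attribute as a node and list whatever names are present.
    static List<String> attributeNames(Element limit) {
        NamedNodeMap attrs = limit.getAttributes();
        List<String> names = new ArrayList<>();
        for (int i = 0; i < attrs.getLength(); i++) {
            names.add(attrs.item(i).getNodeName());
        }
        return names;
    }

    public static void main(String[] args) {
        Element limit = parse("<limit verb=\"POST\" URI=\"*\" value=\"10\" />").getDocumentElement();
        System.out.println(verbOf(limit));
        System.out.println(attributeNames(limit));
    }
}
```

A NamedNodeMap is not kept in any particular order, so the names may not come back in the order they appear in the file.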

 

What are Nodes and Elements

According to the DOM Level 3 Specification, Node is the primary datatype for the entire document: every part of an HTML or XML document, including elements, attributes, and text, is represented as a Node. In the document above, limits, limit, rate, and absolute are element nodes. You can retrieve the name of a node and its value by using the getNodeName() and getNodeValue() functions, respectively. What getNodeValue() returns depends on the node type: for an attribute it is the attribute's value, for a text node it is the text, and for an element it is always null. The content between a beginning and ending tag is stored in child text nodes, so for a Node object that is "<speedlimit>25 mph</speedlimit>", getNodeValue() returns null and getTextContent() returns "25 mph". Neither call returns the node's attributes, if the node has them.
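
One detail is worth checking against the DOM documentation: for an element node, getNodeValue() is defined to return null, and the text between the tags lives in a child text node. The sketch below uses the speedlimit example to show the difference; the class name NodeValueDemo and the parse helper are assumptions made for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class NodeValueDemo {

    // Hypothetical helper: parses an XML string into a DOM Document.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Element speed = parse("<speedlimit>25 mph</speedlimit>").getDocumentElement();

        System.out.println(speed.getNodeName());                  // speedlimit
        System.out.println(speed.getNodeValue());                 // null (elements have no node value)
        System.out.println(speed.getTextContent());               // 25 mph
        System.out.println(speed.getFirstChild().getNodeValue()); // 25 mph (the child text node)
    }
}
```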

Element has Node as its superinterface, but an Element object gives you more flexibility in how you work with other Elements and Attributes. With an Element object you can get Elements by name (e.g. limit), check whether an Element has attributes, and set attribute values.
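
As a rough illustration of those three operations, here is a small sketch. The class name ElementMethodsDemo and the helpers parse and countLimits are hypothetical, and the sample XML is a trimmed-down version of the limits response with the namespace declaration left out. Note that hasAttributes() is declared on Node and inherited by Element, while getElementsByTagName(String), getAttribute(String), and setAttribute(String, String) come from Element itself.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ElementMethodsDemo {

    // Hypothetical helper: parses an XML string into a DOM Document.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Counts every <limit> descendant of the given element, at any depth.
    static int countLimits(Element root) {
        return root.getElementsByTagName("limit").getLength();
    }

    public static void main(String[] args) {
        Element root = parse(
                "<limits>"
                + "<rate><limit verb=\"POST\" value=\"10\" /></rate>"
                + "<absolute><limit name=\"maxIPGroups\" value=\"25\" /></absolute>"
                + "</limits>").getDocumentElement();

        // Get Elements by name: both <limit> elements are found, in document order.
        NodeList limits = root.getElementsByTagName("limit");
        System.out.println(countLimits(root));                // 2

        // Check whether an Element has attributes.
        Element first = (Element) limits.item(0);
        System.out.println(first.hasAttributes());            // true
        System.out.println(root.hasAttributes());             // false

        // Set an attribute value on the in-memory document and read it back.
        first.setAttribute("value", "20");
        System.out.println(first.getAttribute("value"));      // 20
    }
}
```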

In short, a Node is a basic high-level definition of what to provide. Think of a Node as a Person, an Element as a Java programmer, and an Attr as a C++ programmer. Java and C++ programmers are both people and do the same kind of work, but how they do it is a little bit different.

 

Iterating through the XML File

What I'm trying to do in the following code is parse the response and output it to the console.

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

protected static void parseLimits(Document xmlDom) {
    Element rootEl = xmlDom.getDocumentElement();

    // Get the child nodes of <limits>: <rate>, <absolute>, and the whitespace between them.
    NodeList limits = rootEl.getChildNodes();

    // Iterate through absolute and rate.
    for (int i = 0; i < limits.getLength(); i++) {
        Node group = limits.item(i);

        // Skip anything that is not an element (e.g. whitespace text nodes).
        if (group.getNodeType() != Node.ELEMENT_NODE) {
            continue;
        }

        System.out.println("******" + group.getNodeName() + "******");

        // Get the child nodes of the group: the <limit> elements plus whitespace.
        NodeList currentLimits = group.getChildNodes();

        // Iterate through all the limits in each type of limit (absolute and rate).
        for (int j = 0; j < currentLimits.getLength(); j++) {
            Node nodeLimit = currentLimits.item(j);

            // If the node we're looking at is not an element, skip it.
            if (nodeLimit.getNodeType() != Node.ELEMENT_NODE) {
                continue;
            }

            // Get the attributes of the limit, e.g. verb, URI, regex, etc.
            NamedNodeMap nmap = nodeLimit.getAttributes();

            // Iterate through the attributes. The map returned by getAttributes()
            // holds only attribute nodes, so no node type check is needed here.
            for (int k = 0; k < nmap.getLength(); k++) {
                Node attr = nmap.item(k);
                System.out.println(attr.getNodeName() + ": " + attr.getNodeValue());
            }
        }
    }
}

The above code is pretty well documented. When iterating through the NodeList objects returned by getChildNodes(), we're looking for Nodes that are ELEMENT_NODE. Other types of Nodes are returned as well; in this document the other one is TEXT_NODE, which holds the whitespace (line breaks and tabs) between the tags rather than an Element such as limit. In this example, we don't want anything to do with those types of nodes.
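
To see where those extra nodes come from, it can help to count the children of the root by type. This is a small sketch under a few assumptions: the class name ChildNodeTypes and the helpers parse and countChildren are made up for illustration, and the sample keeps only the outer structure and indentation of the limits response.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ChildNodeTypes {

    // Hypothetical helper: parses an XML string into a DOM Document.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Returns how many direct children of the parent have the given node type.
    static int countChildren(Node parent, short nodeType) {
        NodeList children = parent.getChildNodes();
        int count = 0;
        for (int i = 0; i < children.getLength(); i++) {
            if (children.item(i).getNodeType() == nodeType) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // The line breaks and tabs between the tags become text nodes.
        String xml = "<limits>\n\t<rate>\n\t</rate>\n\t<absolute>\n\t</absolute>\n</limits>";
        Element root = parse(xml).getDocumentElement();

        System.out.println(countChildren(root, Node.ELEMENT_NODE)); // 2 (rate, absolute)
        System.out.println(countChildren(root, Node.TEXT_NODE));    // 3 (whitespace around them)
    }
}
```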

Once we've reached an Element that has attributes, we retrieve all of them with getAttributes(), which gives us another list (a NamedNodeMap) that we have to iterate through.
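
The article does not show where the Document comes from, so here is one possible way to build it and to collect the attributes instead of printing them. Treat it as a sketch, not as the code my application uses: the class name LimitsReader and the methods load and limitsIn are hypothetical, the InputStream stands in for an HTTP response body, and the sample is a shortened limits response.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.NodeList;

class LimitsReader {

    // Builds a DOM Document from any InputStream, such as a response body.
    static Document load(InputStream in) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(in);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Collects the attributes of every <limit> under the named group ("rate" or "absolute").
    static List<Map<String, String>> limitsIn(Document doc, String group) {
        List<Map<String, String>> result = new ArrayList<>();
        NodeList groups = doc.getDocumentElement().getElementsByTagName(group);
        if (groups.getLength() == 0) {
            return result; // no such group in this response
        }
        NodeList limits = ((Element) groups.item(0)).getElementsByTagName("limit");
        for (int i = 0; i < limits.getLength(); i++) {
            NamedNodeMap attrs = limits.item(i).getAttributes();
            // TreeMap sorts by attribute name, since NamedNodeMap has no defined order.
            Map<String, String> row = new TreeMap<>();
            for (int k = 0; k < attrs.getLength(); k++) {
                row.put(attrs.item(k).getNodeName(), attrs.item(k).getNodeValue());
            }
            result.add(row);
        }
        return result;
    }

    public static void main(String[] args) {
        String xml = "<limits xmlns=\"http://docs.rackspacecloud.com/servers/api/v1.0\">"
                + "<rate><limit verb=\"POST\" URI=\"*\" value=\"10\" /></rate>"
                + "<absolute><limit name=\"maxIPGroups\" value=\"25\" /></absolute>"
                + "</limits>";
        Document doc = load(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

        System.out.println(limitsIn(doc, "rate"));
        System.out.println(limitsIn(doc, "absolute"));
    }
}
```

Using getElementsByTagName(String) here skips the whitespace text nodes entirely, at the cost of matching limit elements at any depth below the group.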