Understanding StAX: how to correctly use XMLStreamWriter
Note: This is a slightly edited version of a text that I wrote for the Axiom documentation. Some of the content is based on a reply posted by Tatu Saloranta on the Axiom mailing list. Tatu is the main developer of the Woodstox project.
Semantics of the setPrefix and setDefaultNamespace methods
The meaning and precise semantics of the setPrefix and
setDefaultNamespace methods defined by XMLStreamWriter are
probably one of the most obscure aspects of the StAX specifications. As we will see later, even the people who wrote the
first version of IBM’s StAX parser (called XLXP-J) failed to implement these two methods correctly. In order to
understand how these methods are supposed to work, it is necessary to look at different parts of the specification (for
simplicity we will concentrate on setPrefix):
The Javadoc of the setPrefix method.
The table shown in the Javadoc of the XMLStreamWriter class in Java 6.
Section 5.2.2, “Binding Prefixes” of the StAX specification.
The example shown in section 5.3.2, “XMLStreamWriter” of the StAX specification.
In addition, it is important to note the following facts:
The terms defaulting prefixes used in section 5.2.2 of the specification and namespace repairing used in the Javadocs of XMLStreamWriter are synonyms.
The methods writing namespace qualified information items, i.e. writeStartElement, writeEmptyElement and writeAttribute, all come in two variants: one that takes a namespace URI and a prefix as arguments and one that only takes a namespace URI, but no prefix.
The purpose of the
setPrefix method is simply to define the prefixes that will be used by the variants of the
writeStartElement, writeEmptyElement and writeAttribute methods that only take a namespace URI (and the local
name). This becomes clear by looking at the table in the
XMLStreamWriter Javadoc. Note that a call to setPrefix
doesn’t cause any output and it is still necessary to use
writeNamespace to actually write the namespace declarations.
Otherwise the produced document will not be well formed with respect to namespaces.
The Javadoc of the setPrefix
method also clearly defines the scope of the prefix bindings defined using that method: a prefix bound using setPrefix
remains valid until the invocation of
writeEndElement corresponding to the last invocation of writeStartElement.
While not explicitly mentioned in the specifications, it is clear that a prefix binding may be masked by another binding
for the same prefix defined in a nested element. (Interestingly enough, BEA’s reference implementation didn’t get this
aspect entirely right.)
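The scoping rule can be observed through the writer's own getPrefix method. The following is a minimal sketch, assuming the default StAX implementation returned by XMLOutputFactory.newInstance() follows the Javadoc; the class name PrefixScopeDemo and the element names are made up for illustration.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

final class PrefixScopeDemo {
    /** Returns the prefix reported for "urn:ns1" at three points in the document. */
    static String[] observe() {
        try {
            XMLStreamWriter w = XMLOutputFactory.newInstance()
                    .createXMLStreamWriter(new StringWriter());
            String[] seen = new String[3];

            w.writeStartElement("root");

            w.writeStartElement("p", "outer", "urn:ns1");
            w.setPrefix("p", "urn:ns1");
            w.writeNamespace("p", "urn:ns1");
            seen[0] = w.getPrefix("urn:ns1");   // inside <p:outer>: bound

            w.writeStartElement("p", "inner", "urn:ns2");
            w.setPrefix("p", "urn:ns2");        // masks the outer binding of "p"
            w.writeNamespace("p", "urn:ns2");
            w.writeEndElement();                // </p:inner>: the masking binding ends

            seen[1] = w.getPrefix("urn:ns1");   // outer binding is visible again
            w.writeEndElement();                // </p:outer>: the outer binding ends

            seen[2] = w.getPrefix("urn:ns1");   // back in <root>: expected to be unbound
            w.writeEndElement();
            w.close();
            return seen;
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        for (String prefix : observe()) {
            System.out.println(prefix);
        }
    }
}
```

On a conforming implementation the first two values should be "p" and the last one should indicate that the URI is no longer bound. An implementation that gets masking wrong would typically differ in the second value.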
An aspect that may cause confusion is the fact that in the example shown in section 5.3.2 of the specifications, the
calls to setPrefix (and setDefaultNamespace) all appear immediately before a call to
writeStartElement or writeEmptyElement. This may lead people to incorrectly believe that a prefix binding defined using
setPrefix applies to the next element written. This interpretation however is clearly in contradiction with the
setPrefix Javadoc.
Note that early versions of IBM’s XLXP-J were based on this incorrect interpretation of the specifications, but this has
been corrected. Versions conforming to the specifications support a special property called
javax.xml.stream.XMLStreamWriter.isSetPrefixBeforeStartElement, which always returns
Boolean.FALSE. This makes it easy to
distinguish the non-conforming versions from the newer versions. Note that in contrast to what the usage of the
javax.xml.stream prefix suggests, this is a vendor-specific property that is not supported by other implementations.
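Code that needs to run on several StAX implementations can probe for this property defensively. The sketch below relies only on the documented contract of XMLStreamWriter.getProperty, which throws IllegalArgumentException for unsupported properties; the class and method names are hypothetical, and how to treat a missing property is left to the caller.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

final class SetPrefixPropertyCheck {
    static final String PROPERTY =
            "javax.xml.stream.XMLStreamWriter.isSetPrefixBeforeStartElement";

    /**
     * Returns the value of the vendor property, or null if the
     * implementation does not support the property at all.
     */
    static Object probe(XMLStreamWriter writer) {
        try {
            return writer.getProperty(PROPERTY);
        } catch (IllegalArgumentException e) {
            // Either a different StAX implementation or an old XLXP-J version.
            return null;
        }
    }

    /** Probes a writer created by the default XMLOutputFactory. */
    static Object probeDefaultWriter() {
        try {
            XMLStreamWriter w = XMLOutputFactory.newInstance()
                    .createXMLStreamWriter(new StringWriter());
            Object value = probe(w);
            w.close();
            return value;
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(PROPERTY + " = " + probeDefaultWriter());
    }
}
```

A result of Boolean.FALSE identifies a conforming XLXP-J version. A null result is ambiguous on its own and has to be combined with other knowledge about the implementation in use.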
To avoid unexpected results and keep the code maintainable, it is in general advisable to keep the calls to setPrefix and
writeNamespace aligned, i.e. to make sure that the scope (in
XMLStreamWriter) of the prefix binding defined by
setPrefix is compatible with the scope (in the produced document) of the namespace declaration written by the
corresponding call to
writeNamespace. This makes it necessary to write code like this:
writer.writeStartElement("p", "element1", "urn:ns1");
writer.setPrefix("p", "urn:ns1");
writer.writeNamespace("p", "urn:ns1");
As can be seen from this code snippet, keeping the two scopes in sync makes it necessary to use the
writeStartElement variant which takes an explicit prefix. Note that this somewhat conflicts with the purpose of the
setPrefix method; one may consider this as a flaw in the design of the StAX API.
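To see why the two calls belong together, the following sketch writes the same fragment twice, once with and once without the writeNamespace call. It assumes a non-repairing writer from the default XMLOutputFactory; the class name AlignedBindingsDemo and the child element are made up for illustration.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

final class AlignedBindingsDemo {
    /** Writes the fragment, optionally leaving out the namespace declaration. */
    static String write(boolean declareNamespace) {
        try {
            StringWriter out = new StringWriter();
            XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(out);

            w.writeStartElement("p", "element1", "urn:ns1");
            w.setPrefix("p", "urn:ns1");              // binding inside the writer, no output
            if (declareNamespace) {
                w.writeNamespace("p", "urn:ns1");     // declaration in the document
            }
            w.writeEmptyElement("urn:ns1", "child");  // prefix is looked up from the binding
            w.writeEndElement();

            w.flush();
            w.close();
            return out.toString();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(write(true));
        System.out.println(write(false));
    }
}
```

In both cases the child element is written with the prefix p, but only the first variant contains the matching xmlns:p declaration. The second output uses an undeclared prefix and is therefore not well formed with respect to namespaces.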
XMLStreamWriter usage patterns
Drawing the conclusions from the previous section and taking into account that
XMLStreamWriter also has a “namespace
repairing” mode, one can see that there are in fact three different ways to use
XMLStreamWriter. These usage patterns
correspond to the three bullets in section 5.2.2 of the StAX specification:
In the “namespace repairing” mode (enabled by the
javax.xml.stream.isRepairingNamespaces property), the writer takes care of all namespace bindings and declarations, with minimal help from the calling code. This will always produce output that is well-formed with respect to namespaces. On the other hand, this adds some overhead and the result may depend on the particular StAX implementation (though the result produced by different implementations will be equivalent).
In repairing mode the calling code should avoid writing namespaces explicitly and leave that job to the writer. There is also no need to call
setPrefix, except to suggest a preferred prefix for a namespace URI. All variants of
writeStartElement, writeEmptyElement and writeAttribute may be used in this mode, but the implementation can choose whatever prefix mapping it wants, as long as the output results in proper URI mapping for elements and attributes.
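A minimal sketch of the repairing mode, assuming the default StAX implementation of the JDK: the code never calls writeNamespace or setPrefix, and the result is checked by parsing it with a namespace-aware DOM parser instead of comparing strings, since the prefixes that end up in the output are the writer's choice. The class name RepairingModeDemo and the element and attribute names are illustrative.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

final class RepairingModeDemo {
    /** Writes one element with one attribute, without any explicit declarations. */
    static String write() {
        try {
            XMLOutputFactory factory = XMLOutputFactory.newInstance();
            factory.setProperty(XMLOutputFactory.IS_REPAIRING_NAMESPACES, Boolean.TRUE);

            StringWriter out = new StringWriter();
            XMLStreamWriter w = factory.createXMLStreamWriter(out);
            // The prefixes are only suggestions; the writer may pick others.
            w.writeStartElement("p", "root", "urn:ns1");
            w.writeAttribute("q", "urn:ns2", "attr", "value");
            w.writeEndElement();
            w.flush();
            w.close();
            return out.toString();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Parses the output with a namespace-aware parser and returns the root element. */
    static Element parse(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            return dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = write();
        System.out.println(xml);
        System.out.println(parse(xml).getNamespaceURI());
    }
}
```

Checking the parsed namespace URIs rather than the literal prefixes keeps the calling code independent of the particular StAX implementation.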
Only use the variants of the writer methods that take an explicit prefix together with the namespace URI. In this usage pattern,
setPrefix is not used at all and it is the responsibility of the calling code to keep track of prefix bindings.
Note that this approach is difficult to implement when different parts of the output document will be produced by different components (or even different libraries). Indeed, when passing the
XMLStreamWriter from one method or component to the other, it will also be necessary to pass additional information about the prefix mappings in scope at that moment, unless it is acceptable to let the called method write (potentially redundant) namespace declarations for all namespaces it uses.
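The redundancy mentioned above can be illustrated with a short sketch. Here the called method writePart (a hypothetical name, as is the class ExplicitPrefixDemo) receives no information about the caller's prefix bindings and therefore declares its namespace again; a non-repairing writer from the default XMLOutputFactory is assumed.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

final class ExplicitPrefixDemo {
    static final String NS = "urn:ns1";

    /** Called component: knows nothing about the caller's bindings. */
    static void writePart(XMLStreamWriter w) throws XMLStreamException {
        w.writeStartElement("p", "part", NS);
        w.writeNamespace("p", NS);   // potentially redundant, but always safe
        w.writeEndElement();
    }

    /** Calling component: only explicit prefixes, no setPrefix. */
    static String write() {
        try {
            StringWriter out = new StringWriter();
            XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            w.writeStartElement("p", "root", NS);
            w.writeNamespace("p", NS);
            writePart(w);
            w.writeEndElement();
            w.flush();
            w.close();
            return out.toString();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(write());
    }
}
```

The output is well formed with respect to namespaces, but the declaration for urn:ns1 appears on both elements.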
Use setPrefix to keep track of prefix bindings and make sure that the bindings are in sync with the namespace declarations that have been written, i.e. always use
setPrefix immediately before or immediately after each call to
writeNamespace. Note that the code is still free to use all variants of
writeStartElement, writeEmptyElement and writeAttribute; it only needs to make sure that the usage it makes of these methods is consistent with the prefix bindings in scope.
The advantage of this approach is that it makes it possible to write modular code: when a method receives an
XMLStreamWriter object (to write part of the document), it can use the namespace context of that writer (i.e.
getNamespaceContext) to determine which namespace declarations are currently in scope in the output document and to avoid redundant or conflicting namespace declarations. Note that in order to do so, such code will have to check for an existing prefix binding before starting to use a namespace.
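The check for an existing binding could look like the following sketch. It uses the writer's getPrefix method for the lookup; the method writeChild, the class ModularWriterDemo and the fallback prefix "c" are hypothetical, and a real implementation would also have to verify that the fallback prefix is not already bound to a different namespace.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

final class ModularWriterDemo {
    static final String NS = "urn:ns1";

    /** Writes a child element in NS, declaring the namespace only if necessary. */
    static void writeChild(XMLStreamWriter w) throws XMLStreamException {
        if (w.getPrefix(NS) != null) {
            // A binding is in scope: reuse it through the URI-only variant.
            w.writeStartElement(NS, "child");
        } else {
            // No binding yet: pick a prefix (arbitrary in this sketch) and
            // keep the binding and the declaration in sync.
            w.writeStartElement("c", "child", NS);
            w.setPrefix("c", NS);
            w.writeNamespace("c", NS);
        }
        w.writeEndElement();
    }

    /** Caller that may or may not have bound and declared NS on the root element. */
    static String write(boolean callerBindsNamespace) {
        try {
            StringWriter out = new StringWriter();
            XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            w.writeStartElement("root");
            if (callerBindsNamespace) {
                w.setPrefix("p", NS);
                w.writeNamespace("p", NS);
            }
            writeChild(w);
            w.writeEndElement();
            w.flush();
            w.close();
            return out.toString();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(write(true));
        System.out.println(write(false));
    }
}
```

When the caller has already bound the namespace, the child reuses the prefix p and no second declaration is written. Otherwise the child declares the namespace itself under its own prefix.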