Parsing external DTDs with Spring 4.x Jaxb2Marshaller

Having recently upgraded a fairly sizeable Spring project to Spring 4.1.7, I uncovered an issue in which, after the upgrade, a class that talks to an external XML API was failing with the following stack trace:

org.springframework.http.converter.HttpMessageNotReadableException: Could not read [class com.richpollock.blog.ExampleClass];
nested exception is org.springframework.oxm.UnmarshallingFailureException: JAXB unmarshalling exception;
nested exception is javax.xml.bind.UnmarshalException
 - with linked exception:
	[org.xml.sax.SAXParseException; lineNumber: 2; columnNumber: 10; DOCTYPE is disallowed when the feature "http://apache.org/xml/features/disallow-doctype-decl" set to true.]
	at org.springframework.http.converter.xml.MarshallingHttpMessageConverter.readFromSource(MarshallingHttpMessageConverter.java:134)
	at org.springframework.http.converter.xml.AbstractXmlHttpMessageConverter.readInternal(AbstractXmlHttpMessageConverter.java:61)
	...

As with most exceptions thrown by large libraries such as Spring, there’s an underlying exception that’s been thrown, wrapped and rethrown. And, as with most exceptions thrown by JAXB, there are also a lot of linked exceptions, which in this case originate from a SAXParseException thrown by Xerces (a JSR 206-compliant, fully-conforming XML Schema 1.0 processor).

In this instance, the error is thrown by the PrologDispatcher (nĂ©e PrologDriver), a nested class that forms part of Xerces’ XMLDocumentScannerImpl class. That the exception is being thrown in the XML prolog shows that the failure occurs before the start tag of the XML document is reached. Specifically, the following line in PrologDispatcher is responsible for throwing the exception:

switch(fScannerState){
...
case SCANNER_STATE_DOCTYPE: {
	if (fDisallowDoctype) {
		reportFatalError("DoctypeNotAllowed", null);
	}
...

The difficulty is that the code is buried way down in the inner workings of the XML parser, which in this case, wasn’t instantiated by us in the first place. Indeed, the last code that was under our direct control was a call to the getForObject() method on an autowired RestTemplate instance. Regardless, the fDisallowDoctype check in PrologDispatcher is reminiscent of the problem reported in the initial stack trace: “DOCTYPE is disallowed when the feature “http://apache.org/xml/features/disallow-doctype-decl” set to true.”

As the name suggests, the disallow-doctype-decl feature prevents an XML document from being parsed if it contains a document type definition (DTD; specified using the DOCTYPE declaration in the XML). Along with the related FEATURE_SECURE_PROCESSING option, this can prevent both XML eXternal Entity (XXE) attacks, which can expose local file content, and XML Entity Expansion (XEE) attacks, which can result in denial of service. As such, the disallow-doctype-decl feature shouldn’t be disabled without giving due consideration to the security implications.

With that said, a bit of searching around reveals a few options for how the disallow-doctype-decl feature can be configured, but they depend on having direct access to the SAXParserFactory instance, setting and unsetting System properties, setting a JRE-wide jaxp.properties file, or passing command-line flags to the JVM. None of these are particularly desirable (or easily achievable).

So the next step was to identify the calls between the Spring RestTemplate (over which we have direct control in code) and the XML parsing code that’s throwing the exception. Thankfully, in this instance, we have control of the RestTemplate bean configuration in the application context as follows:

<bean id="restTemplate" class="org.springframework.web.client.RestTemplate">
    <constructor-arg ref="httpClientFactory"/>

    <property name="messageConverters"> 
        <list> 
            <bean class="org.springframework.http.converter.xml.MarshallingHttpMessageConverter"> 
                <property name="marshaller" ref="jaxbMarshaller"/> 
                <property name="unmarshaller" ref="jaxbMarshaller"/> 
            </bean> 
        </list> 
    </property>
</bean>

The jaxbMarshaller bean (as referenced above) is also configured in the application context as an instance of Jaxb2Marshaller:

<bean id="jaxbMarshaller" class="org.springframework.oxm.jaxb.Jaxb2Marshaller">

    <property name="classesToBeBound"> 
        <list> 
         ...    
        </list> 
    </property>

</bean>

Having initially tried to set the “http://apache.org/xml/features/disallow-doctype-decl” option to “false” through the setMarshallerProperties() method on Jaxb2Marshaller, I subsequently noticed the setSupportDtd property, which “Indicates whether DTD parsing should be supported.” This resolves the issue and, ultimately, the fix comes down to configuring the Jaxb2Marshaller bean with the following option:

<property name="supportDtd" value="true" />

Note that supportDtd can also be set to true by setting the Jaxb2Marshaller processExternalEntities property to true; the difference being that the latter both allows parsing of XML with a DOCTYPE declaration and processing of external entities referenced from the XML document. Jaxb2Marshaller ultimately uses the (logical complement of the) supportDtd option to set the “http://apache.org/xml/features/disallow-doctype-decl” feature on whichever XMLReader implementation is returned from XMLReaderFactory. By default, this is the class that the JVM-instance-wide org.xml.sax.driver property is set to, the class specified in META-INF/services/org.xml.sax.driver or, as a fallback, com.sun.org.apache.xerces.internal.parsers.SAXParser.

For security reasons, it would make sense to configure 2 Jaxb2Marshaller instances, one with DOCTYPE support enabled and one without, using the instance with DOCTYPE support only where it’s absolutely necessary and the XML source is trusted.

4 thoughts on “Parsing external DTDs with Spring 4.x Jaxb2Marshaller

  1. For anyone else who comes across this same obscure problem, I wanted to share the approach I used, after seeing Rich’s excellent analysis.

    I wanted to be able to validate the incoming XML against the vendor-supplied DTD, but I didn’t want to have any dependencies on external DTDs, both of security and reliability reasons. It took some fiddling to find the right way to do this, and this is what it looks like.

    This gave me a couple things that I needed:

    1. Ability to process the old-fashioned XML-with-DTD that the external vendor requires.
    2. Ability to validate against the DTD but with no dependency on an external http retrieval of such.
    3. Ability to return the parsing exception message (if any) to the caller in the response body along with a 400 response code.

    This is all inside a greenfield SpringMVC app, but the auto-marshalling done by that system didn’t give me enough control and ability to return the exact results that I wanted.

    (the indents got a little mangled but I’m too lazy to clean it up)


    DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
    domFactory.setValidating(true);
    domFactory.setNamespaceAware(false);
    DocumentBuilder builder;
    try {
    domFactory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
    domFactory.setFeature(javax.xml.XMLConstants.FEATURE_SECURE_PROCESSING, true);
    builder = domFactory.newDocumentBuilder();
    } catch (ParserConfigurationException e) {
    throw new RuntimeException("The XML parser does not support the features we are setting. This should be investigated.", e);
    }
    builder.setErrorHandler(new BubbleUpExceptionsErrorHandler());
    builder.setEntityResolver(new LocalDTDEntityResolver("local file path to your DTD"));
    // any parsing errors throw SAXException, which are handled by the caller
    return builder.parse(inputStream);

    protected static class LocalDTDEntityResolver implements EntityResolver {

    private String dtdFullPath;
    private File dtdFile;

    public LocalDTDEntityResolver(String dtdFullPath) {
    File file = new File(dtdFullPath);
    if (!file.exists()) {
    throw new IllegalArgumentException("DTD path passed in does not exist [" + dtdFullPath + "].");
    }
    if (!file.isFile()) {
    throw new IllegalArgumentException("DTD path passed in is not a file [" + dtdFullPath + "].");
    }
    this.dtdFullPath = dtdFullPath;
    this.dtdFile = new File(this.dtdFullPath);
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) throws SAXException, IOException {
    return new InputSource(new FileInputStream(dtdFile));
    }
    }

    protected static class BubbleUpExceptionsErrorHandler implements ErrorHandler {
    @Override
    public void error(SAXParseException e) throws SAXException {
    throw e;
    }
    @Override
    public void fatalError(SAXParseException e) throws SAXException {
    throw e;
    }
    @Override
    public void warning(SAXParseException e) throws SAXException {
    throw e;
    }
    }

Leave a Reply to Andrew Cancel reply

Your email address will not be published. Required fields are marked *