Dane: A modified version of Saxon

Trevor Nash

Melvaig Software Engineering Limited

Table of Contents

1. Introduction
Administration
2. Bug Fixes
URIResolver and xsl:include/xsl:import
3. Output URI Resolver
Overview and Justification
URIResolver Interface
OutputURIResolver Interface
Standard Output URI Resolver
Calling the Output Resolver
Java interfaces for supplying the Output URI Resolver
Command Line Interface
Examples
Output URI logger
Java Object Resolver
Example
4. Supplied Components
A. Using Sock With Third-Party Code

Chapter 1. Introduction

Table of Contents

Administration

This document describes Dane, a modified version of the XSLT processor Saxon. It implements facilities required by Sock, and this document itself is an example of how to use Sock to manage changes to third-party code. Each chapter that follows describes a particular facility. The appendix covers the detail of how Sock is used for this kind of application.

Why Dane? Well, the Saxon reign in England was briefly interrupted by the Danes. The thinking behind this modified version is that we are anticipating features which will become available in standard Saxon when XSLT 2.0 is implemented. So, we hope that one day Dane will cease to exist and we can resume using standard Saxon. If that proves impractical then we might rename it Norman.

Administration

Here are the files to be modified.

  • src/com/icl/saxon/Version.java
  • src/com/icl/saxon/style/XSLDocument.java
  • src/com/icl/saxon/TransformerFactoryImpl.java
  • src/com/icl/saxon/Controller.java
  • src/com/icl/saxon/StyleSheet.java
  • src/com/icl/saxon/style/XSLGeneralIncorporate.java

Modify the version information by adding a name and version suffix. Versions are simply a letter added to the Saxon version number.

For version:dane-version
a
The name of the product includes a reference to the original Saxon version on which it is based.
For version:dane-name
"Dane " + getVersion() + "<sk:sink id="dane-version"/> based on " + 

Insert a warning about editing the modified files instead of the XML versions.

For :edit-warning
/************************************************************************/
/* File maintained by Sock: DO NOT EDIT - look for the .xml file instead */
/************************************************************************/
Append a copyright notice stating where the changes came from.
For :tail-copyright
// Output URI resolver implementation by
//     Trevor Nash
//     Melvaig Software Engineering Limited
//     www.melvaig.co.uk

Put a copyright notice and Mozilla licence into all the new Java files.

For Object java:class replacement:java:class|java:interface
<sk:replace match="java:class|java:interface">
<sk:source sink.id="copyright" file.id="{@id}">
//
// The contents of this file are subject to the Mozilla Public License Version 1.0 (the "License");
// you may not use this file except in compliance with the License. You may obtain a copy of the
// License at http://www.mozilla.org/MPL/ 
//
// Software distributed under the License is distributed on an "AS IS" basis,
// WITHOUT WARRANTY OF ANY KIND, either express or implied.
// See the License for the specific language governing rights and limitations under the License. 
//
// The Original Code is: all this file. 
//
// The Initial Developer of the Original Code is
// Trevor Nash of Melvaig Software Engineering Limited (tcn@melvaig.co.uk).
//
// Portions created by (your name) are Copyright (C) (your legal entity). All Rights Reserved. 
//
// Contributor(s): none. 
//
</sk:source>

</sk:replace>

Chapter 2. Bug Fixes

URIResolver and xsl:include/xsl:import

Status: reported against 6.5. Partially fixed in 6.5.1, but that fix fails with a NullPointerException, which has also been reported.

Description: if a user-specified URIResolver returns null for a URI from an xsl:include or xsl:import, Saxon fails instead of falling back to the standard URI resolver.
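The intended behaviour can be sketched with the standard JAXP URIResolver interface. This is a minimal illustration, not the Saxon or Dane code: the class name FallbackResolution, its resolveWithFallback method and the two lambda resolvers in main are hypothetical. The only point it makes is that a null return from the user's resolver means "not mine", and the same href and base are then handed to the standard resolver.

```java
import javax.xml.transform.Source;
import javax.xml.transform.TransformerException;
import javax.xml.transform.URIResolver;
import javax.xml.transform.stream.StreamSource;

class FallbackResolution {

    // Ask the user's resolver first. A null answer means "not mine", so the
    // same href/base pair is passed on to the standard resolver.
    static Source resolveWithFallback(URIResolver user, URIResolver standard,
                                      String href, String base)
            throws TransformerException {
        Source source = (user == null) ? null : user.resolve(href, base);
        if (source == null) {
            source = standard.resolve(href, base);
        }
        return source;
    }

    public static void main(String[] args) throws TransformerException {
        // Hypothetical user resolver that only recognises one URI.
        URIResolver user = (href, base) ->
                "special.xsl".equals(href) ? new StreamSource("memory:special") : null;
        // Stand-in for the standard resolver: records the href as a system ID.
        URIResolver standard = (href, base) -> new StreamSource(href);

        System.out.println(resolveWithFallback(user, standard, "special.xsl", null).getSystemId());
        System.out.println(resolveWithFallback(user, standard, "common.xsl", null).getSystemId());
    }
}
```

Running main prints memory:special for the URI the user resolver accepts and common.xsl for the one it declines.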

The fix we applied to 6.5 made TransformerFactoryImpl, rather than Controller, responsible for managing the StandardURIResolver object. Saxon 6.5.1 changes the export status of StandardURIResolver, but we will stay with the original plan for the OutputURIResolver.

Fix the omission in the Saxon 6.5.1 correction.

For xsl-inc:fallback-bug
                source =

Chapter 3. Output URI Resolver

Overview and Justification

XSLT 1.0 supports only a single output document. On the command line, one may specify a file name. JAXP allows the destination and format to be determined by supplying an appropriate implementation of the Result interface.

XSLT 1.1 introduced multiple output documents through the xsl:document element. Many processors, including Saxon and Xalan, already provided this facility through proprietary extensions. It is likely that something like xsl:document will exist in XSLT 2.0.

Multiple input documents are already available in XSLT 1.0. To give the user some control over how URIs are interpreted, an implementation of the URIResolver interface may be supplied.

This modification introduces a similar interface, OutputURIResolver, which can be used to interpret URIs specified in the xsl:document element. The facility consists of:

  • A new interface OutputURIResolver

  • A default implementation of OutputURIResolver which preserves Saxon's current behaviour.

  • Code in Saxon's outputter to call it

  • A way of supplying an instance of OutputURIResolver when using JAXP. We do not modify JAXP itself: the user is required to cast to a Dane-specific class to access the necessary method. But we do try to guess what the JAXP implementation will look like.

  • A way of supplying an instance of OutputURIResolver from the command line.

A variation on this design would be to add another method to the existing URIResolver interface rather than invent a separate interface. Besides breaking existing code, this would make it impossible to provide different resolvers for input and output. Conversely, the proposed design does not prevent one class from implementing both interfaces, so the same object may be used for both resolvers, but it does not have to be.

URIResolver Interface

For reference, here is the existing URI resolver interface for input documents. It is described here because we use it later to write a class which implements both URI resolver interfaces. The method signature goes into any other class that has the element u:resolve.

For Object u:resolve replacement:
<sk:replace>
    /**
    * Called by the processor when it encounters an xsl:include, xsl:import,
    * or document() function.
    * @param href An href attribute, which may be relative or absolute.
    * @param base The base URI in effect when the href attribute was encountered.
    * @return A Source object, or null if the href cannot be resolved, and the
    * processor should try to resolve the URI itself.
    */
    public Source resolve (
            String href,
            String base)
        throws TransformerException</sk:replace>

OutputURIResolver Interface

This interface, or something like it, should eventually be part of JAXP, but for now it lives in the Saxon package. It is modelled on URIResolver but uses the method name resolveNew rather than resolve, so that both interfaces can be implemented in one class without overloading.

<java:interface id="o-r" name="OutputURIResolver" package="com.icl.saxon"/>
For o-r:classQualifiers
public
For o-r:classHeader
/**
 * <p>An object implementing this interface can be called by the
 * processor to turn a URI used in xsl:document into a Result object.</p>
 */
The method signature goes into the standard implementation of this interface, and any other class that has the element u:resolveNew.
For Object u:resolveNew replacement:
<sk:replace>
    /**
     * Called by the processor when it encounters
     * an xsl:document element.
     *
     * @param href An href attribute, which may be relative or absolute.
     * @param base The base URI in effect when the href attribute
     * was encountered.
     * @param attributes The attributes of the xsl:document element, after
     * AVT expansion.
     *
     * @return A Result object, or null if the href cannot be resolved,
     * and the processor should try to resolve the URI itself.
     *
     * @throws TransformerException if an error occurs when trying to
     * resolve the URI.
     */
    public Result resolveNew (
            String href,
            String base,
            Properties attributes)
        throws TransformerException</sk:replace>
For o-r:methods
    <u:resolveNew/>;
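Because the method names differ, a single class can implement both interfaces without overloading. The following self-contained sketch shows one object serving as both resolvers. The nested OutputURIResolver interface mirrors the one defined above so that the example compiles without Dane; EchoResolver is a hypothetical class that simply echoes the href back as a system ID.

```java
import java.util.Properties;
import javax.xml.transform.Result;
import javax.xml.transform.Source;
import javax.xml.transform.TransformerException;
import javax.xml.transform.URIResolver;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class DualResolverSketch {

    // Mirror of the OutputURIResolver interface defined above.
    interface OutputURIResolver {
        Result resolveNew(String href, String base, Properties attributes)
                throws TransformerException;
    }

    // One object can act as both the input and the output resolver,
    // because resolve and resolveNew have different names.
    static class EchoResolver implements URIResolver, OutputURIResolver {
        @Override
        public Source resolve(String href, String base) {
            return new StreamSource(href);   // input side
        }

        @Override
        public Result resolveNew(String href, String base, Properties attributes) {
            return new StreamResult(href);   // output side; attributes unused here
        }
    }

    public static void main(String[] args) throws TransformerException {
        EchoResolver r = new EchoResolver();
        URIResolver in = r;          // would be passed to setURIResolver
        OutputURIResolver out = r;   // would be passed to setOutputURIResolver
        System.out.println(in.resolve("in.xml", null).getSystemId());
        System.out.println(out.resolveNew("out.xml", null, new Properties()).getSystemId());
    }
}
```

The InternalURIResolver example later in this chapter uses the same arrangement for a more useful purpose.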

Standard Output URI Resolver

This is based on the original code from com.icl.saxon.style.XSLDocument. It lives in the same package as StandardURIResolver.

<java:class id="s-o-r" name="StandardOutputURIResolver" package="com.icl.saxon">
</java:class>
<java:implements id="s-o-r" class.id="o-r"/>

The body of resolveNew is derived from code lifted from the old XSLDocument class.

For s-o-r:methods
    <u:resolveNew/>
    {
        Result result = null;
        <sk:copy-marked-section marked-section.id="file_create_1" file.id="xsl-doc"/>
        String outFileName = href;
        <java:import names="File String FileOutputStream StreamResult IOException TransformerException"/>
        <sk:copy-marked-section marked-section.id="file_create_2" file.id="xsl-doc"/>
        return result;
    }

The constructor follows the pattern for StandardURIResolver, though we don't need the parameter.

For s-o-r:constructors
    protected StandardOutputURIResolver() {
    }    
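The copied marked sections (file_create_1 and file_create_2) are not reproduced in this document, so the following is only a guess at their combined effect, written as a standalone class. It assumes that the href is used directly as a file name, so relative names are taken relative to the current directory, and that the result is a StreamResult wrapping a FileOutputStream. FileOutputResolverSketch is a hypothetical name, and it does not implement the real interface so that it compiles without Dane.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.Properties;
import javax.xml.transform.Result;
import javax.xml.transform.TransformerException;
import javax.xml.transform.stream.StreamResult;

class FileOutputResolverSketch {

    // Hypothetical stand-in for StandardOutputURIResolver.resolveNew.
    // The href is used directly as a file name; base and attributes are
    // ignored. Like the standard resolver, it never returns null.
    public Result resolveNew(String href, String base, Properties attributes)
            throws TransformerException {
        File file = new File(href);
        try {
            return new StreamResult(new FileOutputStream(file));
        } catch (IOException err) {
            throw new TransformerException("Failed to create output file " + href, err);
        }
    }
}
```

Returning an open FileOutputStream is what makes the close-at-end-of-document step described below necessary.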

Calling the Output Resolver

At present we only know how to deal with absolute URIs, so the base URI is passed as null. When XSLT 2.0 sorts itself out we should use the URI of the main output file or possibly an enclosing xsl:document. For now relative URIs are relative to the current directory.

For xsl-doc:resolve-output-uri
        [o-r] r = c.getOutputURIResolver();
        result = r.resolveNew(outFileName, null, details);
        
        // if a user URI resolver returns null, try the standard one
        // (Note, the standard URI resolver never returns null)
        if (result==null) {
            r = c.getStandardOutputURIResolver();
            result = r.resolveNew(outFileName, null, details);
        }
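The selection and fallback logic, spread across Controller and XSLDocument, can be condensed into one small class for illustration. This sketch assumes a simplified controller holding a standard resolver and an optional user resolver; OutputResolutionSketch and its nested OutputURIResolver interface are hypothetical stand-ins for the Dane classes, and base is always passed as null, as described above.

```java
import java.util.Properties;
import javax.xml.transform.Result;
import javax.xml.transform.TransformerException;

class OutputResolutionSketch {

    // Mirror of the OutputURIResolver interface defined earlier.
    interface OutputURIResolver {
        Result resolveNew(String href, String base, Properties attributes)
                throws TransformerException;
    }

    private final OutputURIResolver standard;
    private OutputURIResolver user;   // null unless the caller supplies one

    OutputResolutionSketch(OutputURIResolver standard) {
        this.standard = standard;
    }

    void setOutputURIResolver(OutputURIResolver resolver) {
        user = resolver;
    }

    // Primary resolver: the user's if present, otherwise the standard one.
    OutputURIResolver getOutputURIResolver() {
        return user == null ? standard : user;
    }

    // Primary resolver first; if it declines by returning null, fall back
    // to the standard resolver, which is assumed never to return null.
    Result resolve(String href, Properties details) throws TransformerException {
        Result result = getOutputURIResolver().resolveNew(href, null, details);
        if (result == null) {
            result = standard.resolveNew(href, null, details);
        }
        return result;
    }
}
```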

If the Result object is a StreamResult, we close its output stream or writer at the end of the document. This replicates the behaviour before OutputURIResolvers were introduced, and should usually be helpful because it is difficult for users to arrange to call close() themselves. It might, however, cause problems if someone does something unusual, such as using one stream for several documents.

For xsl-doc:close-output
        if (result instanceof StreamResult) {
            StreamResult streamResult = (StreamResult)result;
            OutputStream stream = streamResult.getOutputStream();
            if (stream != null) {
                <sk:copy-marked-section marked-section.id="file_close" file.id="xsl-doc"/>
            } else {
                Writer writer = streamResult.getWriter();
                if (writer != null) {
                    try {
                        writer.close();
                    } catch (java.io.IOException err) {
                        throw new TransformerException("Failed to close output file", err);
                    }
                }
            }
        }
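The close step can be written out in full as a standalone method. This is a sketch only: ResultCloser is a hypothetical name, and it assumes that the file_close marked section, which is not shown here, simply closes the OutputStream and reports any IOException as a TransformerException, as the Writer branch does.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.Writer;
import javax.xml.transform.Result;
import javax.xml.transform.TransformerException;
import javax.xml.transform.stream.StreamResult;

class ResultCloser {

    // Close whatever a StreamResult wraps once the document is finished.
    // Other results (DOMResult, SAXResult) are left alone, and a
    // StreamResult holding only a system ID has nothing to close.
    static void closeIfStream(Result result) throws TransformerException {
        if (!(result instanceof StreamResult)) {
            return;
        }
        StreamResult streamResult = (StreamResult) result;
        try {
            OutputStream stream = streamResult.getOutputStream();
            if (stream != null) {
                stream.close();
                return;
            }
            Writer writer = streamResult.getWriter();
            if (writer != null) {
                writer.close();
            }
        } catch (IOException err) {
            throw new TransformerException("Failed to close output file", err);
        }
    }
}
```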

Java interfaces for supplying the Output URI Resolver

We have assumed that the JAXP designers will be sensible: the Saxon implementation of the JAXP interfaces has getters and setters for the output resolver in the same places as those for the input resolver. Until JAXP is updated, callers will need to cast the generic JAXP types to the Saxon-specific implementation classes to access these new methods. I decided not to allow the output URI resolver to be set via the setFeature method, because that would be inconsistent with the view that this should be a standard facility.

First, the TransformerFactory implementation. [Note: no import statements are generated by this code as there is no import sink. We rely on the blanket imports already present.]

For factory:private-members
	private [o-r] outputResolver = new [s-o-r]();
	private [o-r] standardOutputResolver = new [s-o-r]();
For factory:get-set-methods
    /**
     * Set an object that is used during the transformation
     * to resolve URIs used in xsl:document.
     *
     * @param outputResolver An object that implements the OutputURIResolver interface,
     * or null.
     */
     
    public void setOutputURIResolver([o-r] outputResolver) {
    	this.outputResolver = outputResolver;
    }

    /**
     * Get the object that is used during the transformation
     * to resolve URIs used in xsl:document.
     *
     * @return The [o-r] that was set with setOutputURIResolver.
     */
     
    public [o-r] getOutputURIResolver() {
    	return outputResolver;
    }

    /**
     * Get the object that is used by default during the transformation
     * to resolve URIs used in xsl:document.
     *
     * @return An instance of StandardOutputURIResolver.
     */
     
    public OutputURIResolver getStandardOutputURIResolver() {
    	return standardOutputResolver;
    }

Similar changes are made to the Controller class.

For controller:private-members
    private [o-r] standardOutputURIResolver;
    private [o-r] userOutputURIResolver;
For controller:initialise-resolvers
        standardOutputURIResolver = factory.getStandardOutputURIResolver();
        userOutputURIResolver = factory.getOutputURIResolver();
For controller:get-methods
    /**
    * Get the primary output URI resolver.
    * @return the user-supplied URI resolver if there is one, or the system-defined one
    * otherwise (Note, this isn't quite as JAXP specifies it).
    */

    public [o-r] getOutputURIResolver() {
        return (userOutputURIResolver==null ?
                standardOutputURIResolver : userOutputURIResolver);
    }

    /**
    * Get the fallback output URI resolver.
    * @return the system-defined output URIResolver
    */

    public [o-r] getStandardOutputURIResolver() {
        return standardOutputURIResolver;
    }

For controller:set-methods
    /**
    * Set an object that will be used to resolve URIs used in 
    * xsl:document.
    * @param resolver An object that implements the OutputURIResolver interface, 
    * or null.
    */
    
    public void setOutputURIResolver([o-r] resolver) {
        userOutputURIResolver = resolver;
    }
  

Command Line Interface

Saxon accepts an option "-r" to specify the resolver used for input documents: we add an option "-ro" to specify the resolver for output documents.

For main:process-option
                    else if (args[i].equals("-ro")) {
                        i++;
                        if (args.length < i+2) badUsage(name, "No Output URIResolver class");
                        String r = args[i++];
                        factory.setOutputURIResolver(makeOutputURIResolver(r));
                    }
For main:help-message
        System.err.println("  -ro classname   Use specified OutputURIResolver class ");
For main:make-resolver

    public static [o-r] makeOutputURIResolver (String className) 
    throws TransformerException
    {
        Object obj = Loader.getInstance(className);
        if (obj instanceof [o-r]) {
            return ([o-r])obj;
        }
        throw new TransformerException("Class " + className + " is not a [o-r]");
    }
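Saxon's Loader.getInstance hides the reflection involved. For readers without the Saxon source to hand, the following sketch performs the same load-and-check using plain Java reflection. ResolverLoader and makeInstance are hypothetical names, and the sketch assumes the resolver class has a public no-argument constructor, as OutputURILogger does.

```java
import javax.xml.transform.TransformerException;

class ResolverLoader {

    // Load a class by name, instantiate it, and check that the instance
    // implements the expected type before handing it back.
    static <T> T makeInstance(String className, Class<T> expected)
            throws TransformerException {
        Object obj;
        try {
            obj = Class.forName(className).getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException err) {
            throw new TransformerException("Cannot instantiate class " + className, err);
        }
        if (!expected.isInstance(obj)) {
            throw new TransformerException("Class " + className + " is not a "
                    + expected.getName());
        }
        return expected.cast(obj);
    }
}
```

With Dane on the class path, makeInstance("com.icl.saxon.resolvers.OutputURILogger", OutputURIResolver.class) would play the role of makeOutputURIResolver for the -ro option.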

Examples

Output URI logger

This class can be used to check that Dane is selecting the right output resolver. It may also be useful for checking that the stylesheet generates the right URI when an AVT is used for the href attribute.

<java:class id="logger" name="OutputURILogger" package="com.icl.saxon.resolvers">
   <java:implements class.id="o-r"/>
</java:class>
For logger:classQualifiers
public
For logger:methods
    public OutputURILogger () {}

    <u:resolveNew/>
    {
        System.out.println("Output URI resolved: "+href);
        try {
            attributes.store (System.out, "attributes:");
        } catch (IOException err) {
            throw new TransformerException("Failed to write attributes", err);
        }
        return null;
    }
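For automated checks it can be handier to send the log somewhere other than System.out. The sketch below is a hypothetical variant of OutputURILogger, here called StreamOutputURILogger, that writes to a supplied PrintStream. It does not implement OutputURIResolver so that it compiles without Dane, but its resolveNew method has the same signature and, like the original, returns null so that the standard resolver still produces the file.

```java
import java.io.IOException;
import java.io.PrintStream;
import java.util.Properties;
import javax.xml.transform.Result;
import javax.xml.transform.TransformerException;

class StreamOutputURILogger {

    private final PrintStream log;

    StreamOutputURILogger(PrintStream log) {
        this.log = log;
    }

    // Log the href and the expanded xsl:document attributes, then return
    // null so that the standard output resolver handles the URI.
    public Result resolveNew(String href, String base, Properties attributes)
            throws TransformerException {
        log.println("Output URI resolved: " + href);
        try {
            attributes.store(log, "attributes:");
        } catch (IOException err) {
            throw new TransformerException("Failed to write attributes", err);
        }
        return null;
    }
}
```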

Java Object Resolver

This resolver maps URIs to Java objects. It implements both the URIResolver and OutputURIResolver interfaces, so it can handle both input and output documents. The resolver intercepts URIs that either begin with the prefix "internal:" or have been registered in advance.

If an output document is produced with a URI beginning "internal:", the resolver creates a DOMResult object for it. If the same URI is then used for input, the DOMResult is converted to a DOMSource. One application of this resolver, therefore, is to pass documents from one transform to the next without serializing them. It is a mistake, however, to write a document and then read it back within the same transform, because output documents are not created until the transform is complete.

The resolver looks only at the URI argument and ignores the base URI, so relative URIs in input documents may not work as expected. This is because the current version of Saxon does not use the base URI on xsl:document: the href is taken as a file name.

<java:class id="internal" name="InternalURIResolver" package="com.icl.saxon.resolvers">
</java:class>
For internal:classImplements
[o-r], URIResolver
For internal:classQualifiers
public

Trace messages may be logged by calling setTrace(true).

For internal:methods
    private boolean trace = false;
    public void setTrace (boolean value) {
        trace = value;
    }
For Object u:trace replacement:
<sk:replace>if (trace) System.out.println(<xsl:value-of select="."/>);
</sk:replace>

The resolver contains a hash table to map URIs to Java objects.

For internal:methods
    private Hashtable map = new Hashtable();
Access is permitted to this so that the user can iterate over the whole set of stored documents.
For internal:methods
    public Map getMap() {
        return map;
    }

The user may store and retrieve Source and Result objects in this map through type-safe methods.

For internal:methods
    /**
    * Associate a Source object with a given URI.  The document supplied
    * may be read with xsl:include, xsl:import or the document() function.
    */
    public void putSource(
        String uri,
        Source value) {
        map.put(uri, value);
    }

    /**
    * Retrieve a Source object corresponding to a given URI.  This is
    * used to retrieve a value written with putSource.
    */
    public Source getSource(
        String uri) {
        Object result = map.get(uri);
        if (result instanceof Source) {
            return (Source)result;
        }
        return null;
    }

    /**
    * Associate a Result object with a given URI.  The document supplied
    * may be written with xsl:document.
    */
    public void putResult(
        String uri,
        Result value) {
        map.put(uri, value);
    }

    /**
    * Create a Result object with a given URI.  The document supplied
    * may be written with xsl:document and read in a subsequent
    * transformation with xsl:include, xsl:import or the document() function.
    */
    public void putResult(String uri) {
        DOMResult dom = new DOMResult();
        dom.setSystemId(uri);
        map.put(uri, dom);
    }

    /**
    * Retrieve a Result object corresponding to a given URI.  This is
    * used to access a document written with xsl:document.
    */
    public Result getResult(
        String uri) {
        Object result = map.get(uri);
        if (result instanceof Result) {
            return (Result)result;
        }
        return null;
    }

    /**
    * Retrieve a Result object corresponding to a given URI and
    * convert it to a Source.  This is
    * used to access a document written with xsl:document for
    * processing in another transformation.  This only works for
    * a DOMResult, as other types of result cannot be converted.
    */
    public Source getResultAsSource(
        String uri) {
        Object result = map.get(uri);
        if (result instanceof DOMResult) {
            DOMResult dom = (DOMResult) result;
            return new DOMSource(dom.getNode(), dom.getSystemId());
        }
        return null;
    }
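The conversion in getResultAsSource relies on the transformer having filled in the DOMResult's node: when no node is supplied, a JAXP transformer creates a Document, which getNode() then returns. The following sketch checks that idea with the JDK's own identity transformer rather than Dane. DomHandOff and roundTrip are hypothetical names, and a parsed string stands in for a document produced by xsl:document.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class DomHandOff {

    // Fill a DOMResult (standing in for an xsl:document output), re-wrap
    // its node as a DOMSource as getResultAsSource does, then serialize the
    // source to show that the tree arrived intact.
    static String roundTrip(String xml) throws TransformerException {
        TransformerFactory factory = TransformerFactory.newInstance();

        DOMResult dom = new DOMResult();
        dom.setSystemId("internal:demo");
        factory.newTransformer().transform(new StreamSource(new StringReader(xml)), dom);

        DOMSource source = new DOMSource(dom.getNode(), dom.getSystemId());

        Transformer serializer = factory.newTransformer();
        serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        serializer.transform(source, new StreamResult(out));
        return out.toString();
    }
}
```

No text is produced in between: the second transformer reads the node tree directly, which is the saving the InternalURIResolver is after.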

When resolving an output URI, we first look to see if a Result object has been registered for it. If so, we use it; otherwise we create a new one, but only if the URI begins with "internal:". A created result is always a DOMResult.

For internal:methods
    public InternalURIResolver () {}

    <u:resolveNew/>
    {
        // If there is a previously registered result with this URI, use that.
        Result result = getResult(href);
        if (result != null) {
            <u:trace>"Output URI resolved: "+href</u:trace>
            return result;
        }
        // If the URI is not marked 'internal:' then let the standard resolver
        // handle it.
        if (!href.startsWith("internal:")) {
            return null;
        }
        // For 'internal:' URIs create a DOMResult (which can subsequently be
        // converted to a Source).
        DOMResult dom = new DOMResult();
        dom.setSystemId (href);
        map.put(href, dom);
        <u:trace>"Output URI created: "+href</u:trace>
        return dom;
    }

When resolving an input URI, we look to see if a Source object has been registered for it. If so, we use it; otherwise we try to find a Result object that can be converted to a Source. If that also fails, we return null so that the standard resolver can deal with the URI.

For internal:methods
    <u:resolve/>
    {
        // First try for a Source object.
        Source result = getSource(href);
        if (result != null) {
            <u:trace>"URI resolved: "+href</u:trace>
            return result;
        }
        // Then a Result object which can be converted.
        result = getResultAsSource(href);
        if (result != null) {
            <u:trace>"URI resolved from output: "+href</u:trace>
            return result;
        }
        // That failing, give up and let the standard resolver deal with it.
        <u:trace>"URI not resolved: "+href</u:trace>
        return null;
    }

Example

This is an example of how the InternalURIResolver might be used. It is a simple command-line JAXP application which executes two transforms; the first transform produces several documents which are read by the second transform.

<java:class id="test-internal" name="TestInternalURIResolver" src="./tests/internal"/>
For test-internal:classQualifiers
public
For test-internal:methods

    public static void main (String[] args)
    throws TransformerException, FileNotFoundException {
        <sk:sink id="prime-resolver"/>
        <sk:sink id="setup-factory"/>
        <sk:sink id="transforms"/>
    }

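Create an instance of the resolver, and register the URI "d3" in advance so that it is intercepted even though it does not begin with "internal:".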

For test-internal:prime-resolver
        [internal] resolver = new [internal]();
        resolver.putResult("d3");  // see t1.xsl and t2.xsl

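Obtain an instance of TransformerFactory and cast it to the Saxon-specific class to gain access to the OutputURIResolver methods. The cast succeeds only if Dane's TransformerFactoryImpl is the JAXP implementation selected at run time. In theory, once JAXP is updated we can simply remove the cast.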

For test-internal:setup-factory
        TransformerFactory factory = TransformerFactory.newInstance();
        factory.setURIResolver (resolver);
        ((TransformerFactoryImpl)factory).setOutputURIResolver (resolver);

For Object u:file-as-source replacement:
<sk:replace>new StreamSource(new FileInputStream(<xsl:value-of select="."/>))</sk:replace>
For Object u:console-as-result replacement:
<sk:replace>new StreamResult (System.out)</sk:replace>
For test-internal:transforms
        Transformer t1 = factory.newTransformer(
                <u:file-as-source>"t1.xsl"</u:file-as-source>);
        t1.transform(
            <u:file-as-source>"input.xml"</u:file-as-source>,
            <u:console-as-result/>);
        Transformer t2 = factory.newTransformer(
                <u:file-as-source>"t2.xsl"</u:file-as-source>);
        t2.transform(
            <u:file-as-source>"input.xml"</u:file-as-source>,
            <u:console-as-result/>);

Chapter 4. Supplied Components

This specification imports http://local/kits/java.

Appendix A. Using Sock With Third-Party Code

One of the benefits of using open source software is that it is possible to correct bugs or add facilities without having to rely on the original producer to release a patch or a new version. But there is a serious objection to this: unpredictable costs are involved in merging such changes with any future version of the product.

Sock can be used to control these costs. This is how:

  • It enables detailed documentation to be kept about what was changed and why, without introducing yet more text into the code.

  • Inserted code is kept outside the original source file.

  • Code to be moved elsewhere is left in its original position.

The underlying principle is to keep the changes to the original code to an absolute minimum. Instead of changing the code directly we insert tags to define where changes are to be made, then keep the changes externally.

Here is the procedure, which is repeated for each modified source file.

  • Make a copy of the file to be changed.

  • Convert it to XML by adding an XML declaration (take care to choose the correct encoding if the file contains any exotic characters). Wrap the whole thing in an sk:text-file element, using CDATA to avoid having to escape characters such as less-than and ampersand.

  • Where code is to be inserted, use an sk:sink element.

  • Where code is to be deleted or used elsewhere, put it inside an sk:marked-section element with a suitably chosen id attribute.

  • Include this as an sk:file element within a Sock specification which contains the actual modified code.
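The wrapping step can be automated. The sketch below assumes the source file is UTF-8 and uses a placeholder namespace URI, urn:example:sock, because the real Sock namespace is not given in this document; SockWrapper is a hypothetical name. It also deals with one trap of the CDATA approach: a literal "]]>" in the source would end the CDATA section early, so it is split across two adjacent sections. Inserting sk:sink and sk:marked-section elements remains a manual step.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

class SockWrapper {

    // Placeholder: the real Sock namespace URI is not stated here.
    static final String SK_NAMESPACE = "urn:example:sock";

    // Wrap source text in an sk:text-file element, protected by CDATA.
    // Each "]]>" is split so that it does not terminate the section.
    static String wrap(String source) {
        String safe = source.replace("]]>", "]]]]><![CDATA[>");
        return "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                + "<sk:text-file xmlns:sk=\"" + SK_NAMESPACE + "\"><![CDATA["
                + safe
                + "]]></sk:text-file>\n";
    }

    public static void main(String[] args) throws IOException {
        // Usage: java SockWrapper Controller.java Controller.xml
        String text = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        Files.write(Paths.get(args[1]), wrap(text).getBytes(StandardCharsets.UTF_8));
    }
}
```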

In most cases it should be possible to construct a modified file which differs from the original only by having occasional lines inserted. This gives us the best chance of obtaining meaningful results from a subsequent difference analysis against a future version of the same file. The analysis will not be compromised by any commentary we include, because that commentary is not in the modified file: it is in the Sock specification.