Jesper Tverskov, December 23, 2008, 2nd edition

Creating XHTML with XQuery

Creating XHTML documents with XQuery is easy when we know how. There are two challenges: how to set the serialization parameters for the proper XHTML DOCTYPE, and how to refer to input XML when output is in the XHTML default namespace.

1. Serialization in XQuery

XQuery and XSLT share the same W3C Recommendation for serialization (creation of files): XSLT 2.0 and XQuery 1.0 Serialization. There are a number of serialization parameters that influence how serialization is performed. In XSLT the serialization parameters turn up as attributes in the xsl:output element and in the xsl:result-document but the parameters are not available in a standardized way in XQuery. [1]

Serialization parameters

method = "xml" | "html" | "xhtml" | "text" | qname-but-not-ncname
byte-order-mark = "yes" | "no"
cdata-section-elements = qnames
doctype-public = string
doctype-system = string
encoding = string
escape-uri-attributes = "yes" | "no"
include-content-type = "yes" | "no"
indent = "yes" | "no"
media-type = string
normalization-form = "NFC" | "NFD" | "NFKC" | "NFKD" | "fully-normalized" | "none" | nmtoken
omit-xml-declaration = "yes" | "no"
standalone = "yes" | "no" | "omit"
undeclare-prefixes = "yes" | "no"
use-character-maps = qnames
version = nmtoken

It is important to understand that serialization is an optional feature both in XSLT and in XQuery. In the words of the spec: "Host languages MAY allow users to specify any or all of these parameters, but they are not REQUIRED to be able to do so."

XQuery has no standard way of setting the serialization parameters, even when a processor supports them. We must look up the documentation for the XQuery processor to find out which serialization parameters are implemented, if any, and exactly how to use them. If available, they can normally be set at the command line. Often they can also be set from inside the XQuery document.

Below I only list a handful of XQuery processors to make my point. AltovaXML and Saxon are so far the only ones that are both XQuery and XSLT processors.

1.1 AltovaXML

In AltovaXML 2008 Online Manual we don't find a word about serialization parameters, but in Developer Reference Manual (PDF) we read that only the following four parameters can be set at the command line: outputMethod, omitXMLDeclaration, outputIndent, outputEncoding.

In AltovaXML there is no way to make the DOCTYPE for the XHTML document, and there is no way to use the four serialization parameters available from inside the XQuery document.

1.2 Saxon

In additional serialization parameters in Saxon processors, we read that all serialization parameters in the spec are available at the command line, and also as extensions from inside the XQuery document in the Saxon namespace. There are even additional extension parameters to those in the spec.

1.3 XQSharp

Serialization parameters in XQSharp, an XQuery processor for the .NET framework, are implemented as in Saxon except for the namespace and prefix.

1.4 DataDirect

The documentation for XQuery at DataDirect is a mess. I have not been able to find any information about serialization parameters.

2. Serialization to XHTML

When creating XHTML with XQuery, many of the 16 serialization parameters could be relevant. Below we only mention the most common when outputting to XHTML.

2.1 method

I have used the "xml" option in XSLT when transforming to XHTML since the beginning of time (the "xhtml" option didn't exist in XSLT 1.0). It works well in my case, and did so almost 10 years ago, because my web pages are used by people with the newest browsers. Only one big issue exists even today. [2]

The "xhtml" option inserts <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /> in XHTML's head section. Even today Internet Explorer 7 ignores the encoding specified in the XML declaration in an XHTML document and uses its own default, if <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /> is not present.

The "xml" option for XHTML works well for English but not right away for languages having characters that don't exist in ASCII. But all you need to do if you prefer to use the "xml" option also for XHTML output, as I do [3], is to insert <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /> in XHTML's head section as it is constructed inside the XQuery or XSLT document, as can be seen in the example below. [4]

method
method="xml" or method="xhtml"

method/Saxon
declare option saxon:output "method=xml";
or
declare option saxon:output "method=xhtml";
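
A minimal sketch of the "xhtml" alternative, assuming Saxon and the saxon:output option used throughout this tutorial. The page content is made up for illustration. No meta element is written by hand here because, as described above, the "xhtml" method is expected to insert it during serialization.

```xquery
declare namespace saxon="http://saxon.sf.net/";

(: With "xhtml" the serializer is expected to add the
   Content-Type meta element to the head section itself :)
declare option saxon:output "method=xhtml";

<html xml:lang="en" xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>Using XQuery</title>
  </head>
  <body>
    <!-- Made-up text with non-ASCII characters -->
    <p>Blåbærgrød</p>
  </body>
</html>
```

With "method=xml" instead, the same query would serialize without the meta element, so it would have to be added by hand in the head section.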

2.2 doctype

This is not the place for a discussion of what DOCTYPE to use for XHTML. In the example we use XHTML 1.0 Strict! To create a correct DOCTYPE for XHTML we need two serialization parameters:

DOCTYPE

doctype-public="-//W3C//DTD XHTML 1.0 Strict//EN"
doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"

In Saxon XQuery processor they are applied in the XQuery document like this:

DOCTYPE/Saxon

declare option saxon:output "doctype-public=-//W3C//DTD XHTML 1.0 Strict//EN";
declare option saxon:output "doctype-system=http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd";

2.3 omit-xml-declaration

Since XHTML is XML just by being well-formed, you can just as well use the XML declaration. The spec says that the XML declaration is not necessary for XML but that it is highly recommended to use it.

omit-xml-declaration

omit-xml-declaration="no"

omit-xml-declaration/Saxon

declare option saxon:output "omit-xml-declaration=no";

2.4 indent

When outputting to XHTML it is nice that the source document is indented because some users like to take a look at the source code. But indention could also be set to "no" if it is necessary to optimize the web page for speed (extreme optimization).

It is important to note that indent="yes" works differently for method "xml" and method "xhtml". If "xml" is used indention takes place in all elements. If "xhtml" is used, indention doesn't take place inside block level elements like "p".

The "xml" type of indention inserts whitespace that for some elements like "p" also turns up in the XHTML webpage when rendered in the browser. This is a real problem for me when colorizing code examples resulting in many "span" elements inside "p" elements.

If you need to avoid indention inside some XHTML elements like "p", you must use output method "xhtml" instead of "xml", or use indent="no", or, if you use Saxon, you can use an extension attribute called saxon:suppress-indentation to specify indention exceptions (I use the last method for this XHTML document generated by XSLT).

indent

indent="yes"
saxon:suppress-indentation="p li td"

indent/Saxon

declare option saxon:output "indent=yes";
declare option saxon:suppress-indentation "p li td";

3. Namespaces in XQuery

There is a major difference in how XQuery and XSLT treat namespaces. In XSLT we have different default namespaces for input and output. In XQuery they are supposed to be the same.

Let us say we have a simple test XML input document like the following:

Input XML

<?xml version="1.0"?>
<para>yes</para>

XSLT

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <p xmlns="http://www.w3.org/1999/xhtml">
      <xsl:value-of select="para"/>
    </p>
  </xsl:template>
</xsl:stylesheet>

In the XSLT example only the "p" element is in the "http://www.w3.org/1999/xhtml" default namespace. The "para" element is in no namespace and is only found in input XML if it is in no namespace.

XQuery

<p xmlns="http://www.w3.org/1999/xhtml">
  {data(para)}
</p>

In the XQuery example, both "p" and the "para" element are in the "http://www.w3.org/1999/xhtml" default namespace. The "para" element is only found in input XML if it is also in the "http://www.w3.org/1999/xhtml" default namespace.
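
The effect can be checked with a small self-contained query. This is only a sketch: the input is bound to a made-up variable, $input, instead of being read from a file, and the same path expression is evaluated once outside and once inside the XHTML element constructor.

```xquery
(: Input element in no namespace, held in a variable
   so that the query needs no input document :)
declare variable $input := <para>yes</para>;

(: Outside the constructor there is no default element namespace,
   so "para" means the element in no namespace :)
let $outside := count($input/self::para)
return
  <p xmlns="http://www.w3.org/1999/xhtml">{
    (: Inside, xmlns="..." also applies to path expressions,
       so "para" now means the XHTML-namespaced name :)
    concat("outside: ", $outside, ", inside: ", count($input/self::para))
  }</p>
```

The query should report 1 match outside the constructor and 0 inside it, although the path expression is written the same way both times.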

If we want to output XHTML with a proper default namespace declaration, xmlns="http://www.w3.org/1999/xhtml", four basic situations arise:

  1. If the input XML is also in the XHTML namespace, we have no problem.
  2. If the input XML is in another namespace with a prefix, we just declare that namespace in the XQuery and use the prefix when referring to input XML.
  3. If the input XML is in its own default namespace (no prefix), we declare it in XQuery and make up some "dummy" prefix we can use when we refer to input XML.
  4. If the input XML is in no namespace, we are in for a treat. We simply use the asterisk (*) wildcard as prefix when referring to input XML. The asterisk wildcard in this case means: "any namespace or no namespace".
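
Situation 3 could look like the following minimal sketch. The namespace URI "http://example.com/input" and the prefix "src" are made up for illustration, and the input is held in a variable so that the query is self-contained.

```xquery
(: "src" is a dummy prefix made up in the query;
   the input itself uses no prefix at all :)
declare namespace src = "http://example.com/input";

(: Hypothetical input in its own default namespace :)
declare variable $input :=
  <para xmlns="http://example.com/input">yes</para>;

<p xmlns="http://www.w3.org/1999/xhtml">
  {data($input/self::src:para)}
</p>
```

The prefix only has to be bound to the same namespace URI as the input. It doesn't matter that the input document uses no prefix.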

Of the four situations above, the last is the tricky one. The XQuery example from before now looks like this:

XQuery

<p xmlns="http://www.w3.org/1999/xhtml">
  {data(*:para)}
</p>

The wildcard trick is considered quick and dirty and should only be used for testing. In the following I will first show a full XQuery example using the wildcard trick. Next I will show a more correct XQuery example using the cumbersome long form of element constructor to put the namespace in each time.

There are plenty of additional methods to choose from if we don't want to use the wildcard trick or the long form of the element constructor. They all make use of functions. I have proposed one such solution to the XQuery mailing list that is now recognized as probably the best. [5] It's presented below as "XQuery 3".

4. XHTML from XML in no namespace

We will transform an input document in no namespace into an XHTML table in an XHTML document using XQuery. Since we are looking at XQuery coming from XSLT, we will choose an XQuery processor that is also an XSLT processor. Saxon is the natural choice. As we have seen earlier, AltovaXML lacks the serialization parameters to create a DOCTYPE.

4.1 Input XML

products.xml

<?xml version="1.0"?>
<products>
  <product id="p1">
    <name>Delta</name>
    <price>800</price>
    <stock>4</stock>
    <country>Denmark</country>
  </product>
  <product id="p2">
    <name>Golf</name>
    <price>1000</price>
    <stock>5</stock>
    <country>Germany</country>
  </product>
  <product id="p3">
    <name>Alfa</name>
    <price>1200</price>
    <stock>19</stock>
    <country>Germany</country>
  </product>
  <product id="p4">
    <name>Foxtrot</name>
    <price>1500</price>
    <stock>5</stock>
    <country>Australia</country>
  </product>
  <product id="p5">
    <name>Tango</name>
    <price>1225</price>
    <stock>3</stock>
    <country>Japan</country>
  </product>
</products>

4.2 XQuery 1: Quick and dirty using namespace wildcard

products2xhtml-1.xquery

declare namespace saxon="http://saxon.sf.net/"; [6]
declare option saxon:output "method=xml";
declare option saxon:output "doctype-public=-//W3C//DTD XHTML 1.0 Strict//EN";
declare option saxon:output "doctype-system=http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd";
declare option saxon:output "omit-xml-declaration=no";
declare option saxon:output "indent=yes";

<html xml:lang="en" xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /> [7]
    <style type="text/css">
      body{{font-family: Verdana, Arial, Helvetica, sans-serif}}[8]
      table, th, td{{border: 1px solid gray; border-collapse:collapse}}
      .alt1{{background-color:mistyrose}}
      .alt2{{background-color:azure}}
      .th{{background-color:silver}}
    </style>
    <title>Using XQuery</title>
  </head>
  <body>
    <h1>Using XQuery</h1>
    <table cellspacing="0" cellpadding="5">
      <tr class="th">
        <th>no</th>
        <th>id</th>
        <th>name</th>
        <th>price</th>
        <th>stock</th>
        <th>country</th>
      </tr>
      {for $a at $b in *:products/*:product return[9]
      <tr class="{if ($b mod 2 = 0) then "alt1" else "alt2"}"> [10]
        <td>{$b}</td>
        <td> {$a/@id/data(.)}</td> [11]
        <td>{$a/*:name/data(.)}</td> [12]
        <td>{$a/*:price/data(.)}</td>
        <td>{$a/*:stock/data(.)}</td>
        <td>{$a/*:country/data(.)}</td>
      </tr>}
    </table>
  </body>
</html>

4.3 XQuery 2: Long form of element constructor

products2xhtml-2.xquery

declare namespace saxon="http://saxon.sf.net/";
declare option saxon:output "method=xml";
declare option saxon:output "doctype-public=-//W3C//DTD XHTML 1.0 Strict//EN";
declare option saxon:output "doctype-system=http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd";
declare option saxon:output "omit-xml-declaration=no";
declare option saxon:output "indent=yes";

element {QName('http://www.w3.org/1999/xhtml', 'html')}{
  attribute xml:lang{"en"},
  element {QName('http://www.w3.org/1999/xhtml', 'head')}{
    element {QName('http://www.w3.org/1999/xhtml', 'meta')}{
    attribute http-equiv{"Content-Type"},
    attribute content{"text/html; charset=UTF-8"}},
    element {QName('http://www.w3.org/1999/xhtml', 'style')}{
    attribute type{"text/css"},
    "body{font-family: Verdana, Arial, Helvetica, sans-serif}
    table, th, td{ border: 1px solid gray; border-collapse:collapse}
    .alt1{background-color:mistyrose}
    .alt2{background-color:azure}
    .th{background-color:silver}"},
    element {QName('http://www.w3.org/1999/xhtml', 'title')}{"Using XQuery"}
  },

  element {QName('http://www.w3.org/1999/xhtml', 'body')}{
    element {QName('http://www.w3.org/1999/xhtml', 'h1')}{"Using XQuery"},
    element {QName('http://www.w3.org/1999/xhtml', 'table')}{
      attribute cellspacing{"0"},
      attribute cellpadding{"5"},
      element {QName('http://www.w3.org/1999/xhtml', 'tr')}{
        attribute class{"th"},
        element {QName('http://www.w3.org/1999/xhtml', 'th')}{"no"},
        element {QName('http://www.w3.org/1999/xhtml', 'th')}{"id"},
        element {QName('http://www.w3.org/1999/xhtml', 'th')}{"name"},
        element {QName('http://www.w3.org/1999/xhtml', 'th')}{"price"},
        element {QName('http://www.w3.org/1999/xhtml', 'th')}{"stock"},
        element {QName('http://www.w3.org/1999/xhtml', 'th')}{"country"}
    },

    for $a at $b in products/product return

      element {QName('http://www.w3.org/1999/xhtml', 'tr')}{
        attribute class{if ($b mod 2 = 0) then "alt1" else "alt2"},
        element {QName('http://www.w3.org/1999/xhtml', 'td')}{$b},
        element {QName('http://www.w3.org/1999/xhtml', 'td')}{$a/@id/data(.)},
        element {QName('http://www.w3.org/1999/xhtml', 'td')}{$a/name/data(.)},
        element {QName('http://www.w3.org/1999/xhtml', 'td')}{$a/price/data(.)},
        element {QName('http://www.w3.org/1999/xhtml', 'td')}{$a/stock/data(.)},
        element {QName('http://www.w3.org/1999/xhtml', 'td')}{$a/country/data(.)}
      }
    }
  }
}

4.4 XQuery 3: Using function to change namespace

I have borrowed the function from Priscilla Walmsley's book, XQuery, ISBN-13: 978-0596-00634-1, 2007, Chapter 20. It differs slightly from the similar function at http://www.functx.com, made to change one namespace to another. If you prefer to use the function at the website, you will need to use "" as third argument when calling the function.

products2xhtml-3.xquery

declare namespace saxon="http://saxon.sf.net/";

declare namespace functx = "http://www.functx.com";
declare function functx:change-element-ns-deep ($element as element(), $newns as xs:string) as element()
{
  let $newName := QName($newns, local-name($element))
  return
  (element {$newName}
  {
    $element/@*, for $child in $element/node()
      return
        if ($child instance of element())
        then functx:change-element-ns-deep($child, $newns)
        else $child
  }
  )
};

declare option saxon:output "method=xml";
declare option saxon:output "doctype-public=-//W3C//DTD XHTML 1.0 Strict//EN";
declare option saxon:output "doctype-system=http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd";
declare option saxon:output "omit-xml-declaration=no";
declare option saxon:output "indent=yes";

declare variable $x as element() :=
<html xml:lang="en">
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
    <style type="text/css">
    body{{font-family: Verdana, Arial, Helvetica, sans-serif}}
    table, th, td{{ border: 1px solid gray; border-collapse:collapse}}
    .alt1{{background-color:mistyrose}}
    .alt2{{background-color:azure}}
    .th{{background-color:silver}}
    </style>
    <title>Using XQuery</title>
  </head>
  <body>
    <h1>Using XQuery</h1>
      <table cellspacing="0" cellpadding="5">
        <tr class="th">
          <th>no</th>
          <th>id</th>
          <th>name</th>
          <th>price</th>
          <th>stock</th>
          <th>country</th>
        </tr>

  {for $a at $b in products/product return
        <tr class="{if ($b mod 2 = 0) then "alt1" else "alt2"}">
          <td>{$b}</td>
          <td> {$a/@id/data(.)}</td>
          <td>{$a/name/data(.)}</td>
          <td>{$a/price/data(.)}</td>
          <td>{$a/stock/data(.)}</td>
          <td>{$a/country/data(.)}</td>
        </tr>}
    </table>
  </body>
</html>;

functx:change-element-ns-deep($x, "http://www.w3.org/1999/xhtml")

This third method is nice because you create the XHTML output the literal way you are used to, exactly as in the quick and dirty method using the namespace wildcard. The only differences are that you don't declare the XHTML namespace in the html element, and that you select input using unqualified element names.

It takes hard work getting the function in place the first time but once it is ready, it is very easy to reuse.
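
To see what the function does without a full document, it can be tried on a small fragment. The sketch below repeats the function from products2xhtml-3.xquery so that it runs on its own. The "ul" fragment is made up for the test, and the query lists each element name together with its namespace after the change.

```xquery
declare namespace functx = "http://www.functx.com";

(: Same function as in products2xhtml-3.xquery :)
declare function functx:change-element-ns-deep
  ($element as element(), $newns as xs:string) as element()
{
  element {QName($newns, local-name($element))}
  {
    (: Attributes are copied unchanged, child elements are rebuilt recursively :)
    $element/@*,
    for $child in $element/node()
    return
      if ($child instance of element())
      then functx:change-element-ns-deep($child, $newns)
      else $child
  }
};

(: A literal fragment in no namespace, made up for this check :)
let $fragment := <ul class="demo"><li>one</li><li>two</li></ul>
let $result :=
  functx:change-element-ns-deep($fragment, "http://www.w3.org/1999/xhtml")
return
  (: One string per element: local name and namespace URI :)
  for $e in $result/descendant-or-self::*
  return concat(local-name($e), " -> ", namespace-uri($e))
```

All three elements, "ul" and both "li", should be reported in the XHTML namespace, which shows that the change reaches the descendants and not only the outermost element.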

4.5 Output XHTML

products2xhtml.html

Footnotes

[1]

1st edition of tutorial published 2008-11-07. Changes due to discussions at the "talk -- XQuery discussion" forum, see footnote 5. Comparison of XQuery and XSLT has been moved into a tutorial of its own.

[2]

See XHTML Output Method in the spec for all details. In XHTML Media Types, 2008, the editors take a more conservative approach. I find it counterproductive to follow the beat of unnamed, more or less hypothetical browsers with a less than 1% market share. As long as XHTML validates and is supported in the last 3 versions of IE, and in the latest versions of Firefox, Opera and Google Chrome, I can live with it.

[3]

I use all sorts of XML all the time and I simply find it idiotic to apply special rules to only one XML application, XHTML, when it is not necessary.

[4]

At my www.xmlplease.com website I use XHTML 1.1 served with mimetype "application/xhtml+xml" to browsers supporting it and XHTML 1.0 Strict! with mimetype "text/html" to browsers not yet supporting "application/xhtml+xml" like Internet Explorer 7. The only difference between the two documents is in my case the DOCTYPE and the use of <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /> in XHTML 1.0.

[5]

See thread Generating xhtml from xml in no namespace at the "talk -- XQuery discussion" forum.

[6]

Most XQuery documents would work in any XQuery processor but it is impossible to make a portable XQuery document outputting XHTML. The reason is that there is no standard way of setting the serialization parameters. In the example we use Saxon. Of the XQuery processors mentioned above we could get the XQuery working in XQSharp instead just by changing namespace and prefix.

[7]

In the "head" section, we must use <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /> because we have chosen to use method="xml". Had we used method="xhtml", the metatag would have been inserted automatically.

[8]

In the "style" element we must escape the curly braces "{". We must use double curly braces to prevent them from being interpreted as XQuery syntax.

[9]

"at $b" makes use of the XQuery keyword "at", XQuery's way of making a counter. The position() function is also available in XQuery but it does not work in a "for" expression like the position() in the xsl:for-each element in XSLT.

[10]

{if ($b mod 2 = 0) then "alt1" else "alt2"}: Using modulus is the standard way in web design to make alternating colors in a table.

[11]

We don't need to prefix an attribute with "*". An unprefixed attribute is in no namespace, even when a default namespace is declared. For that reason the XHTML output default namespace doesn't apply to the "@id" attribute. If we use the "*" prefix the attribute is still found, the "*" meaning any or no namespace.

[12]

The wildcards (*) in the expressions are needed to refer to elements in the input XML document in no namespace.

Updated 2009-11-13