All Articles

Jesper Tverskov: XSLT, XHTML5, XML Schema, Schematron, XML, XPath, XQuery, REGEX for XML.


Benefits of polyglot XHTML5

HTML5 has so many syntax options that at least some web designers and developers prefer to use a consistent subset. In HTML5 almost anything from both old XHTML and HTML is allowed in the same valid document. With no self-imposed restrictions, HTML5 markup tends to attract dirt, and dirt attracts more dirt. In this tutorial we don't look at all the nice new features in HTML5. We focus on what basic subset to use.


Disable-output-escaping and xsl:character-map

DOE is short for "disable-output-escaping". DOE is the name of an attribute we can use in xsl:value-of and in xsl:text. XSLT processors are not required to support it, and in XSLT 2.0 DOE is deprecated. We should use xsl:character-map instead. "Character-map" is a general method for replacing a character with a string when output is serialized.

Understanding xml:space

The xml:space="preserve" attribute is common in some XML documents. But what the attribute means is obscured by the fact that it is often used for no good reason. It could be there in an element in the source code because some developer inserted it as an experiment and forgot to delete it again.


Transform XML to CSV with XSLT pipeline

This tutorial is a showcase for the templating power of XSLT. The XML to CSV solution is a pipeline of seven XSLT stylesheets providing user-defined pre-processing and automatic flattening and equalizing. Commas, quotes, line breaks, and leading and trailing whitespace are supported in data.
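To make "flattening and equalizing" concrete, here is a minimal sketch in Python rather than XSLT. It is not the seven-stylesheet pipeline from the tutorial. It assumes a hypothetical input shape (a root element whose children are records, each with simple child fields) and relies on the standard csv module for quoting.

```python
import csv
import io
import xml.etree.ElementTree as ET


def xml_to_csv(xml_text):
    """Flatten record elements to CSV rows (assumed shape: root/record/field)."""
    root = ET.fromstring(xml_text)
    rows = [{field.tag: field.text or "" for field in record} for record in root]

    # "Equalize": every row gets the same columns, in first-seen order.
    columns = []
    for row in rows:
        for name in row:
            if name not in columns:
                columns.append(name)

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=columns, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)  # the csv module quotes commas, quotes and line breaks
    return out.getvalue()


# Hypothetical sample data, for illustration only.
SAMPLE = (
    '<rows>'
    '<row><name>A, B</name><note>say "hi"</note></row>'
    '<row><name>C</name></row>'
    '</rows>'
)
print(xml_to_csv(SAMPLE))
```

Fields containing a comma or a quote come out quoted, with inner quotes doubled, and the missing "note" field in the second record becomes an empty cell.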


Styling XML with CSS

All textbooks about XML have long chapters about how to style XML with CSS. This is misleading because it is used very little in the real world. It is fair to say that CSS for XML is only relevant as an exception to the rule. Students of XML should know that we can style XML with CSS, but actually doing it is mostly a waste of time.

Name, NmToken, QName, NCName

Since there are restrictions on what can be used as a name for elements and attributes, the XML standards have come up with a host of "name" terms that are bound to confuse beginners in the subject. Name, NmToken, QName and NCName are also datatypes, and XSLT/XPath and XQuery even have functions for some of them. Regular Expressions have \i and \c for the initial character in Name and the allowed characters in NmToken.

Google's Writely and XSLT for web pages

This article is written with Google's Writely and can at any time be edited by me at Google Docs. This XHTML webpage, on the other hand, is at my own website. It's a transformation of the page at Google Docs using XSLT. A script, called from my document at Google Docs, takes care of publication at my website.

xsl:namespace in XSLT 2.0

The xsl:namespace element is most often useful if namespaces are created dynamically, if they are part of the content of elements and attributes as QNames, or if we for cosmetic reasons want to move namespace declarations from inner elements to an outer element in the output.

XHTML sections: implicit2explicit-hierarchy.xsl

How to unflatten XML often comes up in XSLT Help Fora like the xsl-list. In this tutorial I give a complete and detailed example of how to transform an XHTML document using h1-h6 headings into an explicit hierarchical structure of nested sections.
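The idea of turning an implicit heading hierarchy into explicit nesting can be sketched outside XSLT as well. The Python sketch below is not the implicit2explicit-hierarchy.xsl stylesheet. It assumes un-namespaced input, looks only at the direct children of the body, and uses a hypothetical "section" wrapper element.

```python
import xml.etree.ElementTree as ET

HEADING_LEVELS = {"h1": 1, "h2": 2, "h3": 3, "h4": 4, "h5": 5, "h6": 6}


def nest_sections(body):
    """Return a new body where each heading opens a nested <section>."""
    new_body = ET.Element(body.tag)
    # Stack of (heading level, container element); level 0 is the body itself.
    stack = [(0, new_body)]
    for child in body:
        level = HEADING_LEVELS.get(child.tag)
        if level is not None:
            # Close every open section at the same or a deeper level.
            while stack[-1][0] >= level:
                stack.pop()
            section = ET.SubElement(stack[-1][1], "section")
            stack.append((level, section))
        # Headings and ordinary content both go into the innermost open section.
        stack[-1][1].append(child)
    return new_body


# Hypothetical sample input, for illustration only.
flat = ET.fromstring("<body><h1>A</h1><p>a</p><h2>B</h2><p>b</p><h1>C</h1></body>")
print(ET.tostring(nest_sections(flat), encoding="unicode"))
```

A stack keeps track of the open sections, so a heading of level n closes all sections of level n or deeper before opening its own.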


Creating XHTML with XQuery

Creating XHTML documents with XQuery is easy when we know how. There are two challenges: how to set the serialization parameters for the proper XHTML DOCTYPE, and how to refer to input XML when output is in the XHTML default namespace.

XQuery and XSLT compared

XSLT became a standard in 1999 and XQuery in 2007, but they have so much in common that one of the best things about them is that if you know one, you can become fluent in the other in a matter of days.

Testing XQuery Update Facility 1.0

With XQuery Update Facility we can update XML files using insert, replace, rename and delete with the same ease as when we update tables with SQL. In XSLT, on the other hand, we can only write the update to a new file and must use the modified identity template.

Saxon XQuery processor in ASP.NET

There are several XQuery processors for .NET to choose from. In this example I use Saxon, which has the benefit of also being an XSLT processor. When you have either the XQuery part or the XSLT 2.0 part of the processor working, the other one is also ready for use.

Elements and functions available in XSLT processors

In XSLT we can use the functions element-available() and function-available() to see what instruction elements and functions are available including extensions. I have made an XSLT stylesheet testing the availability of all XSLT instruction elements, XSLT and XPath functions and Saxon and EXSLT extensions.

Attributes and XML namespaces

It is confusing for beginners in XML that the attributes of an element are not in a namespace if they don't have a proper namespace prefix. There is no default namespace for attributes. It is almost a rule that attributes are not in a namespace.
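A quick way to see this is to parse a small document and look at the expanded names. The sketch below uses Python's standard ElementTree module. The sample document and its namespace URIs are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: a default namespace, one unprefixed attribute
# and one prefixed attribute.
XML = (
    '<doc xmlns="http://example.com/ns" xmlns:x="http://example.com/x" '
    'id="a1" x:lang="en"/>'
)

root = ET.fromstring(XML)

# ElementTree reports namespaced names in {namespace-uri}local form.
print(root.tag)             # the element picks up the default namespace
print(sorted(root.attrib))  # "id" stays in no namespace; "x:lang" is namespaced
```

The element name carries the default namespace, but the unprefixed "id" attribute is reported as a plain name. Only the prefixed attribute is in a namespace.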

XSLT 2.0 Saxon in ASP.NET

XSLT 2.0 became a standard in 2007 and we want to use it in ASP.NET. Microsoft has not made an XSLT 2.0 processor yet, but who really cares? It is easy to use the .NET version of the Saxon XSLT 2.0 processor instead. This is a tutorial to get you started.

Quotations about XML

When I find something interesting about XML that is easy to quote, I add it to this small collection of quotations about XML. Mail me if you know of a good piece that is missing.

Understanding Processing Instructions in XML

Processing Instructions are special tags with instructions to software making use of the XML document. That sounds important, which makes PIs extremely confusing for novices. The truth is that PIs are seldom used, with a few common exceptions.

Identity transformation for XSLT 2.0

The traditional identity template has several shortcomings. The most important are that the XML declaration and the DTD are not recreated and that default attributes found in the DTD are copied to the output. In XSLT 2.0, using saxon:parse() and saxon:serialize(), it is possible to supplement the identity template with extra templates and instructions that overcome all these limits and inconveniences.


XML Schema Element Syntax Summary

The summaries are taken from the XML Schema Recommendations, made more user-friendly, and link back to the specs. All elements can have an "id" attribute and all elements can have xs:annotation as child except xs:annotation itself.

13 XPath axes

An XPath axis is a path through the node tree making use of a particular relationship between nodes. We use the "child::*" axis and the "attribute::*" axis all the time, but mostly in their short forms: "*" and "@*". The other axes are used far less often. Whether we need them depends very much on what we want to achieve. For more information see XPath axes in the W3C Recommendation.

User-defined function for line-number in XSLT

In XSLT 2.0 we have 130 functions but we don't have a function to return the line-number of an element node. It is a challenging exercise to make a user-defined function for line-number. We need to get a lot of the new stuff in XSLT 2.0 working, like sequences, unparsed text and Regular Expressions.

Deleting comments, CDATA sections and PIs from unparsed text with REGEX in XSLT

When loading XML with the unparsed-text() function to test it with Regular Expressions, we must avoid false positives. What looks like markup in comments, PIs and CDATA sections gets escaped exactly like real markup. To make testing of markup easier it can be necessary to delete comments, CDATA sections and PIs.
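The deletion step can be sketched with one regular expression. The example below uses Python's re module rather than XSLT 2.0, so it is not the article's stylesheet. Note one assumption that matters: the PI pattern also matches the XML declaration, so that gets deleted too.

```python
import re

# Non-greedy matches; DOTALL lets "." span line breaks inside the constructs.
NON_MARKUP = re.compile(
    r"<!--.*?-->"            # comments
    r"|<!\[CDATA\[.*?\]\]>"  # CDATA sections
    r"|<\?.*?\?>",           # processing instructions (and the XML declaration)
    re.DOTALL,
)


def strip_non_markup(text):
    """Delete comments, CDATA sections and PIs from unparsed XML text."""
    return NON_MARKUP.sub("", text)


# Hypothetical sample, for illustration only.
SAMPLE = '<?xml version="1.0"?><!-- <fake/> --><a><![CDATA[<b>]]><c/><?pi x?></a>'
print(strip_non_markup(SAMPLE))
```

After the call, the look-alike elements inside the comment and the CDATA section are gone, so a later test for markup only sees the real elements.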

Schematron for XML and XHTML prolog validation

With DTD, XML Schema or other grammar-based schema languages we cannot validate the prolog, that is, what comes before the top element of an XML document. We can use Schematron to validate the XML declaration and the DTD, and to validate comments, processing instructions and whitespace before the top element.

Validating implicit XHTML hierarchy with Schematron

Most XHTML documents benefit from an implicit hierarchical structure made with h1-h6 heading elements. The hierarchy makes further processing easier, and it adds to usability, accessibility and Search Engine Optimization. We can use Schematron to validate an implicit hierarchy.

Schematron for checksum validation

ISO Schematron, using XPath 2.0 for testing, is the natural choice for schema validation of numbers with checksum digits in XML documents. In this tutorial we look at MOD 10 calculations for UPC, EAN, ISBN and credit card numbers. First we make the XPath expressions in XSLT and at the end we transfer them to Schematron.
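As a reference point for the arithmetic, here are two common MOD 10 checks sketched in Python instead of XPath 2.0. They are not the expressions from the tutorial. The function names are made up for illustration, and the sample numbers are well-known test values.

```python
def ean13_is_valid(code):
    """EAN-13 / ISBN-13: weights 1,3,1,3,... from the left; sum must be 0 mod 10."""
    if len(code) != 13 or not (code.isascii() and code.isdigit()):
        return False
    total = sum(int(ch) * (3 if i % 2 else 1) for i, ch in enumerate(code))
    return total % 10 == 0


def luhn_is_valid(number):
    """Credit card numbers (Luhn): double every second digit from the right."""
    if not (number.isascii() and number.isdigit()):
        return False
    total = 0
    for i, ch in enumerate(reversed(number)):
        digit = int(ch)
        if i % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as adding the two digits of the product
        total += digit
    return total % 10 == 0


print(ean13_is_valid("4006381333931"))    # a commonly cited valid EAN-13
print(luhn_is_valid("4111111111111111"))  # a widely used test card number
```

Both checks include the check digit in the weighted sum and accept the number only if the sum is a multiple of 10.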

Collection() with REGEX in XSLT

Collection() is a non-standardized standard function. It can be used as a better version of document() and doc() with wild cards and Regular Expressions to load a collection of XML documents. Or it can use a catalog file.


Making a TOC of nested list elements for an XHTML document by hand or by code is usually among the more tiresome or difficult tasks. With XSLT 2.0 it is relatively easy to transform the XHTML document to itself (identity transform) and let extra templates add the TOC, the links and the numbers.

Valid XHTML with schema-aware XSLT 2.0

With a schema-aware XSLT 2.0 processor we can test if XML output is valid as we create it. It is nice that no XML with a schema, e.g. XHTML, can be generated in our system unless it is valid. Unfortunately, schema-awareness is not mature yet.

The shape="rect" attribute in XHTML to XHTML

When we transform XHTML to XHTML using the identity template or "@*" in the match attribute, we suddenly end up with shape="rect" attributes in all anchor elements of the XHTML output document.


Tricky whitespace handling in XSLT

The xsl:strip-space and xsl:preserve-space elements are only relevant for whitespace-only text nodes. Some XSLT processors have not even implemented these elements but strip such nodes themselves.

XSLT 1.0 Element Syntax Summary (group)

This element syntax summary is an enhanced and more user-friendly version of the syntax summary in the XSLT 1.0 Recommendation. The headings link to the spec for additional information.

XSLT 1.0 Element Syntax Summary (a-z)

This element syntax summary is an enhanced and more user-friendly version of the syntax summary in the XSLT 1.0 Recommendation. The headings link to the spec for additional information.

Using unparsed-text() in XSLT 2.0 to test prolog

In XSLT 2.0 we can use the unparsed-text() function to test the XML declaration and the DOCTYPE declaration. We can read the pseudo-attributes of the XML declaration and the values of PUBLIC or SYSTEM in the DTD in order to recreate or modify them as we please.
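Reading the pseudo-attributes from raw text comes down to two regular expressions. The sketch below shows the idea in Python rather than with unparsed-text() in XSLT 2.0. The function name and the sample input are assumptions for illustration, and the DOCTYPE part is left out.

```python
import re

# The XML declaration must come first (optionally after a byte order mark).
XML_DECL = re.compile(r"^\ufeff?<\?xml\s+(.*?)\?>", re.DOTALL)
# name = "value" or name = 'value'; \2 requires the same closing quote.
PSEUDO_ATTR = re.compile(r"""(\w+)\s*=\s*(["'])(.*?)\2""")


def read_xml_declaration(text):
    """Return the pseudo-attributes of the XML declaration as a dict."""
    match = XML_DECL.match(text)
    if match is None:
        return {}  # no XML declaration present
    return {name: value for name, _quote, value in PSEUDO_ATTR.findall(match.group(1))}


# Hypothetical sample, for illustration only.
SAMPLE = '<?xml version="1.0" encoding="UTF-8" standalone="no"?>\n<root/>'
print(read_xml_declaration(SAMPLE))
```

With the values in a dictionary, the declaration can be recreated unchanged or modified before the output is written.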

The standalone pseudo-attribute is only relevant if a DTD is used

The standalone pseudo-attribute in the XML declaration is a mystery for many XML beginners. Most often it is irrelevant but it is tempting to add it or delete it or to change its value from "yes" to "no" to "yes" as long as you don't know what it is all about.

Transform XHTML documents into one big document

It is easy to transform many XML documents into one XML document. But a few tricks and a little experience are needed, especially when using XHTML as input and output. In the following we will look at different ways of loading all the XML files into the transformation process in XSLT 1.0 and XSLT 2.0.


Transform XHTML to XHTML with XSLT

Considering how useful it could be to transform the XHTML-based web to another format or to use XHTML as an XML data store, it is surprisingly tricky to transform XHTML. Most XSLT developers need to be told the secrets of XHTML transformation in order to do it.

UBL Naming and Design Rules

Universal Business Language (UBL) has an impressive UBL Naming and Design Rules Checklist [1]. UBL has been a great source of inspiration also for XML Schema projects not related to UBL.

Identity Template: xsl:copy with recursion

The so-called identity template, which copies everything from input.xml to output.xml, element for element and attribute for attribute, is the most important of all templates when combined with templates for the exceptions.

Updated: 2010-09-15