Jesper Tverskov, February 19, 2007

The shape="rect" attribute in XHTML to XHTML

When we transform XHTML to XHTML using the identity template, or any other template with "@*" in its match attribute, we suddenly end up with a shape="rect" attribute in every anchor element of the XHTML output document.
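For reference, the identity template is usually written as follows in XSLT 1.0. This is a minimal sketch of the common form, not necessarily the exact stylesheet that produced the output discussed here.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity template: copy every attribute and every node unchanged -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

The "@*" part matches every attribute in the input tree, whatever its name and wherever it came from.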

This shape="rect" baffled me for a long time, and has irritated me ever since I first encountered it. For a period I thought that it was some bug in my favourite XSLT processor.

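Then I received a hint from the XSL mailing list that I should take a closer look at the DTD of XHTML. Yes! Now I remember! We have something called default values for attributes. In schema languages like DTD and XML Schema we can declare default values. When the XML parser reads the DTD, it adds each defaulted attribute to every element that lacks it, so the XSLT processor sees shape="rect" as if it had been typed into the input document.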

1. The DTD of XHTML

The shape attribute is normally only used in the area element of image maps. It is also allowed in the anchor element, but it is hardly ever used there. In the DTD of XHTML 1.0 and XHTML 1.1 we can find the following declaration:

<!ATTLIST a shape %Shape; "rect">

Never mind the details of how to read a DTD. The above says that the anchor element can have a shape attribute and that the default value is "rect". The attribute list for the anchor element is much longer in the DTD.
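The effect is easy to reproduce with a small test document. The sketch below is a made-up, heavily simplified stand-in for XHTML (no namespace, two elements only) that declares the default in an internal DTD subset, so any XML parser will read it. The allowed values are written out directly because a parameter entity reference is not allowed inside a declaration in the internal subset.

```xml
<?xml version="1.0"?>
<!DOCTYPE html [
  <!-- Simplified, hypothetical declarations for illustration only -->
  <!ELEMENT html (a)>
  <!ELEMENT a (#PCDATA)>
  <!ATTLIST a
    href  CDATA                      #IMPLIED
    shape (rect|circle|poly|default) "rect">
]>
<html><a href="http://www.w3.org/">W3C</a></html>
<!-- After parsing, the application sees the anchor as:
     <a href="http://www.w3.org/" shape="rect">W3C</a> -->
```

The anchor in the source has no shape attribute, but the parser supplies it before the XSLT processor ever sees the document.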

2. Problematic default values

It is interesting that DTD and XML Schema allow default values to be declared. The RELAX NG schema language has taken the opposite approach: the core language has no way to declare default values. In that view, a schema should only be for validating the structure of a document.
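For comparison, a default value in XML Schema is declared with the default attribute of xs:attribute. The fragment below is an illustrative sketch with a made-up, simplified element declaration; it is not taken from an official XHTML schema.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Hypothetical, simplified anchor element with a defaulted attribute -->
  <xs:element name="a">
    <xs:complexType mixed="true">
      <xs:attribute name="href" type="xs:anyURI"/>
      <xs:attribute name="shape" type="xs:string" default="rect"/>
    </xs:complexType>
  </xs:element>

</xs:schema>
```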

The irritating default attribute values in the DTDs of XHTML are a strong argument in favor of the approach RELAX NG has taken. Don't confuse the world for nothing or almost nothing. Each and every time we transform XHTML to XHTML we run the risk of several unwanted attributes showing up in our result document.

The shape attribute is the most common, but we also have the colspan="1" and rowspan="1" attributes on table cells. In XHTML 1.1 it is even worse, with version="-//W3C//DTD XHTML 1.1//EN" in the html element and profile="" in the head element.

3. Don't copy that junk

We can keep the unwanted attributes out of the result by adding templates to our XSLT stylesheet that match them explicitly, so that the defaults supplied from the DTD are not copied to the output. Such a template for the shape attribute looks like this:

<xsl:template match="@shape"/>

The shape attribute is also matched by the template having "@*" in the match attribute, but the new template wins because its more specific match pattern gives it a higher default priority. The new template has no content and that is what it should do: nothing.

If we use two XHTML namespace declarations (a default and a prefixed) in our XSLT stylesheet in order to get to the markup in our XHTML document, and if we make our templates even more specific, they could look like this: [1]

<xsl:template match="xhtml:a/@shape"/>
<xsl:template match="xhtml:html/@version"/> <!-- 1.1 -->
<xsl:template match="xhtml:head/@profile"/> <!-- 1.1 -->
<xsl:template match="@rowspan"/>
<xsl:template match="@colspan"/>

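The templates above match on the attribute name only, so they also delete attributes that were set deliberately in our XHTML input document, for example rowspan="2". We should narrow them if they delete markup that we want to be copied over. We only want to delete the default values specified in the DTD.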
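One way to narrow the templates is to add a predicate that tests the attribute value. The stylesheet below is a sketch under the assumptions used above: XSLT 1.0, the identity template, and the XHTML namespace bound both as the default and to the prefix xhtml. The XHTML 1.1 templates for version and profile are left out for brevity.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.w3.org/1999/xhtml"
    xmlns:xhtml="http://www.w3.org/1999/xhtml"
    exclude-result-prefixes="xhtml">

  <!-- Identity template: copy everything not matched more specifically -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Drop shape on anchors only when it has the DTD default value -->
  <xsl:template match="xhtml:a/@shape[. = 'rect']"/>

  <!-- Drop rowspan and colspan only when they are "1";
       values such as rowspan="2" fall through to the identity template -->
  <xsl:template match="@rowspan[. = '1'] | @colspan[. = '1']"/>

</xsl:stylesheet>
```

The stylesheet cannot tell a defaulted shape="rect" from one that was typed into the input, so both are removed. That is harmless here because the value is the same as the default in either case.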

4. I am not the only one

Are default values from the DTD popping up in our result document a big problem? Yes! Go to the homepage of w3.org and look at the source code. We have shape="rect" all over the place, in each and every anchor element, for nothing.

I have made a cached version of the w3.org homepage (cached 2007-02-19). Compare it with the source code of the actual one to see if w3.org has made a more specific template to eliminate the dirt.

Why make markup longer than necessary? Why fool web designers into thinking: "Could this be an important attribute I should start using?" The cached homepage of w3.org is not best practice. It is not a quality webpage.

Footnotes

[1] See my article, Transforming XHTML to XHTML with XSLT.

Updated 2009-08-06