Jesper Tverskov, August 27, 2009

Name, NmToken, QName, NCName

Since there are restrictions on what can be used as a name for elements and attributes, the XML standards have come up with a host of "name" terms that are bound to confuse beginners. Name, NmToken, QName and NCName are also datatypes, and XSLT, XPath and XQuery even have functions for some of them. Regular Expressions have \i for the initial character of a Name and \c for the characters allowed in an NmToken.

The XML Recommendation uses the terms Name and NmToken. An additional standard, Namespaces in XML, adds two more terms, QName and NCName, which put a few restrictions on the names allowed by the original XML standard.

1. XML Recommendation

In the XML standard the name of an element, an attribute or a Processing Instruction must be a Name. A Name consists of one or more name characters (letters | digits | "_" | ":" | "." | "-"), except that the first character can only be a letter, ":" or "_". An NmToken is made of the same characters but has no restriction on the first character.

More precisely this is how it is formulated in the spec:

NameStartChar ::= ":" | [A-Z] | "_" | [a-z] | [#xC0-#xD6] | [#xD8-#xF6] | [#xF8-#x2FF] | [#x370-#x37D] | [#x37F-#x1FFF] | [#x200C-#x200D] | [#x2070-#x218F] | [#x2C00-#x2FEF] | [#x3001-#xD7FF] | [#xF900-#xFDCF] | [#xFDF0-#xFFFD] | [#x10000-#xEFFFF]
NameChar ::= NameStartChar | "-" | "." | [0-9] | #xB7 | [#x0300-#x036F] | [#x203F-#x2040]
Name ::= NameStartChar (NameChar)*
Names ::= Name (#x20 Name)*
Nmtoken ::= (NameChar)+
Nmtokens ::= Nmtoken (#x20 Nmtoken)*
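
The productions above translate almost directly into regular expression character classes. The following is a minimal sketch in Python, not part of the spec; the names NAME_START, NAME_CHAR, is_name and is_nmtoken are made up for this illustration.

```python
import re

# Character ranges copied from the NameStartChar production above.
NAME_START = (
    ":A-Z_a-z"
    "\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF"
    "\u0370-\u037D\u037F-\u1FFF\u200C-\u200D"
    "\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF"
    "\uF900-\uFDCF\uFDF0-\uFFFD"
    "\U00010000-\U000EFFFF"
)

# NameChar adds "-", ".", digits and a few extra ranges to NameStartChar.
NAME_CHAR = NAME_START + r"\-.0-9" + "\u00B7\u0300-\u036F\u203F-\u2040"

# Name ::= NameStartChar (NameChar)*
NAME_RE = re.compile(f"[{NAME_START}][{NAME_CHAR}]*")
# The token production: (NameChar)+
NMTOKEN_RE = re.compile(f"[{NAME_CHAR}]+")


def is_name(text: str) -> bool:
    """True if the whole string matches the Name production."""
    return NAME_RE.fullmatch(text) is not None


def is_nmtoken(text: str) -> bool:
    """True if the whole string matches the (NameChar)+ production."""
    return NMTOKEN_RE.fullmatch(text) is not None
```

With these helpers, "1abc" is rejected by is_name() because a digit is not a NameStartChar, but it is accepted by is_nmtoken().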

2. Namespaces in XML

In the Namespaces in XML standard, an element or attribute name in a namespace must be a QName (Qualified Name). A QName is either a Prefix + ":" + LocalPart, or just a LocalPart if the name is in a default namespace. A name in no namespace must also have the form of a LocalPart. Both the Prefix and the LocalPart must be an NCName, that is, a "non-colonized" Name: a Name without colons.

More precisely this is how it is formulated in the spec:

QName ::= PrefixedName | UnprefixedName

PrefixedName ::= Prefix ':' LocalPart
UnprefixedName ::= LocalPart
Prefix ::= NCName
LocalPart ::= NCName

The consequence of the above is that an element or attribute name must be a QName: a prefixed name if it is in a namespace bound to a prefix, or else a plain NCName if it is in a default namespace or in no namespace. Another consequence is that a colon is never allowed in an element or attribute name except as the separator between "prefix" and "localpart".

In other words: both the prefix and the localpart can only begin with a letter or an underscore, and both can contain letters, digits, ".", "-" and "_".
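
The QName productions can be sketched as a small function that splits a name at the colon and checks each part. This is only an illustration: split_qname and NCNAME_ASCII are hypothetical names, and the NCName check is deliberately simplified to ASCII characters, whereas the real NCName also allows the non-ASCII ranges of NameStartChar and NameChar.

```python
import re
from typing import Optional, Tuple

# ASSUMPTION: ASCII-only approximation of NCName (a Name without colons).
NCNAME_ASCII = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")


def split_qname(qname: str) -> Optional[Tuple[Optional[str], str]]:
    """Return (prefix, local_part), or None if the string is not a QName.

    The prefix is None for an UnprefixedName.
    """
    prefix, colon, local_part = qname.rpartition(":")
    if not colon:
        # UnprefixedName ::= LocalPart
        prefix = None
    elif not NCNAME_ASCII.fullmatch(prefix):
        # Covers an empty prefix (":a") and a second colon ("a:b:c").
        return None
    if not NCNAME_ASCII.fullmatch(local_part):
        return None
    return (prefix, local_part)
```

For example, split_qname("xs:element") gives ("xs", "element"), split_qname("element") gives (None, "element"), and split_qname("a:b:c") gives None because a colon can only appear once, as the separator.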

3. Datatypes in XML Schema and DTD

In XML Schema 1.0 we have 44 built-in datatypes. Among them we have xs:Name, xs:NMTOKEN, xs:NMTOKENS (one or more NMTOKEN values separated with spaces), xs:QName and xs:NCName. NMTOKEN and NMTOKENS are spelled in upper-case, a tradition borrowed from DTD. These two datatypes exist both in XML Schema and in the XML standard, where the DTD schema language is defined.

4. Regular Expressions

Regular Expressions used by XML Schema, XSLT, XPath and XQuery are defined in XML Schema. It is only natural that these Regular Expressions, made for XML, in addition to the traditional "Multi-Character Escapes" like \s, \w and \d also have \i matching the set of initial XML name characters (letter | _ | :) and \I matching the rest, that is [^\i]; and \c matching the set of XML name characters (letter | digit | _ | . | - | :), and likewise \C matching the rest, that is [^\c]. Note that both \i and \c include the colon, so they describe Name and NmToken, not NCName.
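
Most general-purpose regex engines do not know \i and \c. As a rough sketch of what these escapes stand for, the hypothetical helper expand_name_escapes() below rewrites them into explicit character classes for Python's re module, using the NameStartChar and NameChar ranges from the XML Recommendation. It is an assumption-laden illustration only: it does not handle \i or \c inside a bracketed character class, nor any other difference between XML Schema regexes and Python regexes.

```python
import re

# Ranges of NameStartChar and NameChar from the XML Recommendation.
NAME_START = (
    ":A-Z_a-z"
    "\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF"
    "\u0370-\u037D\u037F-\u1FFF\u200C-\u200D"
    "\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF"
    "\uF900-\uFDCF\uFDF0-\uFFFD"
    "\U00010000-\U000EFFFF"
)
NAME_CHAR = NAME_START + r"\-.0-9" + "\u00B7\u0300-\u036F\u203F-\u2040"

# What each XML Schema escape expands to in Python syntax.
ESCAPES = {
    "i": f"[{NAME_START}]",
    "I": f"[^{NAME_START}]",
    "c": f"[{NAME_CHAR}]",
    "C": f"[^{NAME_CHAR}]",
}


def expand_name_escapes(xsd_pattern: str) -> str:
    """Replace \\i, \\I, \\c and \\C (outside brackets) with character classes.

    Every backslash escape is consumed as a pair, so an escaped backslash
    followed by the letter i is left untouched.
    """
    return re.sub(
        r"\\(.)",
        lambda m: ESCAPES.get(m.group(1), m.group(0)),
        xsd_pattern,
        flags=re.DOTALL,
    )
```

With this helper, re.fullmatch(expand_name_escapes(r"\i\c*"), "xs:element") finds a match, which mirrors the Name production: one initial name character followed by any number of name characters.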

5. Functions in XSLT, XPath and XQuery

XSLT, XPath and XQuery use the datatypes of XML Schema. We can create data of types like xs:Name, xs:NMTOKEN, xs:NMTOKENS, xs:QName and xs:NCName, and we can test if some data is of one of these types. In addition we also have a few functions for the last two types.

The function name() returns the full name of an element or an attribute, that is "Prefix" + ":" + "LocalPart" if the name has a prefix, and otherwise just the localpart. The function local-name() always returns only the localpart. We also have functions like QName(), prefix-from-QName() and local-name-from-QName().
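
The difference between the full name and the localpart can also be seen outside XPath. The sketch below uses Python's standard xml.dom.minidom as an analogue, not XPath itself: tagName corresponds roughly to what name() returns and localName to what local-name() returns. The document and the namespace URIs are made up for the example.

```python
from xml.dom import minidom

# Hypothetical document: one prefixed element, one in the default namespace.
doc = minidom.parseString(
    '<inv:order xmlns:inv="urn:example:inventory" xmlns="urn:example:default">'
    "<item/>"
    "</inv:order>"
)

order = doc.documentElement   # <inv:order>, a PrefixedName
item = order.firstChild       # <item>, an UnprefixedName in the default namespace


def describe(element):
    """Return (full name, prefix, local part, namespace URI) of an element."""
    return (
        element.tagName,       # comparable to name()
        element.prefix,        # comparable to prefix-from-QName()
        element.localName,     # comparable to local-name()
        element.namespaceURI,  # comparable to namespace-uri()
    )


for element in (order, item):
    print(describe(element))
```

For the prefixed element the full name and the localpart differ ("inv:order" and "order"); for the unprefixed element both are "item", even though it is still in a namespace.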

6. Processing Instruction

In XML, comments and CDATA sections don't have names, but Processing Instructions have a name called the PITarget (!). A PI looks like this: <?someName someString?>. The PITarget must be a Name: the first character must follow the rules for the initial character of a Name and the remaining characters must be name characters. In addition, a user-defined PITarget must not be "xml" in any combination of upper and lower case, and names beginning with "xml" are reserved for the XML standards.
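
As a small illustration, Python's standard xml.dom.minidom exposes the two parts of the PI shown above as target and data. The helper is_reserved_pi_target() is a hypothetical name for this sketch; it checks only the exact "xml" exclusion from the PITarget production, not the broader reservation of names beginning with "xml".

```python
from xml.dom import minidom

# The PI from the text, followed by a minimal root element.
doc = minidom.parseString("<?someName someString?><root/>")

pi = doc.firstChild   # the processing instruction precedes the root element
target = pi.target    # the PITarget
data = pi.data        # everything after the target


def is_reserved_pi_target(name: str) -> bool:
    """True if the name is "xml" in any mix of upper and lower case."""
    return name.lower() == "xml"


print(target, data, is_reserved_pi_target(target))
```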

Updated: 2009-08-27