ISO Schematron, using XPath 2.0 for testing, is the natural choice for schema validation of numbers with checksum digits in XML documents. In this tutorial we look at MOD 10 calculations for UPC, EAN, ISBN and credit card numbers. First we make the XPath expressions in XSLT and at the end we transfer them to Schematron.

- 1. MOD 10 with XPath 2.0
- 2. Instance document (checksum.xml)
- 3. UPC
- 4. EAN-13
- 5. ISBN-13
- 6. ISBN-10
- 7. Credit cards
- 8. Additional tests
- 9. Schematron (checksum.sch)
- 10. Validate Schematron
- 11. Validation in XML Editor

In XPath 2.0 checksum testing is much easier than in XPath 1.0. We have many new functions, and even some of the old ones like `sum()` now also take sequences as input. We have conditional testing (if-then-else) inside XPath expressions, "for $x in $y…" constructs, etc. For this tutorial you must know at least some Schematron, XPath and XSLT already. [1]

In the following we only recapitulate the most necessary about numbers, checksums, MOD 10 calculations, UPC, EAN, ISBN and credit card numbers. Wikipedia is a good place to start, if you need more information. [2]

All examples in this tutorial use the following XML instance document. "Instance" normally means a document based on a schema. For our instance document we don't care which XML schema, if any, lies behind it.

We are simply going to use Schematron to set up additional validation of a kind that grammar-based schema languages like DTD, XML Schema and RELAX NG were never meant to do.

`<?xml version="1.0"?>`

<test>

<upc>639382000393</upc>

<ean-13>5701291191822</ean-13>

<isbn-13>ISBN 978-0-471-79119-5</isbn-13>

<isbn-10>ISBN 0764569104</isbn-10>

<visa>4561419020291774</visa>

</test>

In our XPath in XSLT examples we only look at the numbers reduced to their digits to keep things easy. In the real world grouping of digits with spaces or hyphens in between is sometimes allowed. In our Schematron schema the above patterns for ISBN are required.

We are now ready to make XPath expressions in XSLT that can validate the numbers based on the checksum, the last digit. My XPath expressions borrow heavily from suggestions put forward by members of the mulberrytech xsl-list.

At the end of this tutorial, when we have all the necessary XPath expressions ready, we transfer them from XSLT to our Schematron schema with minor modifications.

We start with the Universal Product Code (UPC) being the easiest to test. The most common form of UPC has 12 digits. The last digit is the checksum as in all examples in this tutorial.

- Remove the last digit from the number.
- The digits in odd positions are multiplied with 3 and added together.
- The digits in even positions are added together.
- Calculation = (odd sum + even sum) MOD 10.
- If calculation is 0 then 0 is the checksum.
- Else checksum = 10 - calculation.

- UPC number = 639382000393
- Checksum = 3
- Number without checksum = 63938200039
- Odd positions = 6*3 + 9*3 + 8*3 + 0*3 + 0*3 + 9*3 = 18 + 27 + 24 + 0 + 0 + 27 = 96
- Even positions = 3 + 3 + 2 + 0 + 3 = 11
- Calculation = (96 + 11) MOD 10 = 107 MOD 10 = 7
- Recalculated checksum = `if (calculation != 0) then (10 - calculation) else 0`. In our example: 10 - 7 = 3
- UPC is valid if recalculated checksum (3) = checksum (3)

First we make a "checksum" variable containing the checksum only. The two functions in `substring(., string-length(), 1)` extract the checksum, and `number()` converts the returned string value to a number.

`<xsl:variable name="checksum" select="number(substring(., string-length(), 1))"/>`

In the variable named "digits", the two functions in `substring(., 1, string-length() - 1)` extract the number without the check digit. The function `string-to-codepoints()` converts the string to a sequence of Unicode code points. The "- 48" converts the code points back to the original digits, because the code point of "0" is 48.

`<xsl:variable name="digits" select="for $i in string-to-codepoints(substring(., 1, string-length() - 1)) return $i - 48"/>`
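The code point trick is not specific to XPath. As a cross-check outside XSLT, here is a minimal Python sketch of the same idea. The function name `digits_without_check` is an assumption made for illustration, and Python's `ord()` plays the role of `string-to-codepoints()`.

```python
def digits_without_check(number: str) -> list[int]:
    """Drop the last character and map the rest to digit values.

    Mirrors the "digits" variable: ord() returns the Unicode code point,
    and the code point of "0" is 48, so subtracting 48 gives the digit.
    """
    body = number[:-1]                    # like substring(., 1, string-length() - 1)
    return [ord(ch) - 48 for ch in body]  # like string-to-codepoints(...) - 48


print(digits_without_check("639382000393"))
# prints [6, 3, 9, 3, 8, 2, 0, 0, 0, 3, 9]
```

Like the XPath version, the sketch assumes that the input contains only the characters 0 to 9. Any other character would give a value outside the range 0 to 9.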

The "calculation" variable first multiplies all the digits in odd positions, `[position() mod 2 = 1]`, by 3 and returns the results as a sequence. All the digits in even positions are also returned. The items of the new sequence are added together by the `sum()` function. At the end we do MOD 10.

`<xsl:variable name="calculation" select="sum((for $i in $digits[(position()) mod 2 = 1] return $i*3, for $i in $digits[(position()) mod 2 = 0] return $i)) mod 10"/>`

We first test if $calculation is 0. If it is, 0 is the recalculated checksum. If it is not 0, we subtract $calculation from 10 to get the recalculated checksum.

`<xsl:if test="(if ($calculation ne 0) then (10 - $calculation) else 0) ne $checksum">Not correct UPC. Could be a typo?</xsl:if>`
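To double-check the arithmetic of the UPC example independently of XSLT, the steps can be sketched in Python. This is only an illustration under the assumptions of this tutorial (12 digits, no spaces or hyphens). The function names are hypothetical.

```python
def upc_check_digit(body: str) -> int:
    """Recalculate the UPC check digit from the 11 digits before it."""
    digits = [int(ch) for ch in body]
    odd_sum = sum(d * 3 for d in digits[0::2])  # positions 1, 3, 5, ... times 3
    even_sum = sum(digits[1::2])                # positions 2, 4, 6, ... as they are
    calculation = (odd_sum + even_sum) % 10
    return 10 - calculation if calculation != 0 else 0


def is_valid_upc(number: str) -> bool:
    """True if number has 12 ASCII digits and the last one is the checksum."""
    if not (number.isascii() and number.isdigit() and len(number) == 12):
        return False
    return upc_check_digit(number[:-1]) == int(number[-1])


print(upc_check_digit("63938200039"))  # prints 3
print(is_valid_upc("639382000393"))    # prints True
```

The slices `digits[0::2]` and `digits[1::2]` correspond to the predicates `[position() mod 2 = 1]` and `[position() mod 2 = 0]`, because Python counts from 0 while XPath counts from 1.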

For EAN-13 the number is read from right to left. Since the last digit is the checksum, the second digit from the right is the first digit when we make the calculation.

- Remove the last digit from the number.
- Reverse the rest of the number.
- Multiply the digits in odd positions with 3 and add the results together.
- Add the digits in even position together.
- Calculation = (odd sum + even sum) MOD 10.
- If calculation is 0 then 0 is the checksum.
- Else checksum = 10 - calculation.

- EAN-13 number = 5701291191822
- Checksum = 2
- Number without checksum reversed = 281911921075
- Odd positions = 2*3 + 1*3 + 1*3 + 9*3 + 1*3 + 7*3 = 6 + 3 + 3 + 27 + 3 + 21 = 63
- Even positions = 8 + 9 + 1 + 2 + 0 + 5 = 25
- Calculation = (63 + 25) MOD 10 = 88 MOD 10 = 8
- Recalculated checksum = `if (calculation != 0) then (10 - calculation) else 0`. In our example: 10 - 8 = 2
- EAN-13 is valid if recalculated checksum (2) = checksum (2)

The variable named "checksum" contains the check digit. We need two functions, as in `substring(., string-length(), 1)`, to extract the check digit, and we need `number()` to convert it to a number.

`<xsl:variable name="checksum" select="number(substring(., string-length(), 1))"/>`

We make a variable named "digits" for the number without the checksum. We need two functions, as in `substring(., 1, string-length() - 1)`, to extract the number. We use the `string-to-codepoints()` function to turn the string of digits into a sequence of Unicode code points, because the `reverse()` function only works on a sequence. We then use "- 48" to get our original digits back.

`<xsl:variable name="digits" select="for $i in reverse(string-to-codepoints(substring(., 1, string-length() -1))) return $i - 48"/>`

The "calculation" variable first multiplies all the digits in odd positions, `[position() mod 2 = 1]`, by 3 and returns the results as a sequence. All the digits in even positions are also returned. The items of the new sequence are added together by the `sum()` function. At the end we do MOD 10.

`<xsl:variable name="calculation" select="sum((for $i in $digits[(position()) mod 2 = 1] return $i * 3, for $i in $digits[(position()) mod 2 = 0] return $i)) mod 10"/>`

In the test we first test if $calculation is 0. If it is, 0 is the recalculated checksum. If it is not 0, we subtract $calculation from 10 to get the recalculated checksum.

`<xsl:if test="(if ($calculation ne 0) then (10 - $calculation) else 0) ne $checksum">Not correct EAN-13. Could be a typo?</xsl:if>`
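The EAN-13 steps can be cross-checked with a small Python sketch. It is an illustration only, with hypothetical function names, and it assumes a 13-digit string without spaces or hyphens.

```python
def ean13_check_digit(body: str) -> int:
    """Recalculate the EAN-13 check digit from the 12 digits before it."""
    reversed_digits = [int(ch) for ch in reversed(body)]
    odd_sum = sum(d * 3 for d in reversed_digits[0::2])  # odd positions from the right, times 3
    even_sum = sum(reversed_digits[1::2])                # even positions from the right
    calculation = (odd_sum + even_sum) % 10
    return 10 - calculation if calculation != 0 else 0


def is_valid_ean13(number: str) -> bool:
    """True if number has 13 ASCII digits and the last one is the checksum."""
    if not (number.isascii() and number.isdigit() and len(number) == 13):
        return False
    return ean13_check_digit(number[:-1]) == int(number[-1])


print(ean13_check_digit("570129119182"))  # prints 2
print(is_valid_ean13("5701291191822"))    # prints True
```

Because the 12 digits before the checksum have an even length, weighting the odd positions counted from the right is the same as weighting the even positions counted from the left.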

ISBN-13 is what we should use for books since 2007. The digits can be grouped with hyphens. The length of each group depends on length of country code, publisher code and book item code. Big publishers have a short publisher number and a long book item number.

In this tutorial we only consider ISBN numbers reduced to their digits. In the Schematron schema I have also added removal of "ISBN" and "-". It all depends on what exact format of ISBN-13 we have allowed in the first place.

When calculating the check digit it is important to note that for ISBN-13 the number is not reversed and it is the digits in even positions that are multiplied with 3.

- Remove the checksum.
- Add the digits in odd position together.
- Multiply the digits in even positions with 3 and add them together.
- Calculation = (odd sum + even sum) MOD 10.
- If calculation is 0 then 0 is the checksum.
- Else checksum = 10 - calculation.

- ISBN-13 number = 9780465067107
- Checksum = 7
- Number without checksum = 978046506710
- Odd positions = (9 + 8 + 4 + 5 + 6 + 1) = 33
- Even positions = (7*3 + 0*3 + 6*3 + 0*3 + 7*3 + 0*3) = 60
- Odd + even = 33 + 60 = 93
- Calculation = 93 mod 10 = 3
- Recalculated checksum = `if (calculation != 0) then (10 - calculation) else 0`. In our example: 10 - (93 mod 10) = 7
- ISBN is valid if recalculated checksum (7) = checksum (7)

The variable named "checksum" extracts the checksum to have it ready for the final test. Next we make a variable named "digits". We extract everything except the checksum with `substring(., 1, string-length() - 1)` and convert the string of digits to a sequence of Unicode code points with `string-to-codepoints()`. The code points are converted back to the original digits with "- 48".

`<xsl:variable name="checksum" select="number(substring(., string-length(), 1))"/>`

`<xsl:variable name="digits" select="for $i in string-to-codepoints(substring(., 1, string-length() - 1)) return $i - 48"/>`

The "calculation" variable first returns all digits in odd positions with `for $i in $digits[position() mod 2 = 1] return $i`. Next we multiply the digits in even positions by 3. The `sum()` function adds it all together, and at the end we use modulus 10 to get the remainder.

`<xsl:variable name="calculation" select="sum((for $i in $digits[position() mod 2 = 1] return $i, for $i in $digits[position() mod 2 = 0] return $i * 3)) mod 10"/>`

If $calculation is not 0, we subtract it from 10 to get the recalculated checksum, else 0 is the recalculated checksum:

`<xsl:if test="(if ($calculation ne 0) then (10 - $calculation) else 0) ne $checksum">Not correct ISBN-13. Could be a typo?</xsl:if>`
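As a cross-check of the ISBN-13 steps, here is a minimal Python sketch. It also strips "ISBN " and hyphens first, in the same way the Schematron schema does with `replace(., 'ISBN |-', '')`. The function name is hypothetical, and the sketch does not check the grouping of the digits.

```python
import re


def is_valid_isbn13(text: str) -> bool:
    """Check the ISBN-13 checksum after removing "ISBN " and hyphens."""
    number = re.sub(r"ISBN |-", "", text)
    if not (number.isascii() and number.isdigit() and len(number) == 13):
        return False
    digits = [int(ch) for ch in number[:-1]]
    odd_sum = sum(digits[0::2])                  # positions 1, 3, 5, ... as they are
    even_sum = sum(d * 3 for d in digits[1::2])  # positions 2, 4, 6, ... times 3
    calculation = (odd_sum + even_sum) % 10
    recalculated = 10 - calculation if calculation != 0 else 0
    return recalculated == int(number[-1])


print(is_valid_isbn13("9780465067107"))           # prints True
print(is_valid_isbn13("ISBN 978-0-471-79119-5"))  # prints True
```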

ISBN-10 is the old way of making ISBN. It will be with us for many years, e.g. in used books. Since MOD 11 is used instead of MOD 10, the checksum can be 10. To represent 10 with one character, the Roman numeral for ten, "x" (normally printed as a capital "X"), is used.

The checksum calculation is very different from the rest of the examples in this tutorial, and not just because of MOD 11. After removing the check character, each digit of the number must be multiplied by its reversed position, starting with 10.

- Remove the last character being the check character.
- The remaining digits are multiplied with the reversed position starting with 10.
- Calculation = (sum of multiplied digits) MOD 11.
- If calculation is 0 then 0 is the checksum.
- Else checksum = 11 - calculation.
- If the checksum is 10 then the Roman "x" is used instead.

- ISBN-10 number: 0764569104
- If checksum = 'x', then convert checksum to 10. In our example: Checksum = 4
- Number without checksum = 076456910
- Multiply each digit with its reversed position starting with 10 and add the results together = (0*10) + (7*9) + (6*8) + (4*7) + (5*6) + (6*5) + (9*4) + (1*3) + (0*2) = 0 + 63 + 48 + 28 + 30 + 30 + 36 + 3 + 0 = 238
- Calculation = 238 MOD 11 = 7
- If calculation = 0 then recalculated checksum = 0, else recalculated checksum = 11 - calculation. In our example: recalculated checksum = 11 - 7 = 4.
- ISBN is valid if recalculated checksum (4) = checksum (4).

We need two variables to have the checksum ready for the final test. In the first variable named "checksum" we extract the checksum value with `substring(., string-length(), 1)`. In the next variable named "checksum10" we convert the "x", if found, to "10". We must also use the `number()` function to make the value into a number.

`<xsl:variable name="checksum" select="substring(., string-length(), 1)"/>`

`<xsl:variable name="checksum10" select="number(if (lower-case($checksum) eq 'x') then '10' else $checksum)"/>`

In the variable named "digits", we extract the number without the check character using `substring(., 1, string-length() - 1)` and convert the string to a sequence of Unicode code points with `string-to-codepoints()`. Then we convert the code points back to our original digits with "- 48".

`<xsl:variable name="digits" select="for $i in string-to-codepoints(substring(., 1, string-length() - 1)) return $i - 48"/>`

In the "calculation" variable we multiply each item in the $digits sequence by its weight: "10" for the first item, "[1]", "9" for the second item, "[2]", etc. Finally we sum all the results together. Remember to use a double set of parentheses when the sequence is built inside the `sum()` function: the outer set belongs to the function call, and the inner set makes the comma-separated items one sequence argument.

`<xsl:variable name="calculation" select="sum((for $i in $digits[1] return $i *10, for $i in $digits[2] return $i * 9, for $i in $digits[3] return $i * 8, for $i in $digits[4] return $i * 7, for $i in $digits[5] return $i * 6, for $i in $digits[6] return $i * 5, for $i in $digits[7] return $i * 4, for $i in $digits[8] return $i * 3, for $i in $digits[9] return $i * 2)) mod 11"/>`

And finally the test.

`<xsl:if test="(if ($calculation ne 0) then (11 - $calculation) else 0) ne $checksum10">Not correct ISBN-10. Could be a typo?</xsl:if>`
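The MOD 11 calculation can be cross-checked with a short Python sketch. The weights 10 down to 2 are generated with `range(10, 1, -1)` instead of being written out one by one. The function names are hypothetical, and accepting both "x" and "X" as the check character is a choice made for this sketch.

```python
def isbn10_check_value(body: str) -> int:
    """Recalculate the ISBN-10 check value (0 to 10) from the first 9 digits."""
    weights = range(10, 1, -1)  # 10, 9, 8, ..., 2
    total = sum(int(ch) * weight for ch, weight in zip(body, weights))
    calculation = total % 11
    return 11 - calculation if calculation != 0 else 0


def is_valid_isbn10(number: str) -> bool:
    """True if number is 9 ASCII digits followed by a correct check character."""
    body, last = number[:-1], number[-1:]
    if len(number) != 10 or not (body.isascii() and body.isdigit()):
        return False
    if last in ("x", "X"):
        check = 10  # the Roman numeral ten stands for the value 10
    elif last.isascii() and last.isdigit():
        check = int(last)
    else:
        return False
    return isbn10_check_value(body) == check


print(isbn10_check_value("076456910"))  # prints 4
print(is_valid_isbn10("0764569104"))    # prints True
```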

Most credit card numbers have 16 digits but American Express has only 15 digits and we also have other exceptions. The number usually has a prefix of digits to identify the card. Examples: VISA starts with "4", American Express with "37", Discover with "6011" and MasterCard with "5". [3]

We use VISA as example. For VISA cards the prefix is kept in the number when validated. As far as I know this is also true for MasterCard, but I don't have reliable information for the rest of the credit cards.

For validation of credit cards numbers the **Luhn algorithm** is normally used. We could have compared a recalculated checksum with the checksum as in most of the other examples but in Luhn's algorithm the checksum is kept in the number. There are other differences from what we are used to.

- Reverse the number (don't remove the check digit at the end).
- Add the digits in odd position together.
- Multiply the digits in even positions with 2.
- If the results of the multiplication have 2 digits add the two digits together.
- Then add all the results together
- Calculation = (odd sum + even sum).
- If calculation ends with 0 the VISA number is valid.

- VISA number (I made it up) = 4561419020291774
- Reversed number = 4771920209141654
- Odd positions = 4 + 7 + 9 + 0 + 0 + 1 + 1 + 5 = 27
- Even positions = (7*2 + 1*2 + 2*2 + 2*2 + 9*2 + 4*2 + 6*2 + 4*2) = 14 + 2 + 4 + 4 + 18 + 8 + 12 + 8 = (1+4) + 2 + 4 + 4 + (1+8) + 8 + (1+2) + 8 = 5 + 2 + 4 + 4 + 9 + 8 + 3 + 8 = 43
- Calculation = 27 + 43 = 70
- VISA number is valid if calculation ends with 0

It is a little strange that the above algorithm is considered the classical example of a MOD 10 calculation, when MOD 10 is not used! But it was used in the first place to calculate the checksum. Let us try it.

Remove the check digit and reverse the number. Everything is as above except that now the digits in the even positions are kept as they are and the digits in the odd positions are multiplied by 2, etc. The checksum is then "if ((calculation MOD 10) != 0) then (10 - (calculation MOD 10)) else 0".
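The check digit calculation just described can be sketched in Python as follows. This is an illustration under the assumptions above, not a statement about how any card issuer computes its numbers. The function name is hypothetical.

```python
def luhn_check_digit(body: str) -> int:
    """Calculate the Luhn check digit for a number given without its check digit."""
    total = 0
    for position, ch in enumerate(reversed(body), start=1):
        digit = int(ch)
        if position % 2 == 1:
            digit *= 2                         # odd positions from the right are doubled
            digit = digit % 10 + digit // 10   # 14 becomes 4 + 1, 8 stays 8
        total += digit
    calculation = total % 10
    return 10 - calculation if calculation != 0 else 0


print(luhn_check_digit("456141902029177"))  # prints 4
```

The result 4 is the last digit of the made-up VISA number 4561419020291774 used in this tutorial.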

In the variable named "even", all the digits in the number are made into a sequence of Unicode code points with `string-to-codepoints()`. The sequence is then reversed and filtered into a new sequence containing the numbers in even positions only, `[position() mod 2 = 0]`. The numbers are converted back to their original digits with "- 48", multiplied by 2, and returned.

`<xsl:variable name="even" select="for $i in reverse(string-to-codepoints(.))[position() mod 2 = 0] return ($i - 48) * 2"/>`

The variable named "odd" works like the "even" variable except that we now use the numbers in odd positions, `[position() mod 2 = 1]`, and we don't multiply by 2.

`<xsl:variable name="odd" select="for $i in reverse(string-to-codepoints(.))[position() mod 2 = 1] return ($i - 48)"/>`

The variable named "calculation" is a little difficult to understand. For all items in $even we return both the "number mod 10" and the "number idiv 10". The "idiv" operator (integer division) is new in XPath 2.0. It is the counterpart of "mod": it returns not the remainder of the division but the integer quotient.

This is what we need for $even, where some of the numbers can be greater than 9. In that case we get both the remainder and the integer quotient returned as new items in the new sequence.

In the $odd variable all the numbers are at most 9, making both "mod" and "idiv" irrelevant. We just want the digit returned as it is. The `sum()` function adds all the digits returned together.

`<xsl:variable name="calculation" select="sum((for $i in $even return ($i mod 10, $i idiv 10), for $i in $odd return ($i)))"/>`

If $calculation ends with a "0" the VISA number is valid. We can test it in several ways. I prefer "$calculation mod 10 ne 0", which is true when the number is not valid and the error message should be shown.

`<xsl:if test="$calculation mod 10 ne 0">Not correct VISA number. Could be a typo?</xsl:if>`
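The three variables translate almost one to one into Python, which gives an independent way to check a number. This is a minimal sketch with hypothetical function names. Python's `%` and `//` stand in for "mod" and "idiv".

```python
def luhn_calculation(number: str) -> int:
    """Return the Luhn sum of a number that still contains its check digit."""
    reversed_digits = [ord(ch) - 48 for ch in reversed(number)]
    even = [digit * 2 for digit in reversed_digits[1::2]]  # even positions, doubled
    odd = reversed_digits[0::2]                            # odd positions, as they are
    # value % 10 + value // 10 adds the two digits of a doubled value such as 14
    return sum(value % 10 + value // 10 for value in even) + sum(odd)


def is_valid_luhn(number: str) -> bool:
    """True if number consists of ASCII digits and its Luhn sum ends with 0."""
    if not (number.isascii() and number.isdigit()):
        return False
    return luhn_calculation(number) % 10 == 0


print(is_valid_luhn("4561419020291774"))  # prints True
print(is_valid_luhn("4561419020291775"))  # prints False
```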

In our Schematron schema, as we are going to see in a minute, we merge the three variables containing the Luhn algorithm into one expression. Let us even include the test here as an exercise.

`<xsl:value-of select="if(sum(for $j in (for $i in reverse(string-to-codepoints(.))[position() mod 2 = 0] return ($i - 48) * 2, for $i in reverse(string-to-codepoints(.))[position() mod 2 = 1] return ($i - 48)) return ($j mod 10, $j idiv 10)) mod 10 ne 0) then 'Not valid' else 'Valid!'"/>`

Now we also use "mod 10" and "idiv 10" for the "odd" sequence. We still just want the digits in the "odd" sequence returned as they are. But "mod 10" returns a one digit number as it is and "idiv 10" returns a 0. The (digit + 0) is still the digit.

The future of young XPath is bright. I find it astonishing that a powerful XPath expression, containing both the Luhn algorithm and the test, is at the same time a value of an attribute in an XML document.

Checksum testing is only a guard against typos and scanning errors due to damaged bar codes or less than optimal scanning conditions. For typed input we could make additional tests to make the error messages more useful like: "is it a number", "length of number", "prefix digits", "required pattern", etc.

Additional tests are mostly easy coding, and they are not covered in this tutorial. In the Schematron schema I have added some additional tests to indicate that we would probably need more than just checksum validation if we test for real.
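To indicate what such additional tests could look like outside Schematron, here is a minimal Python sketch for the VISA case. The function name is hypothetical, and the messages are borrowed from the VISA pattern of the Schematron schema in this tutorial. All failed tests are collected, so the user gets every relevant message at once.

```python
def visa_format_errors(value: str) -> list[str]:
    """Return a message for each simple format test the value fails."""
    errors = []
    if not (value.isascii() and value.isdigit()):
        errors.append("VISA number can only have digits.")
    if len(value) != 16:
        errors.append("VISA number must have 16 digits.")
    if not value.startswith("4"):
        errors.append('VISA must start with "4".')
    return errors


print(visa_format_errors("4561419020291774"))  # prints []
print(visa_format_errors("5561 4190"))
```

The second call returns all three messages, because the value contains a space, is too short and starts with "5". A checksum test would normally run only when this list is empty.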

In our Schematron schema, checksum.sch, we have kept things simple. Basically we have just transferred our XPath expressions from XSLT to Schematron. We use `sch:assert` elements only, to give us testing similar to other schema languages. Because an assertion fails when its test is false, the final comparison changes from "ne" in `xsl:if` to "eq" in `sch:assert`. Nothing is reported if things are correct. We only want error messages or warnings if what we test is not as the schema says it should be.

`<?xml version="1.0"?>`

<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2">

<sch:pattern>

<sch:title>UPC</sch:title>

<sch:rule context="test/upc">

<sch:assert test="number()">UPC number can only have digits.</sch:assert>

<sch:assert test="string-length() eq 12">UPC number must have 12 digits.</sch:assert>

<sch:let name="checksum" value="number(substring(., string-length(.), 1))"/>

<sch:let name="digits" value="for $i in string-to-codepoints(substring(., 1, string-length(.) - 1)) return $i - 48"/>

<sch:let name="calculation" value="sum((for $i in $digits[(position()) mod 2 = 1] return $i*3, for $i in $digits[(position()) mod 2 = 0] return $i)) mod 10"/>

<sch:assert test="(if ($calculation ne 0) then (10 - $calculation) else 0) eq $checksum">UPC is not correct. Could be a typo?</sch:assert>

</sch:rule>

</sch:pattern>

<sch:pattern>

<sch:title>EAN-13</sch:title>

<sch:rule context="test/ean-13">

<sch:assert test="number()">EAN number can only have digits.</sch:assert>

<sch:assert test="string-length() eq 13">EAN-13 number must have 13 digits.</sch:assert>

<sch:let name="digits" value="for $i in reverse(string-to-codepoints(substring(., 1, string-length() -1))) return $i - 48"/>

<sch:let name="calculation" value="sum((for $i in $digits[(position()) mod 2 = 1] return $i *3, for $i in $digits[(position()) mod 2 = 0] return $i)) mod 10"/>

<sch:let name="checksum" value="number(substring(., string-length(), 1))"/>

<sch:assert test="(if ($calculation ne 0) then (10 - $calculation) else 0) eq $checksum">Not correct EAN-13. Could be a typo?</sch:assert>

</sch:rule>

</sch:pattern>

<sch:pattern>

<sch:title>ISBN-13</sch:title>

<sch:rule context="test/isbn-13">

<sch:assert test="matches(., '^ISBN (\d{3})-(\d+)-(\d+)-(\d+)-(\d)$')">Use this pattern: ISBN x-x-x-x-x</sch:assert>

<sch:assert test="string-length() eq 22">ISBN pattern must have 22 characters.</sch:assert>

<sch:let name="x" value="replace(., 'ISBN |-', '')"/>

<sch:let name="checksum" value="number(substring($x, string-length($x), 1))"/>

<sch:let name="digits" value="for $i in string-to-codepoints(substring($x, 1, string-length($x) -1)) return $i - 48"/>

<sch:let name="calculation" value="sum((for $i in $digits[position() mod 2 = 1] return $i, for $i in $digits[position() mod 2 = 0] return $i * 3)) mod 10"/>

<sch:assert test="(if ($calculation ne 0) then (10 - $calculation) else 0) eq $checksum">Not correct ISBN-13. Could be a typo?</sch:assert>

</sch:rule>

</sch:pattern>

<sch:pattern>

<sch:title>ISBN-10</sch:title>

<sch:rule context="test/isbn-10">

<sch:let name="num" value="replace(., 'ISBN ', '')"/>

<sch:assert test="string-length($num) = 10">Use 10 digits (or X for the last one).</sch:assert>

<sch:assert test="matches(., '^ISBN ')">Use this pattern: ISBN xxxxxxxxxx</sch:assert>

<sch:let name="digits" value="for $i in string-to-codepoints(substring($num, 1, string-length($num) - 1)) return $i - 48"/>

<sch:let name="calculation" value="sum((for $i in $digits[1] return $i *10, for $i in $digits[2] return $i * 9, for $i in $digits[3] return $i * 8, for $i in $digits[4] return $i * 7, for $i in $digits[5] return $i * 6, for $i in $digits[6] return $i * 5, for $i in $digits[7] return $i * 4, for $i in $digits[8] return $i * 3, for $i in $digits[9] return $i * 2)) mod 11"/>

<sch:let name="checksum" value="substring($num, string-length($num), 1)"/>

<sch:let name="checksum10" value="if (lower-case($checksum) eq 'x') then '10' else $checksum"/>

<sch:assert test="(if ($calculation ne 0) then (11 - $calculation) else 0) eq number($checksum10)">Not correct ISBN-10. Could be a typo?</sch:assert>

</sch:rule>

</sch:pattern>

<sch:pattern>

<sch:title>VISA</sch:title>

<sch:rule context="test/visa">

<sch:assert test="number()">VISA number can only have digits.</sch:assert>

<sch:assert test="string-length() eq 16">VISA number must have 16 digits.</sch:assert>

<sch:assert test="substring(., 1, 1) eq '4'">VISA must start with "4".</sch:assert>

<sch:let name="calculation" value="sum(for $j in (for $i in reverse(string-to-codepoints(.))[position() mod 2 = 0] return ($i - 48) * 2, for $i in reverse(string-to-codepoints(.))[position() mod 2 = 1] return ($i - 48)) return ($j mod 10, $j idiv 10))"/>

<sch:assert test="$calculation mod 10 eq 0">Not correct VISA. Could be a typo?</sch:assert>

</sch:rule>

</sch:pattern>

</sch:schema>

How to validate our instance XML document, checksum.xml, using the Schematron schema, checksum.sch, is not in scope in this tutorial. Neither will we show how to use Schematron and other schema languages side by side, or how to nest Schematron inside other schemas.

Let us only recapitulate how a typical Schematron validation takes place. I have transformed `checksum.sch` with `iso_svrl.xsl`, importing `iso_schematron_skeleton.xsl`, into a new XSLT stylesheet with some name. I have then used the new stylesheet to transform `checksum.xml` into an SVRL report (Schematron Validation Report Language).

The `iso_schematron_skeleton.xsl` is the "processor" of Schematron. The `iso_svrl.xsl` is just to make the "processor" return a more useful new XSLT stylesheet that can transform the instance XML document (the document we want to validate) into a report. The two Schematron stylesheets can be downloaded from the Schematron website.

The SVRL report is just an XML document with SVRL markup and too technical for most humans to consume. I have made one more XSLT stylesheet to transform the SVRL report into XHTML. Yes, Schematron validation sounds and is crazy but also fun and easy as soon as you have some code in place to automate the transformations.

I have also tested Oxygen's wonderful Schematron support. All we need is to add a processing instruction (PI) to the XML instance document like this:

`<?xml version="1.0"?>`

<?oxygen SCHSchema="checksum.sch"?>

<test>

<upc>639382000393</upc>

<!-- et cetera -->

</test>

In Oxygen the above PI gives us Schematron validation just by clicking the same button (Ctrl+Shift+V) we use for DTD and XML Schema validation. I hope we will get similar Schematron support in XMLSpy, Stylus Studio and other XML editors. I propose to use a common PI like `<?schema schematron="URI"?>`.

- [1]
If you are new to Schematron: the Schematron website.

- [2]
Wikipedia is a good place to start:

- [3]
We have a lot of dubious information on the web about credit cards, length of numbers, prefixes, if prefixes are to be included in the calculation or not, etc. If we really need to validate credit card numbers, we should contact each credit card organization we want to accept in our system for reliable and updated information.

**Updated 2009-08-06**