Jesper Tverskov, December 11, 2008

Testing XQuery Update Facility 1.0

With XQuery Update Facility we can update XML files using insert, replace, rename and delete with the same ease as when we update tables with SQL. In XSLT on the other hand, we can only update to a new file and must use the modified identity template.

At the time of this writing (2008), XQuery Update Facility 1.0 has reached the level of "Candidate Recommendation", one of the last stages before it becomes a final W3C Recommendation (only "Proposed Recommendation" lies in between). Some XQuery processors have already implemented XQuery Update Facility. In the following we will test it using Saxon SA 9.1.0.3 at the command line.

In XQuery, the update facility replaces the original file with the updated one. You will not get any warning that such a drastic action will take place, but Saxon saves a backup of the original file with the command line option -backup:on. The -update:on option is also necessary.

1. Input file

For the tests, we will use the following XML file:

products.xml

<?xml version="1.0"?>
<products>
  <product id="p1">
    <name>Delta</name>
    <price>800</price>
    <stock>4</stock>
    <country>Denmark</country>
  </product>
  <product id="p2">
    <name>Golf</name>
    <price>1000</price>
    <stock>5</stock>
    <country>Germany</country>
  </product>
  <product id="p3">
    <name>Alfa</name>
    <price>1200</price>
    <stock>19</stock>
    <country>Germany</country>
  </product>
  <product id="p4">
    <name>Foxtrot</name>
    <price>1500</price>
    <stock>5</stock>
    <country>Australia</country>
  </product>
  <product id="p5">
    <name>Tango</name>
    <price>1225</price>
    <stock>3</stock>
    <country>Japan</country>
  </product>
</products>

2. Update - delete

delete node doc("products.xml")/products/product[@id = 'p1']

The above is all the XQuery needed to update products.xml, deleting the product element having an "id" attribute with the "p1" value.
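Several nodes can be deleted in one go. The following is a minimal sketch, assuming the untouched products.xml from section 1; the stock threshold of 5 is only an illustration. With the sample data the predicate should select the products "p1" (stock 4) and "p5" (stock 3).

```xquery
(: Sketch: delete every product whose stock is below 5.
   "delete nodes" is the plural form of "delete node". :)
delete nodes
  doc("products.xml")/products/product[xs:integer(stock) < 5]
```

The keywords "node" and "nodes" are interchangeable in the grammar; the plural simply reads better when the target expression can return more than one node.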

3. Update - insert

insert node

<product id="p7">
  <name>papa</name>
  <price>2100</price>
  <stock>4</stock>
  <country>China</country>
</product>

before doc("products.xml")/products/product[1]

(: before/after, or "as first into", "as last into", or just "into" :)

The above XQuery is all that is needed to insert a new product element with id="p7" in front of the first existing product element. Note, as the comment says, instead of "before" we can use "after", "as first into", "as last into" or simply "into".
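As an illustration of one of the other positions, here is a sketch that appends a product as the last child of the products element. The id "p8" and the element values are invented for the example and are not part of the test file.

```xquery
(: Sketch: "as last into" targets the parent element,
   not a sibling as "before" and "after" do. :)
insert node
  <product id="p8">
    <name>Sierra</name>
    <price>950</price>
    <stock>7</stock>
    <country>Norway</country>
  </product>
as last into doc("products.xml")/products
```

Note the difference in target: "before" and "after" point at a sibling of the new node, while the "into" variants point at its future parent.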

4. Update - replace

replace value of node doc("products.xml")/products/product[2]/name with "Romeo"
,

replace node doc("products.xml")/products/product[3]/name with <NAME>test</NAME>

(: Note that we can replace value or node :)
(: Also note that we need a "," between the two updates. :)

Above we have two updates separated by a comma, in one XQuery, showing how to replace a value and how to replace an element node.
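The new value does not have to be a literal. The sketch below, again assuming the original products.xml, computes the new price from the old one for every German product; the increase of 100 is arbitrary. With the sample data, the prices of "Golf" and "Alfa" should become 1100 and 1300.

```xquery
(: Sketch: an updating expression in the return clause of a FLWOR.
   xs:integer() is used because the file has no schema, so the
   price content is untyped. :)
for $p in doc("products.xml")/products/product[country = "Germany"]
return
  replace value of node $p/price
  with xs:integer($p/price) + 100
```

All updates in a query are collected first and applied together at the end, so every target expression is evaluated against the document as it was before the query started.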

5. Update - rename

rename node doc("products.xml")/products/product[1] as "PRODUCT"

(: how to rename many nodes :)
(:
for $x in doc("products.xml")//(*|@*)
return
rename node $x
as upper-case(name($x))
:)

The first line renames the first product element to upper-case. Have a look at the commented-out code. That is how easy it would be to rename all element names and all attribute names to upper-case.

6. Update - transform

for $e in doc("products.xml")

return
  copy $je := $e
  modify delete node $je/products/product[3]

return $je

Note that we now only work on a copy of "products.xml", and we delete the third product element from that copy. The original document is left as it was. This is very similar to how the modified identity template in XSLT works.

When modifying a copy with "transform" we must use the -tree:linked option at the command line. I don't know why.
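The modify clause can hold more than one update if they are put in parentheses and separated by commas. The following sketch, assuming the original products.xml, also drops the for loop and copies the document node directly; the new price "850" is invented for the example.

```xquery
(: Sketch: two updates applied to a copy, the source file is untouched. :)
copy $c := doc("products.xml")
modify (
  delete node $c/products/product[3],
  replace value of node $c/products/product[1]/price with "850"
)
return $c
```

Both targets are evaluated before any change is applied, so product[3] and product[1] refer to positions in the unmodified copy.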

7. Beware of XDM

XDM is short for XQuery 1.0 and XPath 2.0 Data Model, which is the data model of XPath 2.0, XSLT 2.0, and XQuery, and any other specifications that reference it. This data model is odd from a common sense point of view: it knows nothing about the XML declaration and the DOCTYPE declaration. This is probably not a problem most of the time when XQuery Update Facility is implemented in databases, but if you use XQuery on standalone files, you will run into irritating problems.

When you update an XML document, the original XML declaration is lost or replaced by the standard XML declaration. This is no problem if the original also used encoding="utf-8", but irritating if it used another encoding. Also, if you update an XML document that has a DOCTYPE declaration, like an XHTML document, the DOCTYPE declaration is simply deleted without your asking, just by using XQuery Update Facility!

The above problems are exactly the same when using the identity template in XSLT. Also, defaulted attributes are copied out of the XSD schema or DTD. E.g.: if you use XQuery Update Facility on an XHTML document, all the anchor elements ("a") suddenly have a shape="rect" attribute. [1] CDATA sections are replaced by escaping, and there are a number of other minor but at times very annoying problems. [2]

8. XQuery and XSLT

In XSLT we have the identity template with additional templates for the modifications, used to update a copy of the file. That is, in XSLT we cannot update a file directly.

XQuery Update Facility is so powerful and easy to use that some XSLT users would like to do something similar from inside XSLT as a supplement to the modified identity template. But that is only a dream. Saxon has an extension function called saxon:query() making it possible to use XQuery from inside XSLT, but not XQuery Update Facility. Among other things, there is no way to set the equivalent of the -update:on option at the command line.

Footnotes

[1]

See my tutorial The shape="rect" attribute in xhtml to xhtml. Saxon has a command line option -expand:(on|off), but as the documentation says, it does not work with all validators. The -x:classname option can be used to load a validator of choice, but it is not easy to get it to work, as can be seen from the Saxon Help Discussion Forum.

[2]

My article, Identity transformation for XSLT 2.0, lists ten issues making an identity transformation in XSLT not complete. Exactly the same issues exist when using XQuery Update Facility.

  1. The XML declaration is not recreated.
  2. The DOCTYPE declaration is not recreated.
  3. Default attributes in DTD are added to the output.
  4. Whitespace in prolog is ignored outside comments and PIs.
  5. Leading whitespace is normalized for content in PIs in prolog.
  6. CDATA sections are replaced with their content escaped.
  7. Character entities are replaced.
  8. Whitespace is normalized in attribute values.
  9. The order of attributes is not always the same.
  10. Non-significant whitespace is removed.

Updated 2008-12-11