4.2.1 Document Management Functions

The pf:add-doc() function adds the XML document available at a given URI to the database, under a logical name given as the second parameter. An optional third parameter names a collection, which makes it possible to add a document to an existing document collection. All documents in a collection store their data together in the same MonetDB tables. If you have many (thousands or more) presumably small XML documents, it is advisable to store them together in one or a few collections. Storing each small document in a collection of its own (named after the document, which is the default when pf:add-doc() is called with only two parameters) causes a lot of table-header and MonetDB metadata overhead: every such document leads to the creation of a couple of relational tables, so a large number of documents may produce millions of them.
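As a sketch of this advice, the following query loads many small documents into one shared collection instead of giving each its own. The source paths (/data/orders/order-N.xml), the logical names and the collection name "orders" are hypothetical; the sketch also assumes that the files exist and that one query may contain several pf:add-doc() calls.

```xquery
(: Hypothetical paths and names: add 1000 small documents to the single
   collection "orders", so that they share one set of MonetDB tables. :)
for $i in 1 to 1000
return pf:add-doc(concat("/data/orders/order-", $i, ".xml"),  (: source URI :)
                  concat("order-", $i, ".xml"),               (: logical name :)
                  "orders")                                   (: shared collection :)
```

Calling pf:add-doc() with only the first two arguments inside the loop would instead create 1000 single-document collections, each with its own set of tables.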

By default, collections are created read-only, meaning that updates to them are prohibited and cause runtime errors. To allow updates, documents have to be shredded explicitly as updatable by passing a fourth parameter to pf:add-doc(). This parameter must be a value between 1 and 99, which indicates the percentage of unallocated space to leave per page to accommodate future updates. All documents inside the same collection are either all updatable or all read-only. Note that once a collection has been created by the first pf:add-doc(), its status can no longer be changed. A workaround based on the backup/restore mechanism exists.
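For example, a collection meant to receive updates can be created by passing the fourth parameter on its first pf:add-doc() call. The URI, the names and the value 10 (leave 10% of each page free) are illustrative assumptions, not recommended settings.

```xquery
(: Hypothetical names: create the updatable collection "inventory",
   leaving 10% of each page unallocated for future updates.
   Omitting the fourth argument would make the collection read-only. :)
pf:add-doc("/data/inventory.xml", "inventory.xml", "inventory", 10)
```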

pf:add-doc ($uri as xs:string, $name as xs:string)
pf:add-doc ($uri as xs:string, $name as xs:string, $coll as xs:string)
pf:add-doc ($uri as xs:string, $name as xs:string, $coll as xs:string, $perc as xs:integer)
pf:del-doc ($name as xs:string)
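pf:del-doc() takes only the logical name under which a document was added, not its URI or collection. A minimal sketch, assuming a document was earlier registered under the hypothetical name "catalog.xml":

```xquery
(: Remove the document registered under the logical name "catalog.xml";
   the name is hypothetical. :)
pf:del-doc("catalog.xml")
```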

A query that calls any of these functions does not return a result, much like a query using the XQuery Update Facility (XQUF). However, this family of MonetDB/XQuery extension functions is not considered the same as XQUF update queries. In fact, it is explicitly forbidden to mix XQUF updates and document management commands in the same transaction.

Besides atomicity for document management (i.e. a document management query either fully succeeds or fully fails), MonetDB/XQuery also provides durability and a limited form of isolation. Isolation, however, is not perfect.
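As an illustration of this atomicity, assuming a single query may contain several document management calls, the query below either adds both documents or neither. The paths, names and the collection "batch" are hypothetical.

```xquery
(: Both documents are added to the hypothetical collection "batch" when
   the query commits; if either call fails, neither is added. :)
(pf:add-doc("/data/a.xml", "a.xml", "batch"),
 pf:add-doc("/data/b.xml", "b.xml", "batch"))
```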

A read-only or update query that started before a document management query committed may nevertheless see its effects. That is, when this concurrent query reaches a call to fn:doc(), the call is evaluated with respect to the actual state of the database at that time. This is a deviation from snapshot isolation, which demands that fn:doc() be evaluated with respect to the database state at the *start* of the query.

On the other hand, once a query has gained access to a document, it caches the document in its database snapshot, so that subsequent calls to fn:doc() continue to find it, regardless of whether it has been deleted since.
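The following read-only query sketches both effects. Assume it runs concurrently with a separate query that executes pf:del-doc("catalog.xml"); the document name and the item element are hypothetical.

```xquery
(: Read query running concurrently with pf:del-doc("catalog.xml").
   If the delete commits before the first fn:doc() call is reached,
   that call does not find the document, even though this query
   started earlier (the deviation from snapshot isolation).
   Once the first call succeeds, the document is cached in this
   query's snapshot, so the second call still finds it even if the
   delete commits in between. :)
let $first := fn:doc("catalog.xml")
return (count($first//item), count(fn:doc("catalog.xml")//item))
```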