2.9.2 Scalability

MonetDB/XQuery scales well compared with other XQuery engines. It is especially efficient at handling large (multi-gigabyte) documents, because it employs efficient join algorithms and advanced self-tuning indexing, both for structural queries (XPath traversals) and for value-based queries (text and attribute values).

Still, there remain situations where scalability issues may appear. Here are a number of tips:

2.9.2.1 Making Sure Value Indices Are Used

MonetDB/XQuery automatically creates indices on all attribute and text node values, and uses them to accelerate expressions like:

(: accelerated by value index :)
<path1>[<path2>/text() = expr] 

for $x in <path1>
where $x/<path2>/text() = expr 
return $x

<path1>[<path2>/@attr = expr] 

for $x in <path1>
where $x/<path2>/@attr = expr 
return $x

This works regardless of the type of expr. The expression expr may even be loop-dependent, in which case we get a nested loop index join.
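The idea of a nested loop index join can be sketched outside MonetDB/XQuery. The following Python fragment is only an illustration under simplifying assumptions: the names build_value_index and nested_loop_index_join are hypothetical, the index is a plain in-memory dictionary, and none of this reflects how MonetDB/XQuery implements its indices internally.

```python
from collections import defaultdict

def build_value_index(nodes):
    """Map each text value to the ids of the nodes that hold it."""
    index = defaultdict(list)
    for node_id, value in nodes:
        index[value].append(node_id)
    return index

def nested_loop_index_join(outer_values, index):
    """For every outer (loop-dependent) value, probe the index
    instead of scanning all nodes."""
    return [(value, node_id)
            for value in outer_values
            for node_id in index.get(value, [])]

# Hypothetical data: (node id, text value) pairs.
nodes = [(1, "red"), (2, "blue"), (3, "red")]
index = build_value_index(nodes)
matches = nested_loop_index_join(["red", "green"], index)
print(matches)
```

Each outer value costs one index probe rather than a full scan, which is why the loop-dependent case can still benefit from the value index.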

MonetDB/XQuery uses just-in-time query optimization based on sampling to determine whether the expression is selective enough to justify the use of an index.
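A sampling-based selectivity check can be pictured roughly as follows. This Python sketch is an assumption-laden illustration, not the optimizer's actual algorithm: the function names, the sample size, and the 10% threshold are all invented for the example.

```python
import random

def estimate_selectivity(values, predicate, sample_size=100, seed=0):
    """Estimate the fraction of values that satisfy the predicate
    by testing a random sample rather than every value."""
    if not values:
        return 0.0
    rng = random.Random(seed)
    sample = rng.sample(values, min(sample_size, len(values)))
    return sum(1 for v in sample if predicate(v)) / len(sample)

def should_use_index(values, predicate, threshold=0.1):
    """Use the index only if few values are expected to match.
    The threshold is a made-up cutoff for illustration."""
    return estimate_selectivity(values, predicate) <= threshold

common = ["a"] * 1000        # every value matches: not selective
rare = list(range(50))       # one value in fifty matches: selective
print(should_use_index(common, lambda v: v == "a"))
print(should_use_index(rare, lambda v: v == 7))
```

When nearly every value matches, an index lookup saves little over a scan, so a check of this kind would skip the index.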

Warning: equality comparisons on element nodes, however, cannot be accelerated with these value indices:

(: not accelerated by value index :)
<path1>[<path2>/foo = expr] 

for $x in <path1>
where $x/<path2>/foo = expr 
return $x

The reason is that, absent DTD or Schema knowledge (which MonetDB/XQuery currently does not exploit), a comparison with the data value of an element requires all of its descendant text node values to be concatenated:

<foo>4<bar>2</bar></foo> = 42

evaluates to true! It is clear that this is hard to support with an index that stores the separate text values 4 and 2.

For this reason, it is advisable to use foo/text() = expr comparisons rather than foo = expr.
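The difference between an element's data value and its individual text nodes can be checked outside MonetDB/XQuery. The sketch below uses Python's standard xml.etree.ElementTree module purely as an illustration; it does not involve MonetDB/XQuery or its indices.

```python
import xml.etree.ElementTree as ET

foo = ET.fromstring("<foo>4<bar>2</bar></foo>")

# The separate text node values, in document order. These are
# the kind of values a per-text-node index would store.
text_nodes = list(foo.itertext())

# The data value of the element: all descendant text node
# values concatenated.
string_value = "".join(text_nodes)

print(text_nodes)
print(string_value)
```

Note that foo/text() = expr compares the individual text node children of foo, so it gives the same answer as foo = expr only when foo contains a single text node and no nested elements. For mixed content such as the example above, the two forms can differ.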

2.9.2.2 Use Large Main Memories

MonetDB is a fast, main-memory oriented database that uses column-wise storage. The query engine, however, is known to consume quite a bit of RAM, especially on queries that generate large intermediate results. Therefore, adding RAM to your computer may strongly improve MonetDB performance. As a general principle, performance is best when the amount of RAM is at least roughly equal to the total size of the XML documents that your queries access.
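This rule of thumb is easy to check before loading documents. The following Python sketch assumes you know the machine's RAM size in bytes. The helper names total_size and ram_is_sufficient are hypothetical, and the comparison is only the rough guideline stated above, not a guarantee.

```python
import os

def total_size(paths):
    """Sum the on-disk sizes (in bytes) of the given XML files."""
    return sum(os.path.getsize(p) for p in paths)

def ram_is_sufficient(doc_bytes, ram_bytes):
    """Rule of thumb: RAM at least roughly equal to the total
    size of the documents the queries access."""
    return ram_bytes >= doc_bytes

GIB = 2**30
print(ram_is_sufficient(3 * GIB, 4 * GIB))   # 3 GiB of XML, 4 GiB of RAM
print(ram_is_sufficient(10 * GIB, 4 * GIB))  # 10 GiB of XML, 4 GiB of RAM
```

Queries with large intermediate results may need more memory than this estimate suggests.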

2.9.2.3 Use a 64-bit OS and MonetDB/XQuery

On 32-bit systems, the amount of memory a process can address is limited to 4GB, and most operating systems restrict it further, to 3GB (Linux) or 2GB (Windows). So if, after reading the previous tip, you decided to put 4GB of RAM into your 32-bit machine, MonetDB/XQuery will not be able to use all of it.

On 32-bit Windows, our binary distribution of MonetDB/XQuery can use the full 3GB because it is "large address aware" (Windows terminology). However, you must first configure Windows to allow large-address-aware applications to use the full 3GB; otherwise MonetDB/XQuery will be limited to 2GB.

The better way to handle large data sizes is to switch to a 64-bit operating system. MonetDB/XQuery is fully supported on 64-bit operating systems, and even comes with a binary distribution for 64-bit Windows. Even if you use the 32-bit MonetDB/XQuery binary on a 64-bit OS, it gets access to the full 4GB instead of just 2 or 3GB.

The default 64-bit MonetDB/XQuery binaries are built with 32-bit object identifiers (OIDs). This is a compile-time option (the 64-bit versions are configured with --enable-oid32). If your XML documents have more than 2 billion elements (typically, this means XML of more than 40GB) and are stored in a single XML collection, you will hit storage limits inside MonetDB. Also, with --enable-oid32, string columns in MonetDB are limited to 4GB; this matters because all unique text nodes in a collection are stored in a single column. To lift these restrictions, configure MonetDB with --disable-oid32 and recompile.
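The limits above can be turned into a quick check. This Python sketch rests on assumptions that the text does not state: it reads "2 billion" as 2^31 (the range of a signed 32-bit identifier) and "4GB" as 4 GiB. The function name needs_disable_oid32 is hypothetical.

```python
# Assumption: "2 billion elements" corresponds to 2^31.
OID32_MAX_ELEMENTS = 2**31
# Assumption: the "4GB" string column limit is 4 GiB.
OID32_MAX_STRING_BYTES = 4 * 2**30

def needs_disable_oid32(element_count, unique_text_bytes):
    """Return True if a single collection would exceed either
    limit of a build configured with --enable-oid32."""
    return (element_count > OID32_MAX_ELEMENTS
            or unique_text_bytes > OID32_MAX_STRING_BYTES)

# Average document bytes per element implied by pairing
# "2 billion elements" with "40GB" (taken here as 40 * 10^9 bytes).
bytes_per_element = 40 * 10**9 / OID32_MAX_ELEMENTS

print(needs_disable_oid32(500_000_000, 1 * 2**30))
print(needs_disable_oid32(3_000_000_000, 1 * 2**30))
print(bytes_per_element)
```

Under these assumptions the 40GB figure works out to somewhat under 20 bytes of XML per element, so documents with unusually short elements may reach the element limit at smaller sizes.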