2.9.3 Bulk Loading a Collection

To load many documents, the best approach is to use a scripting language (shell script, awk, Perl, Python) to generate an XML file that lists all file names (and, if you wish, document names), e.g. a file /tmp/dir.xml:

<dir>
<doc path="/foo/bar/" name="doc0000001.xml"/>
.....
<doc path="/foo/bar/" name="doc2300000.xml"/>
</dir>
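
A listing of this shape could be produced with a short script. The following is only a minimal Python sketch, assuming that all documents sit in a single directory and carry an .xml suffix; the function name write_dir_listing and its parameters are hypothetical and not part of the system:

```python
import os
from xml.sax.saxutils import quoteattr


def write_dir_listing(src_dir, out_path):
    """Write a <dir> listing of the .xml files in src_dir; return how many were listed.

    Hypothetical helper: assumes a flat directory of .xml files.
    """
    # The import query concatenates path and name, so the path
    # attribute must end with a directory separator.
    path = os.path.join(os.path.abspath(src_dir), "")
    names = sorted(n for n in os.listdir(src_dir) if n.endswith(".xml"))
    with open(out_path, "w", encoding="utf-8") as out:
        out.write("<dir>\n")
        for name in names:
            # quoteattr() escapes and quotes the value, so unusual
            # file names cannot break the well-formedness of the listing.
            out.write("<doc path=%s name=%s/>\n" % (quoteattr(path), quoteattr(name)))
        out.write("</dir>\n")
    return len(names)
```

Calling write_dir_listing("/foo/bar", "/tmp/dir.xml") would then write one doc element per document found in /foo/bar. Each doc element is written as an empty element, so the listing is well-formed XML.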

You can then efficiently import all these documents with an XQuery over the temporary file /tmp/dir.xml:

for $d in doc("/tmp/dir.xml")//doc
 return
  pf:add-doc(fn:concat($d/@path,$d/@name), fn:string($d/@name), "my-coll", 0)

With the above, all documents are loaded into a single collection, my-coll, which is read-only because the last parameter of pf:add-doc() (the percentage) is 0.

Note that when you have many documents, grouping them in one (or a few) XML document collections reduces storage and query processing overhead (see Separate Documents vs Document Collections).