Introduction on the VOC data set

Exploring the wealth of functionality offered by MonetDB/SQL is best started using a toy database. For this we use the VOC database which provides a peephole view into the administrative system of an early multi-national company, the Vereenigde geoctrooieerde Oostindische Compagnie (VOC for short - The (Dutch) East Indian Company) established on March 20, 1602.

Acquiring and loading the VOC data set

The VOC data set can be downloaded from the MonetDB Assets page as a compressed file with SQL statements. After the file has been extracted, load its contents into MonetDB using mclient. On Windows you may use the mclient.exe file, located in the C:\Program Files\CWI\MonetDB5\bin.

Note: on Windows, the server by default only accepts connections originating from the local host. If you need other machines to access your database, change the configuration file by setting mapi_open=yes.

Before you load the VOC data set, it is advised to introduce a user different from the omnipresent default monetdb. The new user is given his own schema for the MonetDB database.
Assuming you have started MonetDB with SQL module, proceed as follows:

shell> mclient -lsql
sql>CREATE USER "voc" WITH PASSWORD 'voc' NAME 'VOC Explorer' SCHEMA "sys";
sql>CREATE SCHEMA "voc" AUTHORIZATION "voc";
sql>ALTER USER "voc" SET SCHEMA "voc";
sql>\q

To illustrate the use of the newly created account and schema, the following example connects and creates a table, which is dropped afterwards.

shell> mclient -lsql -uvoc
password: voc
sql>START TRANSACTION;
sql>CREATE TABLE test (
more>     id int,
more>     data varchar(30)
more> );
sql>\d
+------+
| name |
+======+
| test |
+------+
sql>\d test
CREATE TABLE "voc"."test" (
        "id" int,
        "data" varchar(30)
);
sql>ROLLBACK;

Importing the voc_dump.sql file into the database can be done using the textual client interface. Some alternative ways are as follows (Please note that the first one does not work on Windows):

1:
shell> mclient -lsql -uvoc < voc_dump.sql
Password:voc
2:
shell> mclient -lsql -uvoc
Password:voc
sql> \< voc_dump.sql
3:
shell> mclient -lsql -uvoc -Pvoc
sql> \< voc_dump.sql
4:
shell> mclient -lsql -uvoc -Pvoc < voc_dump.sql


Exploring the VOC data set

As mentioned before, the VOC data set contains data for around 8000 voyages.

sql>\d
+------------+
| name       |
+============+
| craftsmen  |
| impotenten |
| invoices   |
| passengers |
| seafarers  |
| soldiers   |
| total      |
| voyages    |
+------------+
sql>select count(*) from voyages;
+--------+
| count_ |
+========+
|   8115 |
+--------+

The set consists of 8 tables, which are all bound to each other using FOREIGN KEY relationships. The voyages table is considered to be the main table, which all others reference to. Every table, except invoices has a PRIMARY KEY defined over the columns number and number_sup. Since the invoices table holds zero or more invoices per voyage (identified by number, number_sub) a PRIMARY KEY constraint is not possible.

The tables craftsmen, impotenten, passengers, seafarers, and soldiers all share the same columns. We can define a VIEW that combines them all into one big table, to make them easier to access.

sql>CREATE VIEW onboard_people AS
more>SELECT * FROM (
more>SELECT 'craftsmen' AS type, craftsmen.* FROM craftsmen
more>UNION ALL
more>SELECT 'impotenten' AS type, impotenten.* FROM impotenten
more>UNION ALL
more>SELECT 'passengers' AS type, passengers.* FROM passengers
more>UNION ALL
more>SELECT 'seafarers' AS type, seafarers.* FROM seafarers
more>UNION ALL
more>SELECT 'soldiers' AS type, soldiers.* FROM soldiers
more>UNION ALL
more> SELECT 'total' AS type, total.* FROM total
more> ) AS onboard_people_table;
sql>

The new view will show up and we can just use it as a normal table, to for instance calculate the number of records for each group of people:

sql>\d
+----------------+
| name           |
+================+
| craftsmen      |
| impotenten     |
| invoices       |
| onboard_people |
| passengers     |
| seafarers      |
| soldiers       |
| total          |
| voyages        |
+----------------+
sql> SELECT type, COUNT(*) AS total 
more> FROM onboard_people GROUP BY type ORDER BY type;
+------------+-------+
| type       | total |
+============+=======+
| craftsmen  |  2349 |
| impotenten |   938 |
| passengers |  2813 |
| seafarers  |  4468 |
| soldiers   |  4177 |
| total      |  2454 |
+------------+-------+
sql> select count(*) from impotenten;
+--------+
| count_ |
+========+
|    938 |
+--------+

It is possible to play with the set in many ways, to find out several things that took place during the voyages of the ships, or the money that was earned. A few examples are shown below.

sql>SELECT COUNT(*) FROM voyages
more>WHERE particulars LIKE '%_recked%';
+--------+
| count_ |
+========+
|    358 |
+--------+
sql> SELECT chamber, CAST(AVG(invoice) AS integer) AS average
more> FROM invoices
more> WHERE invoice IS NOT NULL
more> GROUP BY chamber
more> ORDER BY average DESC;
+---------+---------+
| chamber | average |
+=========+=========+
| A       |  282996 |
| Z       |  259300 |
| H       |  150182 |
| R       |  149628 |
| D       |  149522 |
| E       |  149518 |
| null    |   83309 |
+---------+---------+
sql>CREATE VIEW extended_onboard AS
more>SELECT number, number_sup, trip, trip_sup,
more>    onboard_at_departure, death_at_cape,
more>    left_at_cape, onboard_at_cape,
more>    death_during_voyage, onboard_at_arrival,
more>    death_during_voyage - left_at_cape AS death_at_arrival
more>FROM onboard_people;