5.7.4 Motivation and Examples
We have found that a surprisingly wide variety of XML data owners
work with region annotations:
- StandOff In Multimedia: XML that holds the output of tools such as video scene detection or speech recognition.
Used in various kinds of content-based multimedia search/browsing systems.
- StandOff In Forensic:
XML describing interesting features discovered on confiscated hard drives
(e.g. person names, addresses, emails, recovered file hierarchies, etc.).
The regions refer to the positions on disk where the features were found.
Used in computer-assisted crime scene investigations (CSI).
- StandOff In NLP:
XML describing the grammatical structure of natural texts.
Inline annotation cannot be used because natural language is ambiguous, and multiple parses
are often possible. Thus structure is separated from content, and refers to it by word position.
Used in automatic question answering systems.
- StandOff In Bio-Informatics:
XML storing DNA sequences annotated by genome research groups.
The regions refer to positions in the DNA strands.
The annotations may contain clinical characteristics of patients or hold additional
bio-molecular data on those genes.
Used in collaborative genome research efforts.
If you have similar XML data and use MonetDB/XQuery to manage it, please
contact us on the mailing list.
For XQueries with such region overlap/containment conditions, other XML database systems
resort to query plans that have to compare all pairs of regions ("quadratic complexity").
On XML data sizes above a few hundred KB, these systems quickly become unusably slow.
In contrast, MonetDB/XQuery with StandOff extensions runs bio-informatics queries on gigabytes
of XML annotations within a few seconds.
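To make the cost of comparing all pairs concrete, here is a minimal Python sketch of a region containment join. It is an illustration only and not the algorithm MonetDB/XQuery uses internally. Regions are assumed to be (start, end) pairs with start <= end, and both function names are hypothetical. The first function tests every pair of regions. The second sorts the candidate regions by start position and uses binary search, so each outer region is only compared against regions that start inside it.

```python
from bisect import bisect_left, bisect_right


def contains_naive(outer, inner):
    """Test every (outer, inner) pair: len(outer) * len(inner) comparisons."""
    return [(o, i) for o in outer for i in inner
            if o[0] <= i[0] and i[1] <= o[1]]


def contains_sorted(outer, inner):
    """Sort inner regions by start, then binary-search the candidates.

    A region can only be contained in `o` if it starts within `o`,
    so regions starting outside `o` are never inspected.
    """
    inner_sorted = sorted(inner)
    starts = [start for start, _ in inner_sorted]
    result = []
    for o in outer:
        lo = bisect_left(starts, o[0])    # first region starting at or after o.start
        hi = bisect_right(starts, o[1])   # one past the last region starting at or before o.end
        for i in inner_sorted[lo:hi]:
            if i[1] <= o[1]:              # must also end inside o
                result.append((o, i))
    return result
```

Both functions return the same pairs. With heavily overlapping regions the second one can still inspect many candidates, but on typical annotation data it does far less work than the all-pairs comparison.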