|
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
8.24 The Inner CoreThe innermost library of the MonetDB database system is formed by the library called GDK, an abbreviation of Goblin Database Kernel. Its development was originally rooted in the design of a pure active-object-oriented programming language, before development was shifted towards a re-usable database kernel engine. GDK is a C library that provides ACID properties on a DSM model , using main-memory database algorithms built on virtual-memory OS primitives and multi-threaded parallelism. Its implementation has undergone various changes over its decade of development, many of which were driven by external needs to obtain a robust and fast database system. The coding scheme explored in GDK has also laid a foundation to communicate over time experiences and to provide (hopefully) helpful advice near to the place where the code-reader needs it. Of course, over such a long time the documentation diverges from reality. Especially in areas where the environment of this package is being described. Consider such deviations as historic landmarks, e.g. crystallization of brave ideas and mistakes rectified at a later stage. 8.25 Short OutlineThe facilities provided in this implementation are:
The Binary Association Table (BAT) is the lowest level of storage considered in the Goblin runtime system . A BAT is a self-descriptive main-memory structure that represents the binary relationship between two atomic types. The association can be defined over:
This model can be used as a back-end model underlying other -higher level- models, in order to achieve better performance and data independence in one go. The relational model and the object-oriented model can be mapped on BATs by vertically splitting every table (or class) for each attribute. Each such a column is then stored in a BAT with type bat[oid,attribute], where the unique object identifiers link tuples in the different BATs. Relationship attributes in the object-oriented model hence are mapped to bat[oid,oid] tables, being equivalent to the concept of join indexes . The set of built-in types can be extended with user-defined types through an ADT interface. They are linked with the kernel to obtain an enhanced library, or they are dynamically loaded upon request. Types can be derived from other types. They represent something different than that from which they are derived, but their internal storage management is equal. This feature facilitates the work of extension programmers, by enabling reuse of implementation code, but is also used to keep the GDK code portable from 32-bits to 64-bits machines: the oid and ptr types are derived from int on 32-bits machines, but is derived from lng on 64 bits machines. This requires changes in only two lines of code each. To accelerate lookup and search in BATs, GDK supports one built-in search accelerator: hash tables. We choose an implementation efficient for main-memory: bucket chained hash . Alternatively, when the table is sorted, it will resort to merge-scan operations or binary lookups. BATs are built on the concept of heaps, which are large pieces of main memory. They can also consist of virtual memory, in case the working set exceeds main-memory. In this case, GDK supports operations that cluster the heaps of a BAT, in order to improve performance of its main-memory. 8.25.1 RationaleThe rationale for choosing a BAT as the building block for both relational and object-oriented system is based on the following observations:
A RDBMS or OODBMS based on BATs strongly depends on our ability to efficiently support tuples and to handle small joins, respectively. The remainder of this document describes the Goblin Database kernel implementation at greater detail. It is organized as follows:
8.26 Interface FilesIn this section we summarize the user interface to the GDK library. It consist of a header file (gdk.h) and an object library (gdklib.a), which implements the required functionality. The header file must be included in any program that uses the library. The library must be linked with such a program. 8.26.1 Database ContextThe MonetDB environment settings are collected in a configuration file. Amongst others it contains the location of the database directory. First, the database directory is closed for other servers running at the same time. Second, performance enhancements may take effect, such as locking the code into memory (if the OS permits) and preloading the data dictionary. An error at this stage normally lead to an abort. 8.26.2 GDK session handling
The session is bracketed by GDKinit and GDKexit. Initialization involves setting up the administration for database access, such as memory allocation for the database buffer pool. During the exit phase any pending transaction is aborted and the database is freed for access by other users. A zero is returned upon encountering an erroneous situation. 8.27 Binary Association TablesHaving gone to the previous preliminary definitions, we will now introduce the structure of Binary Association Tables (BATs) in detail. They are the basic storage unit on which GDK is modelled. The BAT holds an unlimited number of binary associations, called BUNs (Binary UNits). The two attributes of a BUN are called head (left) and tail (right) in the remainder of this document. The above figure shows what a BAT looks like. It consists of two columns, called head and tail, such that we have always binary tuples (BUNs). The overlooking structure is the BAT record. It points to a heap structure called the BUN heap. This heap contains the atomic values inside the two columns. If they are fixed-sized atoms, these atoms reside directly in the BUN heap. If they are variable-sized atoms (such as string or polygon), however, the columns has an extra heap for storing those (such variable-sized atom heaps are then referred to as Head Heaps and Tail Heaps). The BUN heap then contains integer byte-offsets (fixed-sized, of course) into a head- or tail-heap. The BUN heap contains a contiguous range of BUNs. It starts after the first pointer, and finishes at the end in the free area of the BUN. All BUNs after the inserted pointer have been added in the last transaction (and will be deleted on a transaction abort). All BUNs between the deleted pointer and the first have been deleted in this transaction (and will be reinserted at a transaction abort). The location of a certain BUN in a BAT may change between successive library routine invocations. Therefore, one should avoid keeping references into the BAT storage area for long periods. Passing values between the library routines and the enclosing C program is primarily through value pointers of type ptr. Pointers into the BAT storage area should only be used for retrieval. Direct updates of data stored in a BAT is forbidden. The user should adhere to the interface conventions to guarantee the integrity rules and to maintain the (hidden) auxiliary search structures. 8.27.1 GDK variant record typeWhen manipulating values, MonetDB puts them into value records. The built-in types have a direct entry in the union. Others should be represented as a pointer of memory in pval or as a string, which is basically the same. In such cases the len field indicates the size of this piece of memory. 8.27.2 The BAT recordThe elements of the BAT structure are introduced in the remainder. Instead of using the underlying types hidden beneath it, one should use a BAT type that is supposed to look like this: typedef struct {
// static BAT properties
bat batCacheid; // bat id: index in BBPcache
int batPersistence; // persistence mode
bit batCopiedtodisk; // BAT is saved on disk?
bit batSet; // all tuples in the BAT are unique?
// dynamic BAT properties
int batHeat; // heat of BAT in the BBP
sht batDirty; // BAT modified after last commit?
bit batDirtydesc; // BAT descriptor specific dirty flag
Heap* batBuns; // Heap where the buns are stored
// DELTA status
BUN batDeleted; // first deleted BUN
BUN batFirst; // empty BUN before the first alive BUN
BUN batInserted; // first inserted BUN
BUN batCount; // Tuple count
// Head properties
int htype; // Head type number
str hident; // name for head column
bit hkey; // head values should be unique?
bit hsorted; // are head values currently ordered?
bit hvarsized; // for speed: head type is varsized?
bit hnonil; // head has no nils
oid halign; // alignment OID for head.
// Head storage
int hloc; // byte-offset in BUN for head elements
Heap *hheap; // heap for varsized head values
Hash *hhash; // linear chained hash table on head
// Tail properties
int ttype; // Tail type number
str tident; // name for tail column
bit tkey; // tail values should be unique?
bit tnonil; // tail has no nils
bit tsorted; // are tail values currently ordered?
bit tvarsized; // for speed: tail type is varsized?
oid talign; // alignment OID for head.
// Tail storage
int tloc; // byte-offset in BUN for tail elements
Heap theap; // heap for varsized tail values
Hash thash; // linear chained hash table on tail
} BAT;
The internal structure of the BAT record is in fact much more complex, but GDK programmers should refrain of making use of that. The reason for this complex structure is to allow for a BAT to exist in two incarnations at the time: the normal view and the reversed view. Each bat b has a BATmirror(b) which has the negative cacheid of b in the BBP. Since we don't want to pay cost to keep both views in line with each other under BAT updates, we work with shared pieces of memory between the two views. An update to one will thus automatically update the other. In the same line, we allow synchronized BATs (BATs with identical head columns, and marked as such in the BAT Alignment interface) now to be clustered horizontally. 8.27.3 Heap ManagementHeaps are the low-level entities of mass storage in BATs. Currently, they can either be stored on disk, loaded into memory, or memory mapped.
These routines should be used to alloc free or extend heaps; they isolate you from the different ways heaps can be accessed. 8.27.4 Internal HEAP Chunk ManagementHeaps are used in BATs to store data for variable-size atoms. The implementor must manage malloc()/free() functionality for atoms in this heap. A standard implementation is provided here.
The heap space starts with a private space that is left untouched by the normal chunk allocation. You can use this private space e.g. to store the root of an rtree HEAP_malloc allocates a chunk of memory on the heap, and returns an index to it. HEAP_free frees a previously allocated chunk HEAP_private returns an integer index to private space. 8.27.5 BAT construction
The routine BATclone creates an empty BAT storage area with the properties inherited from its argument. 8.27.6 BUN manipulation
The association list does not contain holes. This density permits users to quickly access successive elements without the need to test the items for validity. Moreover, it simplifies transport to disk and other systems. The negative effect is that the user should be aware of the evolving nature of the sequence, which may require copying the BAT first. The update operations come in three flavors. Element-wise updates can use BUNins, BUNappend, BUNreplace, BUNdel, and BUNdelHead. The batch update operations are BATins, BATappend and BATdel. Only experts interested in speed may use BUNfastins, since it skips most consistency checks, does not update search accelerators, and does not maintain properties such as the hsorted and tsorted flags. Beware! The routine BUNfnd provides fast access to a single BUN providing a value for the head of the binary association. A very fast shortcut for BUNfnd if the selection type is known to be integer or OID, is provided in the form of the macro BUNfndOID. To select on a tail, one should use the reverse view obtained by BATmirror. The routines BUNhead and BUNtail return a pointer to the first and second value in an association, respectively. To guard against side effects on the BAT, one should normally copy this value into a scratch variable for further processing. Behind the interface we use several macros to access the BUN fixed part and the variable part. The BUN operators always require a BAT pointer and BUN identifier.
8.27.7 BAT properties
The function BATbuncount returns the space that is occupied in associations in the BAT. This is not the same as BATcount, since the first N associations may be unused or delta data. The BAT is given a new logical name using BATrename. The integrity properties to be maintained for the BAT are controlled separately. A key property indicates that duplicates in the association dimension are not permitted. The BAT is turned into a set of associations using BATset. Key and set properties are orthogonal integrity constraints. The strongest reduction is obtained by making the BAT a set with key restrictions on both dimensions. The persistency indicator tells the retention period of BATs. The system support three modes: PERSISTENT, TRANSIENT, and SESSION. The PERSISTENT BATs are automatically saved upon session boundary or transaction commit. TRANSIENT BATs are removed upon transaction boundary. SESSION BATs are removed at the end of a session. They are normally used to maintain temporary results. All BATs are initially TRANSIENT unless their mode is changed using the routine BATmode. The BAT properties may be changed at any time using BATkey, BATset, and BATmode. Valid BAT access properties can be set with BATsetaccess and BATgetaccess: BAT_READ, BAT_APPEND, and BAT_WRITE. BATs can be designated to be read-only. In this case some memory optimizations may be made (slice and fragment bats can point to stable subsets of a parent bat). A special mode is append-only. It is then allowed to insert BUNs at the end of the BAT, but not to modify anything that already was in there. 8.27.8 BAT manipulation
The routine BATmirror returns the mirror image BAT (where tail is head and head is tail) of that same BAT. This does not involve a state change in the BAT (as previously): both views on the BAT exist at the same time. 8.27.9 BAT Input/Output
A BAT created by BATnew is considered temporary until one calls the routine BATsave or BATmode. This routine reserves disk space and checks for name clashes in the BAT directory. It also makes the BAT persistent. The empty BAT is initially marked as ordered on both columns. Failure to read or write the BAT results in a NULL, otherwise it returns the BAT pointer. 8.27.10 Heap Storage ModesThe discriminative storage modes are memory-mapped, compressed, or loaded in memory. The BATmmap() changes the storage mode of each heap associated to a BAT. As can be seen in the bat record, each BAT has one BUN-heap (bn), and possibly two heaps (hh and th) for variable-sized atoms. The BATmadvise call works in the same way. Using the madvise() system call it issues buffer management advice to the OS kernel, as for the expected usage pattern of the memory in a heap. 8.27.11 Printing
The functions to convert BATs into ASCII and the reverse use internally defined formats. They are primarily meant for ease of debugging and to a lesser extent for output processing. Printing a BAT is done essentially by looping through its components, printing each association. If an index is available, it will be used. The BATmultiprintf command assumes a set of BATs with corresponding oid-s in the head columns. It performs the multijoin over them, and prints the multi-column result on the file. 8.27.12 BAT clustering
The above functions rearrange data in MonetDB heaps (used for storing BUNs var-sized atoms, or accelerators). Applying these clusterings will allow that MonetDB's main-memory oriented algorithms work efficiently also in a disk-oriented context. The BATsort functions return a copy of the input BAT, sorted in ascending order on the head column. BATordered starts a check on the head values to see if they are ordered. The result is returned and stored in the hsorted field of the BAT. BATorder is similar to BATsort, but sorts the BAT itself, rather than returning a copy (BEWARE: this operation destroys the delta information. TODO:fix). The BATrevert puts all the live BUNs of a BAT in reverse order. It just reverses the sequence, so this does not necessarily mean that they are sorted in reverse order! 8.28 BAT Buffer Pool
The remaining BBP tables contain status information to load, swap and migrate the BATs. The core table is BBPcache which contains a pointer to the BAT descriptor with its heaps. A zero entry means that the file resides on disk. Otherwise it has been read or mapped into memory. BATs loaded into memory are retained in a BAT buffer pool. They retain their position within the cache during their life cycle, which make indexing BATs a stable operation. Their descriptor can be obtained using BBPcacheid. The BBPindex routine checks if a BAT with a certain name is registered in the buffer pools. If so, it returns its BAT id. The BATdescriptor routine has a BAT id parameter, and returns a pointer to the corresponding BAT record (after incrementing the reference count). The BAT will be loaded into memory, if necessary. 8.29 GDK ExtensibilityGDK can be extended with new atoms, search accelerators and storage modes. 8.29.1 Atomic Type DescriptorsThe atomic types over which the binary associations are maintained are described by an atom descriptor.
8.29.2 Atom DefinitionUser defined atomic types can be added to a running system with the following interface:.
8.29.3 Atom Manipulation
These wrapper functions correspond closely to the interface functions one has to provide for a user-defined atom. They basically (with exception of ATOMput(), ATOMprint() and ATOMformat()) just have the atom id parameter prepended to them. 8.29.4 Unique OIDs
OIDs are special kinds of unsigned integers because the system guarantees uniqueness. For system simplicity and performance, OIDs are now represented as (signed) integers; however this is hidden in the system internals and shouldn't affect semantics. The OIDnew(N) claims a range of N contiguous unique, unused OIDs, and returns the starting value of this range. The highest OIDBITS designate site. [ DEPRECATED] 8.29.5 Built-in Accelerator Functions
The current BAT implementation supports one search accelerator: hashing. The routine BAThash makes sure that a hash accelerator on the head of the BAT exists. A zero is returned upon failure to create the supportive structures. The hash data structures are currently maintained during update operations. A BAT can be redistributed over n buckets using a hash function with BAThashsplit. The return value is a list of BAT pointers. Similarly, a range partitioning based is supported. 8.29.6 Multilevel Storage ModesWe should bring in the compressed mode as the first, maybe built-in, mode. We could than add for instance HTTP remote storage, SQL storage, and READONLY (cd-rom) storage. 8.30 GDK UtilitiesInterfaces for memory management, error handling, thread management and system information. 8.30.1 GDK memory management
Compiled with -DMEMLEAKS the GDK memory management log their activities, and are checked on inconsistent frees and memory leaks. 8.30.2 GDK error handling
The error handling mechanism is not sophisticated yet. Experience should show if this mechanism is sufficient. Most routines return a pointer with zero to indicate an error. The error messages are also copied to standard output unless GDKsilent is set to a non-zero value. The last error message is kept around in a global variable. Error messages can also be collected in a user-provided buffer, instead of being echoed to a stream. This is a thread-specific issue; you want to decide on the error mechanism on a thread-specific basis. This effect is established with GDKsetbuf. The memory (de)allocation of this buffer, that must at least be 1024 chars long, is entirely by the user. A pointer to this buffer is kept in the pseudo-variable GDKerrbuf. Normally, this is a NULL pointer. The GDKembedded variable is a property set in the configuration file to indicate that the kernel is only allowed to run as a single process. This can be used to remove all locking overhead. The actual state of affairs is maintained in GDKprotected, which is set when locking is required, e.g. when multiple threads become active. The kernel maintains a central table of all active threads. They are indexed by their tid. The structure contains information on the input/output file descriptors, which should be set before a database operation is started. It ensures that output is delivered to the proper client. The Thread structure should be ideally made directly accessible to each thread. This speeds up access to tid and file descriptors. 8.31 Transaction Management
MonetDB by default offers a global transaction environment. The global transaction involves all activities on all persistent BATs by all threads. Each global transaction ends with either TMabort or TMcommit, and immediately starts a new transaction. TMcommit implements atomic commit to disk on the collection of all persistent BATs. For all persistent BATs, the global commit also flushes the delta status for these BATs (see BATcommit/BATabort). This allows to perform TMabort quickly in memory (without re-reading all disk images from disk). The collection of which BATs is persistent is also part of the global transaction state. All BATs that where persistent at the last commit, but were made transient since then, are made persistent again by TMabort. In other words, BATs that are deleted, are only physically deleted at TMcommit time. Until that time, rollback (TMabort) is possible. Use of TMabort is currently NOT RECOMMENDED due to two bugs:
In effect, the problems with TMabort reduce the functionality of the global transaction mechanism to consistent checkpointing at each TMcommit. For many applications, consistent checkpointingis enough. Extension modules exist that provide fine grained locking (lock module) and Write Ahead Logging (sqlserver). Applications that need more fine-grained transactions, should build this on top of these extension primitives. TMsubcommit is intended to quickly add or remove BATs from the persistent set. In both cases, rollback is not necessary, such that the commit protocol can be accelerated. It comes down to writing a new BBP.dir. Its parameter is a BAT-of-BATs (in the tail); the persistence status of that BAT is committed. We assume here that the calling thread has exclusive access to these bats. An error is reported if you try to partially commit an already committed persistent BAT (it needs the rollback mechanism). 8.31.1 Delta Management
The BAT keeps track of updates with respect to a 'previous state'. Do not confuse 'previous state' with 'stable' or 'commited-on-disk', because these concepts are not always the same. In particular, they diverge when BATcommit, BATfakecommit, and BATundo are called explictly, bypassing the normal global TMcommit protocol (some applications need that flexibility). BATcommit make the current BAT state the new 'stable state'. This happens inside the global TMcommit on all persistent BATs previous to writing all bats to persistent storage using a BBPsync[?]. EXPERT USE ONLY: The routine BATfakeCommit updates the delta information on BATs and clears the dirty bit. This avoids any copying to disk. Expert usage only, as it bypasses the global commit protocol, and changes may be lost after quitting or crashing MonetDB. BATabort undo-s all changes since the previous state. The global TMabort[?] achieves a rollback to the previously committed state by doing BATabort on all persistent bats. BUG: after a failed TMcommit, TMabort does not do anything because TMcommit does the BATcommits before attempting to sync to disk instead of after doing this. The previous state can also be queried. BATprev is a view on the current BAT as it was in the previous state. BATalpha shows only the BUNs inserted since the previous state, and BATdelta the deleted buns. CAVEAT: BATprev, BATalpha and BATdelta only return views if the underlying BATs are read-only (often not the case when BATs are being updated). Otherwise, copies must be made anyway. 8.32 BAT Alignment and BAT views
The BATpropcheck examines a BAT and tries to set all applicable properties (key,sorted,align,dense). All algebraic BAT commands propagate the properties - including alignment properly on their results. VIEW BATs are BATs that lend their storage from a parent BAT. They are just a descriptor that points to the data in this parent BAT. A view is created with VIEWcreate. The cache id of the parent (if any) is returned by VIEWhparent and VIEWtparent (otherwise it returns 0). VIEW bats are read-only!! The VIEWcombine gives a view on a BAT that has two head columns of the parent. The VIEWhead constructs a BAT view that has the same head column as the parent, but has a void column with seqbase=nil in the tail. VIEWreset creates a normal BAT with the same contents as its view parameter (it converts void columns with seqbase!=nil to materialized oid columns). The BATmaterialize materializes a VIEW (TODO) or void bat inplace. This is useful as materialization is usually needed for updates. 8.33 BAT Iterators
The BATloop() looks like a function call, but is actually a macro. The following example gives an indication of how they are to be used: void
print_a_bat(BAT *b)
{
BATiter bi = bat_iterator(b);
BUN p, q;
BATloop(b, p, q)
printf("Element %3d has value %d\n",
*(int*) BUNhead(bi, p), *(int*) BUNtail(bi, p));
}
8.33.1 simple sequential scanThe first parameter is a BAT, the p and q are BUN pointers, where p is the iteration variable. */
#define BATloop(r, p, q)
for(q = BUNlast(r), p = BUNfirst(r);p < q; p++)
/*
8.33.2 batloop where the current element can be deleted/updatedNormally it is strictly forbidden to update the BAT over which is being iterated, or delete the current element. This can only be done with the specialized batloop below. When doing a delete, do not forget to update the current pointer with a p = BUNdelete(b,p) (the delete may modify the current pointer p). After the delete/update has taken place, the pointer p is in an inconsistent state till the next iteration of the batloop starts. */
#define BATloopDEL(r, p, q)
for(p = BUNfirst(r), q = BUNlast(r); p < q;
q = MIN(q,BUNlast(r)), p++)
/*
8.33.3 sequential scan over deleted BUNsStable BUNS that were deleted, are conserved to transaction end. You may inspect these data items. Again, the b is a BAT, p and q are BUNs, where p is the iteration variable. */
#define DELloop(b, p, q)
for (q = (b)->batFirst, p = (b)->batDeleted; p < q; p++)
/*
8.33.4 hash-table supported loop over BUNsThe first parameter `b' is a BAT, the second (`h') should point to `b->H->hash', and `v' a pointer to an atomic value (corresponding to the head column of `b'). The 'hb' is an integer index, pointing out the `hb'-th BUN. */
#define GDK_STREQ(l,r) (*(char*) (l) == *(char*) (r) && !strcmp(l,r))
#define HASHloop(bi, h, hb, v)
for (hb = h->hash[HASHprobe(h, v)]; hb != BUN_NONE; hb = h->link[hb])
if (ATOMcmp(h->type, v, BUNhead(bi, hb)) == 0)
#define HASHloop_str_hv(bi, h, hb, v)
for (hb = (h)->hash[((BUN *) (v))[-1]&(h)->mask]; hb != BUN_NONE; hb = (h)->link[hb])
if (GDK_STREQ(v, BUNhvar(bi, hb)))
#define HASHloop_str(bi, h, hb, v)
for (hb = (h)->hash[strHash(v)&(h)->mask]; hb != BUN_NONE; hb = (h)->link[hb])
if (GDK_STREQ(v, BUNhvar(bi, hb)))
/*
For string search, we can optimize if the string heap has eliminated all doubles. This is the case when not too many different strings are stored in the heap. You can check this with the macro strElimDoubles() If so, we can just compare integer index numbers instead of strings: */
#define HASHloop_fstr(bi, h, hb, idx, v)
for (hb = h->hash[strHash(v)&h->mask], idx = strLocate((bi.b)->H->vheap,v);
hb != BUN_NONE; hb = h->link[hb])
if (VarHeapValRaw((bi).b->H->heap.base, hb, (bi).b->H->width) == idx)
/*
The following example shows how the hashloop is used: void
print_books(BAT *author_books, str author)
{
BAT *b = author_books;
BUN i;
printf("%s\n==================\n", author);
HASHloop(b, (b)->H->hash, i, author)
printf("%s\n", ((str) BUNtail(b, i));
}
Note that for optimization purposes, we could have used a HASHloop_str instead, and also a BUNtvar instead of a BUNtail (since we know the tail-type of author_books is string, hence variable-sized). However, this would make the code less general. 8.33.5 specialized hashloopsHASHloops come in various flavors, from the general HASHloop, as above, to specialized versions (for speed) where the type is known (e.g. HASHloop_int), or the fact that the atom is fixed-sized (HASHlooploc) or variable-sized (HASHloopvar). */
#define HASHlooploc(bi, h, hb, v)
for (hb = h->hash[HASHprobe(h, v)]; hb != BUN_NONE; hb = h->link[hb])
if (ATOMcmp(h->type, v, BUNhloc(bi, hb)) == 0)
#define HASHloopvar(bi, h, hb, v)
for (hb = h->hash[HASHprobe(h, v)]; hb != BUN_NONE; hb = h->link[hb])
if (ATOMcmp(h->type, v, BUNhvar(bi, hb)) == 0)
/*
8.33.6 loop over a BAT with ordered tailHere we loop over a BAT with an ordered tail column (see for instance BATsort). Again, 'p' and 'q' are iteration variables, where 'p' points at the current BUN. 'tl' and 'th' are pointers to atom corresponding to the minimum (included) and maximum (included) bound in the selected range of BUNs. A nil-value means that there is no bound. The 's' finally is an integer denoting the bunsize, used for speed. */
#define SORTloop(b,p,q,tl,th)
if (!(BATtordered(b)&1)) GDKerror("SORTloop: BAT not sorted. n");
else for (p = (ATOMcmp((b)->ttype,tl,ATOMnilptr((b)->ttype))?
SORTfndfirst(b,tl):BUNfirst(b)),
q = (ATOMcmp((b)->ttype,th,ATOMnilptr((b)->ttype))?
SORTfndlast(b,th):BUNlast(b)); p < q; p++)
/* OIDDEPEND */
#if SIZEOF_OID == SIZEOF_INT
#define SORTfnd_oid(b,v) SORTfnd_int(b,v)
#define SORTfndfirst_oid(b,v) SORTfndfirst_int(b,v)
#define SORTfndlast_oid(b,v) SORTfndlast_int(b,v)
sortloop[?.10](oid,int,oid,simple,&oid_nil)
#else
#define SORTfnd_oid(b,v) SORTfnd_lng(b,v)
#define SORTfndfirst_oid(b,v) SORTfndfirst_lng(b,v)
#define SORTfndlast_oid(b,v) SORTfndlast_lng(b,v)
sortloop[?.10](oid,lng,oid,simple,&oid_nil)
#endif
#if SIZEOF_WRD == SIZEOF_INT
#define SORTfnd_wrd(b,v) SORTfnd_int(b,v)
#define SORTfndfirst_wrd(b,v) SORTfndfirst_int(b,v)
#define SORTfndlast_wrd(b,v) SORTfndlast_int(b,v)
sortloop[?.10](wrd,int,wrd,simple,&wrd_nil)
#else
#define SORTfnd_wrd(b,v) SORTfnd_lng(b,v)
#define SORTfndfirst_wrd(b,v) SORTfndfirst_lng(b,v)
#define SORTfndlast_wrd(b,v) SORTfndlast_lng(b,v)
sortloop[?.10](wrd,lng,wrd,simple,&wrd_nil)
#endif
#define SORTloop_bit(b,p,q,tl,th) SORTloop_chr(b,p,q,tl,th)
/*
8.34 Common BAT OperationsMuch used, but not necessarily kernel-operations on BATs. 8.34.1 BAT aggregates
The routine BAThistogram produces a new BAT with a frequency distribution of the tail of its operand. The routine BATsample returns a random sample on n BUNs of a BAT. For each BAT we maintain its dimensions as separately accessible properties. They can be used to improve query processing at higher levels. 8.34.2 Alignment transformationsSome classes of algebraic operators transform a sequence in an input BAT always in the same way in the output result. An example are the () function (including histogram(b), which is identical to (b.reverse)). That is to say, if synced(b2,b2) => synced((b1),(b2)) Another example is b.fetch(position-bat). If synced(b2,b2) and the same position-bat is fetched with, the results will again be synced. This can be mimicked by transforming the alignment-id of the input BAT with a one-way function onto the result. We use output->halign = NOID_AGGR(input->halign) for the output = (input) case, and output->align = NOID_MULT(input1->align,input2->halign) for the fetch. 8.34.3 BAT relational operators
The operation BATsort sorts the BAT on the header and produces a new BAT. A side effect is the clustering of the BAT store on the sort key. The BATjoin over R[A, B] and S[C, D] performs an equi-join over B and C. It results in a BAT over A and D. The BATouterjoin implements a left outerjoin over the BATs involved. The BATsemijoin over R[A, B] and S[C, D] produces the subset of R[A, B] that satisfies the semijoin over A and C. The full-materialization policy intermediate results in MonetDB means that a join can produce an arbitrarily large result and choke the system. The Data Distilleries tool therefore first computes the join result size before the actual join (better waste time than crash the server). To exploit that perfect result size knowledge, an result-size estimate parameter was added to all equi-join implementations. TODO: add this for semijoin/select/unique/diff/intersect The routine BATsunique considers both dimensions in the double elimination it performs; it produces a set. The routine BATtunique considers only the head column, and produces a unique head column. BATs that satisfy the set property can be further processed with the set operations BATsunion, BATsintersect, and BATsdiff. The same operations are also available in versions that only look at the head column:BATkunion, BATkdiff, and BATkintersect (which shares its implementation with BATsemijoin). */ |
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| © 1994-2011 CWI | Contact us Legal HG web Bugs TestWeb | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||