To add a new index/search back-end to the Broker, you must define twelve routines for indexing, querying, and administering the Broker/indexer interface. These routines collectively define a standard for object indexing, index consistency maintenance, and querying. Depending on the system you are trying to integrate, some of these functions will likely be null calls, with most of the functionality concentrated in a few of the others.
The code for this interface is in harvest/src/broker/index.c and harvest/src/broker/index.h. If you want to define a new indexing interface, create new index.[ch] files, and then add your routines to the Indexer_Routines variable in main.c. You will also need to update the Indexer_Init routine in main.c. You can start with the skeleton files in src/broker/Skeleton/. If you create the routines for a system we do not currently support and are willing to provide those routines to us (possibly with copyright restrictions), please email harvest-dvl@cs.colorado.edu.
We discuss each of the routines below. More details about the Broker design and implementation are available in William Camargo's thesis [8]. The following functions define the indexing interface between the Broker and the indexer:
This Broker/indexer interface is designed to support both object-at-a-time (incremental) and batch (non-incremental) indexers. An indexing session begins with a call to IND_Index_Start, where you can call initialization routines for your indexer. For each update, the Collector (a part of the Broker) calls IND_New_Object or IND_Destroy_Object. A batch indexer should just queue the request, whereas an object-at-a-time indexer can use the call to update the object index. When a stream of updates is finished, the Collector calls IND_Flush to process any queued updates. Note that if the Broker fails before an index is flushed, updates may be lost. To overcome any inconsistency in the database, the Broker forces a garbage collection that removes and reindexes all objects. For more details about the Collector interface, see Section 5.9.
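The difference between a batch and an object-at-a-time indexer can be sketched in C as follows. This is a minimal illustration, not the Harvest code: the sketch_* function names, the indexer structure, the integer object identifiers, and the fixed-size arrays are all assumptions made for the example, and the real IND_* routines may take different arguments. In batch mode each update is only queued and the index changes at flush time; in object-at-a-time mode each call updates the index immediately.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PENDING 64   /* arbitrary queue size for this sketch */
#define MAX_INDEXED 64   /* arbitrary index size for this sketch */

typedef enum { OP_ADD, OP_REMOVE } op_kind;

typedef struct {
    op_kind kind;
    int     uid;                      /* hypothetical object identifier */
} pending_op;

typedef struct {
    int        batch;                 /* nonzero: queue updates until flush */
    pending_op queue[MAX_PENDING];    /* in-memory only, lost on a crash */
    size_t     n_pending;
    int        indexed[MAX_INDEXED];  /* stand-in for the real index */
    size_t     n_indexed;
} indexer;

/* Apply one update to the stand-in index. Adding an existing uid or
 * removing a missing one is a no-op; adds to a full index are dropped. */
static void apply_op(indexer *ix, pending_op op)
{
    size_t i;
    for (i = 0; i < ix->n_indexed; i++)
        if (ix->indexed[i] == op.uid)
            break;
    if (op.kind == OP_ADD) {
        if (i == ix->n_indexed && ix->n_indexed < MAX_INDEXED)
            ix->indexed[ix->n_indexed++] = op.uid;
    } else if (i < ix->n_indexed) {
        /* Remove by moving the last entry into the freed slot. */
        ix->indexed[i] = ix->indexed[--ix->n_indexed];
    }
}

/* Plays the role of the session-start call: reset all state. */
void sketch_index_start(indexer *ix, int batch)
{
    assert(ix != NULL);
    ix->batch = batch;
    ix->n_pending = 0;
    ix->n_indexed = 0;
}

/* Queue the update (batch) or apply it at once (object-at-a-time).
 * Returns 0 on success, -1 if the queue is full. */
static int submit(indexer *ix, op_kind kind, int uid)
{
    pending_op op = { kind, uid };
    if (!ix->batch) {
        apply_op(ix, op);
        return 0;
    }
    if (ix->n_pending == MAX_PENDING)
        return -1;
    ix->queue[ix->n_pending++] = op;
    return 0;
}

int sketch_new_object(indexer *ix, int uid)     { return submit(ix, OP_ADD, uid); }
int sketch_destroy_object(indexer *ix, int uid) { return submit(ix, OP_REMOVE, uid); }

/* Plays the role of the flush call: apply queued updates in order. */
void sketch_flush(indexer *ix)
{
    size_t i;
    for (i = 0; i < ix->n_pending; i++)
        apply_op(ix, ix->queue[i]);
    ix->n_pending = 0;
}
```

Because the queue in this sketch lives only in memory, a failure between an update and the flush loses the queued work, which is the situation the forced garbage collection described above is meant to repair.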
The Broker supports configuration through the broker.conf configuration file and the administrative interface. For more details about the administrative interface, see Section 5.5. An indexer can be configured through two routines:
The most complicated routines for most indexers are the query processing routines. Since most indexers have different query languages, the Broker translates a query into an intermediate form, which the Broker/indexer interface then translates into an indexer-specific query. The query results are analyzed and a list of UIDs is returned. The following routines define the query interface to the indexer:
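As an illustration of the translation step described above, the following C sketch walks a small query tree and emits a query string. Everything in it is assumed for the example: the qnode structure is a hypothetical intermediate form, sketch_translate is an invented name, and the parenthesized "&" and "|" output syntax does not correspond to any particular indexer. The Broker's actual intermediate form and routine signatures may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum { Q_TERM, Q_AND, Q_OR } qkind;

/* Hypothetical intermediate form: a binary tree of terms and operators. */
typedef struct qnode {
    qkind               kind;
    const char         *term;   /* used only when kind == Q_TERM */
    const struct qnode *left;   /* used only for Q_AND and Q_OR */
    const struct qnode *right;
} qnode;

/* Append s to the NUL-terminated string in buf (capacity cap).
 * Returns 0 on success, -1 if the result would not fit. */
static int append(char *buf, size_t cap, const char *s)
{
    size_t used = strlen(buf);
    size_t need = strlen(s);
    if (used + need + 1 > cap)
        return -1;
    memcpy(buf + used, s, need + 1);
    return 0;
}

/* Translate the tree rooted at q into the invented target syntax,
 * appending to buf. The caller must pass buf holding an empty string.
 * Returns 0 on success, -1 on a malformed tree or lack of space. */
int sketch_translate(const qnode *q, char *buf, size_t cap)
{
    const char *op;

    assert(buf != NULL && cap > 0);
    if (q == NULL)
        return -1;
    if (q->kind == Q_TERM)
        return q->term != NULL ? append(buf, cap, q->term) : -1;

    op = (q->kind == Q_AND) ? " & " : " | ";
    if (append(buf, cap, "(") != 0) return -1;
    if (sketch_translate(q->left, buf, cap) != 0) return -1;
    if (append(buf, cap, op) != 0) return -1;
    if (sketch_translate(q->right, buf, cap) != 0) return -1;
    return append(buf, cap, ")");
}
```

A real back-end would hand the resulting string to its search engine and then parse the engine's output into the list of UIDs that the Broker expects; that step is omitted here because it depends entirely on the indexer being integrated.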