Voyager's Indexing Process (Voyager Server)

Creating a searchable index involves two phases: Indexing, where information is extracted from the data, and Enrichment, where the data and metadata can be refined before they are written into the index. When Indexing and Enrichment are complete, the index is ready to be searched, as shown in the diagram below:

 

Indexing

The indexing phase consists of two processes, Scanning and Extraction:

  • Scanning is done by Connectors, which crawl the repository and gather easily accessible information about each document, such as file name, size, and last modification date. This process does not involve opening any files.

  • Extraction is done by Extractors, which open the data files found during the scanning phase and extract additional information from them. This might involve reading text from a Word document or reading metadata tags from an image file. Voyager chooses the Extractor based on the format (MIME type) of the data. An Extractor can often operate on multiple related file types, and a particular type of file may have more than one associated Extractor.
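
The two indexing processes can be illustrated with a small sketch. This is not Voyager's code or API: the function names (scan, extract_text, extract_line_count, index) and the EXTRACTORS registry are hypothetical, and the standard-library mimetypes module stands in for however Voyager actually determines a file's format. It only shows the general idea of a scan that never opens files, followed by extraction dispatched on MIME type.

```python
import mimetypes
import os
from datetime import datetime, timezone


def scan(root):
    """Scanning: walk the tree and record file-system facts without opening files."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            info = os.stat(path)  # file-system metadata only, contents are not read
            yield {
                "path": path,
                "name": name,
                "size": info.st_size,
                "modified": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc),
                # Guess the format from the file name alone (an assumption of this sketch).
                "mime": mimetypes.guess_type(name)[0] or "application/octet-stream",
            }


def extract_text(record):
    """Extraction: open the file and read its text content."""
    with open(record["path"], encoding="utf-8", errors="replace") as handle:
        return {"text": handle.read()}


def extract_line_count(record):
    """A second, hypothetical extractor for the same format."""
    with open(record["path"], encoding="utf-8", errors="replace") as handle:
        return {"line_count": sum(1 for _ in handle)}


# Hypothetical registry: one extractor may serve several related formats,
# and one format may have more than one extractor.
EXTRACTORS = {
    "text/plain": [extract_text, extract_line_count],
    "text/csv": [extract_text],
}


def index(root):
    """Run the scan, then apply every extractor registered for each file's format."""
    documents = []
    for record in scan(root):
        for extractor in EXTRACTORS.get(record["mime"], []):
            record.update(extractor(record))
        documents.append(record)
    return documents
```

Calling index("/some/folder") would return one dictionary per file. Files whose format has no registered extractor keep only the fields gathered during the scan.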

Enriching

After the indexing phase is complete, an enrichment phase occurs before Voyager writes the data into a searchable index. Enrichment is handled by the Pipeline, which, as its name suggests, is an ordered series of steps, each performing a different function. Pipeline steps are customizable, and new steps can be created using domain-specific logic and functions. Voyager includes a default set of pipeline steps that can be used as-is or as templates for further development.

Pipeline functions generally fall into three categories:

  • Metadata Extraction allows manipulating specific entries from a data set's associated XML metadata. For more information, see Metadata Extraction.

  • Document Transformers can be used to update data properties directly in the index. For more information, see Document Transformers.

  • Geotagging associates a geographic location with non-spatial data. For more information, see Geotagging.
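
The pipeline idea can be sketched as an ordered list of functions, each taking a document and returning it with changes. Everything below is illustrative and assumed rather than taken from Voyager: the step names, the document fields (metadata_xml, keywords, text, location), and the tiny GAZETTEER lookup are hypothetical, and the real pipeline steps are configured in Voyager itself. The three steps loosely mirror the three categories above.

```python
import xml.etree.ElementTree as ET

# Hypothetical place-name lookup standing in for a real gazetteer.
GAZETTEER = {"denver": (39.7392, -104.9903), "paris": (48.8566, 2.3522)}


def extract_metadata(doc):
    """Metadata extraction: copy one entry from the XML metadata into a field."""
    xml_text = doc.get("metadata_xml")
    if xml_text:
        title = ET.fromstring(xml_text).findtext("title")
        if title:
            doc["title"] = title.strip()
    return doc


def normalize_keywords(doc):
    """Document transformer: rewrite an existing property before it is indexed."""
    cleaned = {k.strip().lower() for k in doc.get("keywords", []) if k.strip()}
    doc["keywords"] = sorted(cleaned)
    return doc


def geotag(doc):
    """Geotagging: attach coordinates to a document that has no location yet."""
    if "location" in doc:
        return doc
    for word in doc.get("text", "").lower().split():
        place = word.strip(".,;:!?")
        if place in GAZETTEER:
            doc["location"] = GAZETTEER[place]
            break
    return doc


# The order matters: each step sees the output of the one before it.
PIPELINE = [extract_metadata, normalize_keywords, geotag]


def enrich(doc, steps=PIPELINE):
    """Pass the document through every step in order and return the result."""
    for step in steps:
        doc = step(doc)
    return doc
```

Adding a custom step in this sketch means writing another function with the same shape and placing it at the desired position in the list.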

Searching

After Indexing and Enrichment are complete, users can search within the data using Voyager's powerful and flexible search tools. Search results are displayed in Navigo.