Creating Pipelines

Voyager's Indexing Pipeline provides functions that transform and manipulate the properties (metadata) of data records as they are added to the Index. When adding or editing a Repository, use the pipeline section to choose an existing pipeline (if any) or to create a new pipeline to apply to the Repository.

NOTE: A Repository can only be associated with one pipeline at any given time, though a given pipeline may be associated with multiple Repositories.

Creating a New Pipeline

You can assign different functions to different points in the Indexing process. (Indexing has two phases, Scanning and Extraction; see this article for more information.)

There are four points at which you can add Pipeline steps, shown in the following diagram:

 

  • Post-Scan - steps here are executed immediately after Scanning is complete

  • Pre-Extraction - steps here are executed just prior to Extraction

  • Post-Extraction - steps here are executed immediately after Extraction is complete

  • Pre-Index - steps here are executed just before the information is written to the Index

Notes

The Post-Scan and Pre-Index steps are always executed, while Pre-Extraction and Post-Extraction steps may or may not be executed, depending on the settings for a particular Repository.
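The execution order of the four stages, and the fact that only the two extraction stages are conditional, can be modeled as a short sketch. This is a conceptual illustration, not Voyager's API: the stage names come from this page, but the dictionary layout, the `steps_to_run` function, and the single `extraction_enabled` flag (standing in for whatever Repository settings control extraction) are hypothetical.

```python
# Conceptual sketch only: stage names mirror the documentation, but the data
# structures and function below are hypothetical, not Voyager's API.
from typing import Dict, List

# The four pipeline stages, in the order they run during indexing.
STAGE_ORDER = ["Post-Scan", "Pre-Extraction", "Post-Extraction", "Pre-Index"]

# Stages that run only when extraction happens for the record.
# Assumption: one boolean stands in for the Repository's settings.
EXTRACTION_STAGES = {"Pre-Extraction", "Post-Extraction"}


def steps_to_run(pipeline: Dict[str, List[str]], extraction_enabled: bool) -> List[str]:
    """Return the steps that would execute, in order, for one record."""
    executed: List[str] = []
    for stage in STAGE_ORDER:
        if stage in EXTRACTION_STAGES and not extraction_enabled:
            continue  # extraction-phase stages are skipped
        executed.extend(pipeline.get(stage, []))
    return executed


pipeline = {
    "Post-Scan": ["Calculate MD5 Checksum"],
    "Post-Extraction": ["Extract Entities with NLP"],
    "Pre-Index": ["Set Value for Field"],
}
print(steps_to_run(pipeline, extraction_enabled=False))
# ['Calculate MD5 Checksum', 'Set Value for Field']
```

With `extraction_enabled=True`, the NLP step assigned to Post-Extraction would run between the other two; Post-Scan and Pre-Index steps run either way.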

Adding a Pipeline Step

To add one or more steps to a Pipeline:

On the Create Pipeline or Edit Pipeline page, click Add Pipeline Steps Here.

 

This displays the list of all available Pipeline Steps. You can add steps to more than one point in the Pipeline.

Click Save when you are done.

Pipeline Steps

You can add one or more of the following steps to different points in the Pipeline sequence:

  • S3 Blobs Upload
    Uploads thumbnails to Amazon S3

  • Calculate MD5 Checksum
    Calculates an MD5 checksum for the source file content

  • Copy Field
    Copies the value of a field in a document to another field

  • Rename Field
    Renames a field by moving it to a new field name

  • Remove Field
    Removes a field from a document

  • Transform Field Value
    Transforms a field value

  • Set Value for Field
    Sets a field to a specific value

  • Append Value to Field
    Appends a value to a field. If the field is empty, this step behaves like Set Value for Field

  • Extract Entities with NLP
    Uses Natural Language Processing to extract categorized entities from the text content

  • Create Thumbnail with Base Map
    Creates a Thumbnail with a specific base map that can be customized

  • GeoTag Standard
    Geotags the document content with the standard Gazetteer

  • Geotag Custom
    Geotags the document content with a custom Gazetteer

  • Calculate Centroid
    Calculates a center point for polygonal geometry; useful for generating a cluster map of points when indexing data with polygonal geometry

  • Set Extent from PRJ File
    Transforms spatial extent based on projection information found in a component .prj file

  • Add meta XML tags
    Adds meta XML tags

  • Convert EXIF GPS data
    Uses EXIF GPS coordinates to assign a location and spatial representation to the record
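The field-level steps (Copy Field, Rename Field, Remove Field, Transform Field Value, Set Value for Field, and Append Value to Field) can be pictured as simple operations on a record's properties. The sketch below is a conceptual model under stated assumptions, not how Voyager implements these steps: a record is a plain Python dict, multi-valued fields are lists, and all function names are hypothetical.

```python
# Conceptual sketch: a record's properties are modeled as a dict and
# multi-valued fields as lists. Function names are hypothetical; they only
# illustrate what each pipeline step is described as doing.
import copy
from typing import Any, Callable, Dict

Record = Dict[str, Any]


def copy_field(rec: Record, src: str, dest: str) -> None:
    """Copy Field: duplicate src into dest, leaving src in place."""
    if src in rec:
        rec[dest] = copy.deepcopy(rec[src])  # avoid sharing a mutable list


def rename_field(rec: Record, old: str, new: str) -> None:
    """Rename Field: move the value to a new field name."""
    if old in rec:
        rec[new] = rec.pop(old)


def remove_field(rec: Record, name: str) -> None:
    """Remove Field: drop the field if present."""
    rec.pop(name, None)


def transform_field(rec: Record, name: str, fn: Callable[[Any], Any]) -> None:
    """Transform Field Value: replace the value with fn(value)."""
    if name in rec:
        rec[name] = fn(rec[name])


def set_field(rec: Record, name: str, value: Any) -> None:
    """Set Value for Field: overwrite with a specific value."""
    rec[name] = value


def append_field(rec: Record, name: str, value: Any) -> None:
    """Append Value to Field: behaves like set_field when the field is empty."""
    current = rec.get(name)
    if current in (None, "", []):
        set_field(rec, name, value)
    elif isinstance(current, list):
        current.append(value)
    else:
        rec[name] = [current, value]  # promote a single value to a list


rec = {"title": "roads.shp", "format": "shp"}
copy_field(rec, "title", "name")
rename_field(rec, "format", "file_format")
set_field(rec, "source", "GIS share")
append_field(rec, "tags", "roads")      # empty field: acts like set
append_field(rec, "tags", "transport")  # existing value: becomes a list
transform_field(rec, "file_format", str.upper)
remove_field(rec, "name")
print(rec)
# {'title': 'roads.shp', 'file_format': 'SHP', 'source': 'GIS share', 'tags': ['roads', 'transport']}
```

Because steps mutate the same record in sequence, their order within a pipeline point matters: for example, a Rename Field step placed before a Copy Field step that reads the old name would leave nothing to copy.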
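For Calculate Centroid, one common way to reduce a polygon to a single point is the area-weighted centroid from the shoelace formula. The sketch below assumes that technique; this page does not say which method Voyager uses. It also assumes a simple (non-self-intersecting) ring in planar coordinates with no geodesic correction, and `polygon_centroid` is a hypothetical name.

```python
# Sketch: area-weighted polygon centroid via the shoelace formula.
# Assumption: this is one standard method, not necessarily Voyager's.
from typing import List, Tuple

Point = Tuple[float, float]


def polygon_centroid(ring: List[Point]) -> Point:
    """Centroid of a simple polygon given its vertices in order.

    The closing vertex may be repeated or omitted. Coordinates are
    treated as planar (no geodesic correction).
    """
    if ring[0] == ring[-1]:
        ring = ring[:-1]  # drop explicit closing vertex
    area2 = 0.0  # twice the signed area
    cx = cy = 0.0
    n = len(ring)
    for i in range(n):
        x0, y0 = ring[i]
        x1, y1 = ring[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if area2 == 0:
        raise ValueError("degenerate polygon (zero area)")
    # Cx = sum / (6 * A) and A = area2 / 2, so divide by 3 * area2.
    return (cx / (3 * area2), cy / (3 * area2))


print(polygon_centroid([(0, 0), (4, 0), (4, 4), (0, 4)]))  # (2.0, 2.0)
```

Weighting by area, rather than averaging the vertices, keeps the point stable when one side of a polygon is densely digitized.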
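For Convert EXIF GPS data, EXIF stores latitude and longitude as three rational numbers (degrees, minutes, seconds) in the GPSLatitude and GPSLongitude tags, with hemisphere letters in GPSLatitudeRef ("N"/"S") and GPSLongitudeRef ("E"/"W"). The sketch below shows the standard conversion to signed decimal degrees. The dict input (as an EXIF reader might return it) and the function names are hypothetical, and Voyager's own handling is not documented here.

```python
# Sketch: convert EXIF degrees/minutes/seconds to signed decimal degrees.
# Assumption: GPS tags arrive as a dict keyed by EXIF tag name.
from fractions import Fraction
from typing import Any, Dict, Sequence, Tuple


def dms_to_decimal(dms: Sequence[Any], ref: str) -> float:
    """Convert (degrees, minutes, seconds) plus a hemisphere ref to decimal."""
    # Fraction keeps the arithmetic exact until the final float conversion.
    degrees, minutes, seconds = (Fraction(v) for v in dms)
    value = degrees + minutes / 60 + seconds / 3600
    return float(-value if ref in ("S", "W") else value)


def exif_gps_to_point(gps: Dict[str, Any]) -> Tuple[float, float]:
    """Return (lon, lat), i.e. x/y order as used by most spatial formats."""
    lat = dms_to_decimal(gps["GPSLatitude"], gps["GPSLatitudeRef"])
    lon = dms_to_decimal(gps["GPSLongitude"], gps["GPSLongitudeRef"])
    return lon, lat


gps = {
    "GPSLatitude": (Fraction(51), Fraction(30), Fraction(0)),
    "GPSLatitudeRef": "N",
    "GPSLongitude": (Fraction(0), Fraction(7), Fraction(30)),
    "GPSLongitudeRef": "W",
}
print(exif_gps_to_point(gps))  # (-0.125, 51.5)
```

Returning longitude first matches the x/y convention of formats such as GeoJSON and WKT; swapping the order is a common source of points landing in the wrong place.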