Creating Pipelines
Voyager's Indexing Pipeline provides functions to transform and manipulate the properties (metadata) of data records as it adds them to the Index. When adding or editing a Repository, the pipeline section lets you choose an existing pipeline (if any) or create a new pipeline to apply to the Repository.
NOTE: A Repository can only be associated with one pipeline at any given time, though a given pipeline may be associated with multiple Repositories.
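The association rule in the note above can be pictured as a simple mapping from Repository to pipeline. The following is only an illustrative sketch, not Voyager's API: the `PipelineRegistry` class, its methods, and the repository and pipeline names are all hypothetical.

```python
class PipelineRegistry:
    """Hypothetical model: each Repository maps to at most one pipeline."""

    def __init__(self):
        self._pipeline_for = {}  # repository name -> pipeline name

    def assign(self, repository, pipeline):
        # Assigning again replaces the Repository's previous pipeline.
        self._pipeline_for[repository] = pipeline

    def pipeline_of(self, repository):
        return self._pipeline_for.get(repository)

    def repositories_using(self, pipeline):
        # One pipeline may be shared by any number of Repositories.
        return sorted(r for r, p in self._pipeline_for.items() if p == pipeline)


registry = PipelineRegistry()
registry.assign("roads", "geo-cleanup")
registry.assign("parcels", "geo-cleanup")   # same pipeline, second Repository
registry.assign("roads", "thumbnails")      # replaces "geo-cleanup" for "roads"
```

Keying the mapping by Repository is what enforces the one-pipeline-per-Repository rule in this sketch, while nothing limits how many Repositories point at the same pipeline.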
Creating a New Pipeline
You can assign different functions to different points in the Indexing process (Indexing has two phases: Scanning and Extraction - see this article for more information).
There are four places you can add Pipeline steps, shown in the following diagram:
Post-Scan - steps here are executed immediately after Scanning is complete
Pre-Extraction - steps here are executed just prior to Extraction
Post-Extraction - steps here are executed immediately after Extraction is complete
Pre-Index - steps here are executed just prior to the information being written to the Index
Notes
The Post-Scan and Pre-Index steps are always executed, while Pre-Extraction and Post-Extraction steps may or may not be executed, depending on the settings for a particular Repository.
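The ordering of the four points, and the rule about which ones always run, can be sketched as follows. This is a minimal illustration under stated assumptions, not Voyager's implementation: the `Stage` enum, the `run_pipeline` function, and the `extraction_enabled` flag are hypothetical, and the flag simply stands in for whatever Repository settings control Extraction.

```python
from enum import Enum


class Stage(Enum):
    # Defined in execution order; iterating an Enum follows definition order.
    POST_SCAN = "Post-Scan"
    PRE_EXTRACTION = "Pre-Extraction"
    POST_EXTRACTION = "Post-Extraction"
    PRE_INDEX = "Pre-Index"


EXTRACTION_STAGES = (Stage.PRE_EXTRACTION, Stage.POST_EXTRACTION)


def run_pipeline(record, steps_by_stage, extraction_enabled):
    """Apply steps stage by stage and return the names of the stages that ran."""
    executed = []
    for stage in Stage:
        # Assumption: one flag decides whether the Extraction-related stages run.
        if stage in EXTRACTION_STAGES and not extraction_enabled:
            continue
        for step in steps_by_stage.get(stage, []):
            step(record)
        executed.append(stage.value)
    return executed


record = {"name": "roads.shp"}
steps = {
    Stage.POST_SCAN: [lambda r: r.update(scanned=True)],
    Stage.POST_EXTRACTION: [lambda r: r.update(extracted=True)],
    Stage.PRE_INDEX: [lambda r: r.update(ready=True)],
}
ran = run_pipeline(record, steps, extraction_enabled=False)
```

With `extraction_enabled=False`, only the Post-Scan and Pre-Index steps touch the record, so a step that must always run belongs at one of those two points.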
Adding a Pipeline Step
To add one or more steps to a Pipeline:
On the Create Pipeline or Edit Pipeline page, click Add Pipeline Steps Here.
This displays the list of all available Pipeline Steps. You can add steps to more than one point in the Pipeline.
Click Save when you are done.
Pipeline Steps
You can add one or more of the following steps to different points in the Pipeline sequence:
S3 Blobs Upload - Uploads thumbnails to Amazon S3
Calculate MD5 Checksum - Calculates an MD5 checksum for the source file content
Copy Field - Copies fields in a document
Rename Field - Renames a field by moving it to a new field name
Remove Field - Removes a field from a document
Transform Field Value - Transforms a field value
Set Value for Field - Sets a field to a specific value
Append Value to Field - Appends a value to a field. If the field is empty, behaves like Set Value for Field
Extract Entities with NLP - Uses Natural Language Processing to extract categorized entities from the text content
Create Thumbnail with Base Map - Creates a Thumbnail with a specific base map that can be customized
GeoTag Standard - Geotags the document content with the standard Gazetteer
Geotag Custom - Geotags the document content with a custom Gazetteer
Calculate Centroid - Useful for generating a cluster map of points when indexing data with polygonal geometry
Set Extent from PRJ File - Transforms spatial extent based on projection information found in a component .prj file
Add meta XML tags - Adds meta XML tags
Convert EXIF GPS data - Uses EXIF GPS coordinates to assign a location and spatial representation to the record.
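To make the field-oriented steps more concrete, the sketch below imitates Copy Field, Rename Field, Remove Field, Set Value for Field, Append Value to Field, and Calculate MD5 Checksum on a plain Python dictionary. It is an illustration only: the function names and field names are hypothetical, a record is assumed to behave like a dictionary of fields, and storing appended values as a list is an assumption rather than documented Voyager behavior.

```python
import hashlib


def copy_field(doc, source, target):
    # Copy Field: the source field stays in place.
    if source in doc:
        doc[target] = doc[source]


def rename_field(doc, old, new):
    # Rename Field: move the value to the new field name.
    if old in doc:
        doc[new] = doc.pop(old)


def remove_field(doc, name):
    # Remove Field: a missing field is ignored.
    doc.pop(name, None)


def set_field(doc, name, value):
    # Set Value for Field: overwrite whatever is there.
    doc[name] = value


def append_to_field(doc, name, value):
    # Append Value to Field: on an empty field, behave like Set Value for Field.
    current = doc.get(name)
    if current in (None, "", []):
        set_field(doc, name, value)
    elif isinstance(current, list):
        current.append(value)
    else:
        # Assumption: a second value turns the field into a list.
        doc[name] = [current, value]


def md5_checksum(content: bytes) -> str:
    # Calculate MD5 Checksum: hex digest of the raw file content.
    return hashlib.md5(content).hexdigest()


doc = {"title": "Roads", "src": "a.shp"}
copy_field(doc, "title", "name")
rename_field(doc, "src", "path")
append_to_field(doc, "keywords", "transport")  # empty field, so this acts as a set
append_to_field(doc, "keywords", "roads")
remove_field(doc, "title")
set_field(doc, "md5", md5_checksum(b"hello"))
```

After these calls, `doc` holds `name`, `path`, `keywords`, and `md5`, and no longer has `title` or `src`. The order of steps matters: removing `title` before copying it would leave nothing to copy.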