Pipeline Stages and Indexing in HQ

Creating an index in HQ is a two-step process. The first step is Data Discovery, in which HQ crawls repositories to find and identify data available for indexing. The second step is Data Extraction, in which HQ opens the files found during Data Discovery and extracts additional information from them. After these two steps, HQ allows users to configure pipeline steps to further enrich the data before the index is created.
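The two-step flow can be sketched roughly as follows. This is an illustrative model only; the function and field names (`discover`, `extract`, `repo`, `metadata`) are hypothetical and do not reflect HQ's actual API.

```python
# Hypothetical sketch of HQ's two-step indexing flow.
# All names here are illustrative, not HQ's real interfaces.

def discover(repositories):
    """Data Discovery: crawl repositories and list items available for indexing."""
    items = []
    for repo in repositories:
        for path in repo["paths"]:
            items.append({"repo": repo["name"], "path": path, "metadata": {}})
    return items

def extract(items):
    """Data Extraction: open each discovered item and pull additional details."""
    for item in items:
        # Stand-in for real extraction work (file contents, properties, etc.).
        item["metadata"]["extension"] = item["path"].rsplit(".", 1)[-1]
    return items

# Discovery runs first; Extraction then enriches what Discovery found.
repos = [{"name": "fileshare", "paths": ["docs/report.pdf", "maps/site.shp"]}]
index_input = extract(discover(repos))
```

The key point the sketch illustrates is ordering: Extraction operates only on items that Discovery has already found.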

Customizing the Pipeline is useful because it gives HQ data administrators fine-grained control over their index, makes index creation more efficient, and conserves system resources.

The Pipeline can be configured during the following stages:

Post-Scan: The steps configured here are executed immediately after scanning, during the Data Discovery step. Certain repositories, such as SDE, do not execute Extraction steps; for those repositories, configuring pipeline steps at the Post-Scan level is a way to enrich the data.

Pre-Extraction: The steps set up here are executed just prior to Data Extraction, immediately after Data Discovery completes.

Post-Extraction: The steps listed here are executed immediately after Data Extraction is complete.

Pre-Index: The steps added here are executed just prior to the information being written to the index. This stage occurs only after both Data Discovery and Data Extraction are complete. For most data repositories, pipeline steps such as metadata extraction, document transformation, and geotagging are configured here.
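The four stages above can be modeled as hook points around the Discovery and Extraction steps. The runner below is a hypothetical sketch, not HQ's implementation; only the stage names and their ordering come from this document, and the `extraction_supported` flag models repositories like SDE that skip Extraction.

```python
# Illustrative sketch of the four pipeline hook points.
# Stage names and order are from the documentation; the runner is hypothetical.

def run_pipeline(item, steps_by_stage, extraction_supported=True):
    """Apply the configured steps for each stage, in pipeline order."""
    for step in steps_by_stage.get("post-scan", []):
        item = step(item)                       # runs right after scanning
    if extraction_supported:                    # repositories like SDE skip this branch
        for step in steps_by_stage.get("pre-extraction", []):
            item = step(item)                   # runs just before Data Extraction
        item["extracted"] = True                # stand-in for Data Extraction itself
        for step in steps_by_stage.get("post-extraction", []):
            item = step(item)                   # runs once extraction is complete
    for step in steps_by_stage.get("pre-index", []):
        item = step(item)                       # last chance before writing the index
    return item

def add_geotag(item):
    """Hypothetical Pre-Index enrichment step."""
    item["geotag"] = "placeholder"
    return item

doc = run_pipeline({"path": "docs/report.pdf"}, {"pre-index": [add_geotag]})
```

The sketch makes the practical consequence visible: a step configured at Post-Scan still runs for repositories that skip Extraction, while Pre-Extraction and Post-Extraction steps do not.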