Creating a Custom Pipeline Step Using Python

You can create a custom Pipeline step using Python, which offers great flexibility in configuring and customizing the Pipeline.

To create a custom pipeline step using Python, you need to:

  1. Create a Python file

  2. Configure a Location with a new Pipeline Step

Creating a Python File

  1. Create a Python file in the steps folder located in Voyager’s install location (e.g. C:/Voyager/server_1.9.7.3348/app/py/pipeline/steps).

  2. Copy and paste the sample code below

  3. Save the file

    import sys
    import json


    def run(entry):
        """
        Sample Python pipeline step.
        Searches the text field for "Voyager" or "voyager" and returns the word count.
        :param entry: a JSON file containing a voyager entry.
        """
        with open(entry, "rb") as entry_file:
            new_entry = json.load(entry_file)
        voyager_word_count = 0
        if 'fields' in new_entry['entry']:
            if 'text' in new_entry['entry']['fields']:
                text_field = new_entry['entry']['fields']['text']
                voyager_word_count += text_field.count('Voyager')
                voyager_word_count += text_field.count('voyager')
                new_entry['entry']['fields']['fi_voyager_word_count'] = voyager_word_count
        sys.stdout.write(json.dumps(new_entry))
        sys.stdout.flush()

Notes

  • Each Python pipeline step requires a main function named run.

  • The code shown above takes the path to a JSON file as a required argument. This file is a Voyager entry containing fields such as path, format, title, name and text. The file is loaded into a Python dictionary, and the text field is searched for the words Voyager and voyager. A new field named fi_voyager_word_count is added to the entry, and the updated entry is returned as JSON on stdout.

  • Every Python script in the steps folder is treated as a pipeline step, so keep supporting files, such as utility files, in a different folder.
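To make the notes above concrete, the sketch below applies the same counting logic to an in-memory dictionary instead of a file. The sample entry is hypothetical and deliberately minimal: only the entry/fields/text nesting is taken from the sample step, and real Voyager entries carry more fields. The helper name add_word_count is also an assumption made for illustration.

```python
import json

# Hypothetical minimal entry. Real Voyager entries carry more fields
# (path, format, title, name, ...); only the nesting used by the sample
# step is shown here.
sample_entry = {
    "entry": {
        "fields": {
            "name": "notes.txt",
            "text": "Voyager indexes data. Ask voyager for details.",
        }
    }
}


def add_word_count(entry_dict):
    """Add fi_voyager_word_count to the entry, mirroring the sample step."""
    fields = entry_dict.get("entry", {}).get("fields", {})
    text = fields.get("text")
    if text is not None:
        # Same case-sensitive substring counts as the sample step.
        fields["fi_voyager_word_count"] = text.count("Voyager") + text.count("voyager")
    return entry_dict


result = add_word_count(sample_entry)
print(json.dumps(result, indent=2))
```

The printed entry is the input plus one extra key under fields, which is the kind of JSON the sample step writes to stdout.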

Configuring a Location and Adding a Pipeline Step

  • Open a location for editing, select the Pipeline tab, and uncheck Use Default Pipeline Configuration.

  • Determine whether your pipeline step will be a first or later step and click Add.

  • Select a Python pipeline step. The list of Python steps is generated from the steps folder.

  • Click Save.

Testing the Pipeline Step

  • To test, create some sample text files in this location and decide which keywords you want to search for. Be sure to update the script with those keywords.

  • Build/Rebuild the index for that location. This will then run the Python step for each item being indexed.

  • When indexing finishes, view the details of one of the items. It should include a new field; in this example, the field is named Voyager word count.
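For the keyword update mentioned in the first bullet above, one possible approach is to move the counting into a small helper that takes a list of keywords. This is only a sketch: the helper name count_keywords, the keyword list, and the sample text are all hypothetical. Unlike the sample step, which counts case-sensitive substrings with str.count, this version counts whole words and ignores case.

```python
import re


def count_keywords(text, keywords):
    """Count whole-word, case-insensitive occurrences of each keyword in text."""
    counts = {}
    for keyword in keywords:
        # re.escape keeps punctuation in a keyword from being read as regex syntax.
        pattern = r"\b" + re.escape(keyword) + r"\b"
        counts[keyword] = len(re.findall(pattern, text, flags=re.IGNORECASE))
    return counts


# Hypothetical keywords and sample text, for illustration only.
KEYWORDS = ["voyager", "pipeline"]
sample_text = "Voyager runs each pipeline step. The Pipeline calls voyager steps."
print(count_keywords(sample_text, KEYWORDS))
```

Inside run(), the resulting counts could then be written to one field per keyword, following the same pattern the sample step uses for fi_voyager_word_count.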

Debugging the Pipeline Step

If the results are not as expected, the Python script can be debugged using the following steps:

  1. Add the following lines of code to the top of the run(entry) function to save a copy of each entry file:

    def run(entry):
        """
        Sample Python pipeline step.
        Searches the text field for "Voyager" or "voyager" and returns the word count.
        :param entry: a JSON file containing a voyager entry.
        """
        # FOR DEBUGGING ONLY - START
        import shutil, os
        # Copy the entry file passed in by Voyager so it can be inspected later.
        if os.path.exists(entry):
            shutil.copyfile(entry, 'c:/temp/{0}'.format(os.path.basename(entry)))
        # FOR DEBUGGING ONLY - END

  2. Save the script and rebuild the index from within Voyager. This creates the entry file or files in the location you specified (c:/temp in the code above). Index only a small set of data so that you have a short list of files to debug with.

  3. Add a main block to the bottom of the script that calls the run() function:

    if __name__ == '__main__':
        entry_file = "path to entry file here"
        run(entry_file)
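As an alternative to the debugging steps above, a step can also be exercised without rebuilding an index. The sketch below writes a hand-made entry to a temporary JSON file, calls run() on it, and parses what the step wrote to stdout. It assumes Python 3 (contextlib.redirect_stdout is not available in Python 2), and the run_and_capture helper and the sample entry are hypothetical. The run() shown here is a compact stand-in; in practice you would use your own step's run function.

```python
import contextlib
import io
import json
import os
import sys
import tempfile


def run(entry):
    """Stand-in for the step under test; replace with your own step's run()."""
    with open(entry, "rb") as entry_file:
        new_entry = json.load(entry_file)
    fields = new_entry["entry"].get("fields", {})
    if "text" in fields:
        text = fields["text"]
        fields["fi_voyager_word_count"] = text.count("Voyager") + text.count("voyager")
    sys.stdout.write(json.dumps(new_entry))
    sys.stdout.flush()


def run_and_capture(step, entry_dict):
    """Write entry_dict to a temporary JSON file, run the step, parse its stdout."""
    handle, path = tempfile.mkstemp(suffix=".json")
    try:
        with os.fdopen(handle, "w") as tmp:
            json.dump(entry_dict, tmp)
        buffer = io.StringIO()
        # The step looks up sys.stdout at call time, so the redirect captures it.
        with contextlib.redirect_stdout(buffer):
            step(path)
        return json.loads(buffer.getvalue())
    finally:
        os.remove(path)


# Hypothetical entry used only to exercise the step.
output = run_and_capture(run, {"entry": {"fields": {"text": "voyager Voyager voyager"}}})
print(output["entry"]["fields"]["fi_voyager_word_count"])
```

Parsing the captured output with json.loads also confirms that the step writes valid JSON and nothing else to stdout.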