Creating a Custom Pipeline Step Using Python
You can create a custom Pipeline step using Python, which offers great flexibility in configuring and customizing the Pipeline.
To create a custom pipeline step using Python, you need to:
Create a Python file
Configure a Location with a new Pipeline Step
Creating a Python file
Create a Python file in the steps folder located in Voyager’s install location (for example, C:/Voyager/server_1.9.7.3348/app/py/pipeline/steps).
Copy and paste the sample code below
Save the file
import sys
import json


def run(entry):
    """
    Sample Python pipeline step. Searches the text field for "Voyager" or
    "voyager" and returns the word count.
    :param entry: a JSON file containing a voyager entry.
    """
    # Load the entry file into a dictionary and close the file when done.
    with open(entry, "rb") as entry_file:
        new_entry = json.load(entry_file)
    voyager_word_count = 0
    if 'fields' in new_entry['entry']:
        if 'text' in new_entry['entry']['fields']:
            text_field = new_entry['entry']['fields']['text']
            voyager_word_count += text_field.count('Voyager')
            voyager_word_count += text_field.count('voyager')
        # Add the count as a new field (0 when there is no text field).
        new_entry['entry']['fields']['fi_voyager_word_count'] = voyager_word_count
    # Return the updated entry to Voyager on stdout.
    sys.stdout.write(json.dumps(new_entry))
    sys.stdout.flush()
Notes
Each Python pipeline step requires a main function named run.
The code shown above takes the path to a JSON file as its required argument. This file is a Voyager entry containing fields such as path, format, title, name, and text. The file is loaded into a Python dictionary, and the text field is searched for the words "Voyager" and "voyager". A new field named fi_voyager_word_count is added to the entry, and the updated entry is returned using stdout.
Every Python script in the steps folder is treated as a pipeline step, so keep supporting files, such as utility modules, in a different folder.
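The last note leaves open how a step would reach helper code stored outside the steps folder. The following is a minimal sketch of one common Python approach: adding the helper folder to sys.path before importing. The folder location, the module name voyager_step_utils, and the function count_words are all hypothetical, and how Voyager itself sets up the import path is not described here, so treat this as an assumption to verify in your install.

```python
import importlib
import os
import sys
import tempfile


def add_support_folder(folder):
    """Make modules in `folder` importable by putting it on sys.path."""
    folder = os.path.abspath(folder)
    if folder not in sys.path:
        sys.path.insert(0, folder)
    return folder


# Demo only: build a throwaway support folder holding one hypothetical helper
# module. In practice this would be a real folder next to (not inside) steps.
support_dir = tempfile.mkdtemp(prefix="step_support_")
with open(os.path.join(support_dir, "voyager_step_utils.py"), "w") as module_file:
    module_file.write(
        "def count_words(text, words):\n"
        "    return sum(text.count(word) for word in words)\n"
    )

add_support_folder(support_dir)
step_utils = importlib.import_module("voyager_step_utils")

# A pipeline step could now call the shared helper from inside run().
total = step_utils.count_words("Voyager indexes; voyager searches.",
                               ["Voyager", "voyager"])
```

Keeping the sys.path change in one small function means each step can opt in with a single call, and the helper module never appears in the list of selectable steps.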
Configuring a Location and Adding a Pipeline Step
Open a location for editing, select the Pipeline tab, and uncheck Use Default Pipeline Configuration.
Determine whether your pipeline step will be a first or later step and click Add.
Select a Python pipeline step. The list of Python steps is generated from the steps folder.
Click Save.
Testing the Pipeline step
To test, create some sample text files in this location and choose the keywords you want to search for. Be sure to update the script with those keywords.
Build or rebuild the index for that location. This runs the Python step for each item being indexed.
When indexing finishes, view the details of one of the items. There should be a new field; in this example, it is named Voyager word count.
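The testing steps above call for updating the script with your own keywords. One way to keep that change in a single place is sketched below, assuming the same entry layout as the sample step. The names KEYWORDS and add_keyword_count are hypothetical, and the field name is simply reused from the sample.

```python
# Hypothetical keyword list; replace with the words used in your sample files.
KEYWORDS = ["Voyager", "voyager"]


def add_keyword_count(entry, keywords=KEYWORDS,
                      field_name="fi_voyager_word_count"):
    """Add the total keyword count to an already-loaded entry dictionary.

    str.count is case-sensitive and matches substrings, so "Voyagers"
    also counts as one occurrence of "Voyager".
    """
    fields = entry["entry"].get("fields")
    if fields is not None:
        text = fields.get("text", "")
        fields[field_name] = sum(text.count(word) for word in keywords)
    return entry


# Made-up entry that mimics the structure the sample step expects.
sample = {"entry": {"fields": {"text": "Voyager finds data. Ask voyager."}}}
updated = add_keyword_count(sample)
```

Inside run(), the loaded dictionary would be passed through add_keyword_count before being written to stdout, so changing the search terms only means editing the KEYWORDS list.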
Debugging the Pipeline step
If the results are not as expected, the Python script can be debugged using the following steps:
Add the following lines of code to the top of the run(entry) function to make an entry file:
def run(entry):
    """
    Sample Python pipeline step. Searches the text field for "Voyager" or
    "voyager" and returns the word count.
    :param entry: a JSON file containing a voyager entry.
    """
    # FOR DEBUGGING ONLY - START
    import shutil, os
    # Copy the entry file only if it exists; c:/temp must already exist.
    if os.path.exists(entry):
        shutil.copyfile(entry, 'c:/temp/{0}'.format(os.path.basename(entry)))
    ### END
Save the script and rebuild the index from within Voyager. This creates the entry file or files in the location you specified. Index only a small set of data so that you end up with a short list of entry files to debug with.
Add a main block to the bottom of the script that calls the run() function:
if __name__ == '__main__':
    entry_file = "path to entry file here"
    run(entry_file)
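As a complement to the debugging steps above, the step can also be exercised without indexing anything by writing a hand-made entry file and capturing what run() prints. The sketch below assumes Python 3 (for contextlib.redirect_stdout), which may differ from the interpreter bundled with Voyager. The helper run_and_capture and the sample values are hypothetical, and run() is a compact copy of the sample step, repeated only so the block runs on its own.

```python
import io
import json
import os
import sys
import tempfile
from contextlib import redirect_stdout


def run(entry):
    """Compact copy of the sample step so this sketch is self-contained."""
    with open(entry, "rb") as entry_file:
        new_entry = json.load(entry_file)
    if "fields" in new_entry["entry"]:
        text = new_entry["entry"]["fields"].get("text", "")
        count = text.count("Voyager") + text.count("voyager")
        new_entry["entry"]["fields"]["fi_voyager_word_count"] = count
    sys.stdout.write(json.dumps(new_entry))
    sys.stdout.flush()


def run_and_capture(entry_path):
    """Call run() and return the JSON it wrote to stdout as a dictionary."""
    buffer = io.StringIO()
    with redirect_stdout(buffer):
        run(entry_path)
    return json.loads(buffer.getvalue())


# Hand-made entry with made-up values; real entries contain more fields.
sample_entry = {"entry": {"fields": {"name": "notes.txt",
                                     "text": "Voyager and voyager"}}}
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as handle:
    json.dump(sample_entry, handle)
    sample_path = handle.name

result = run_and_capture(sample_path)
os.remove(sample_path)  # clean up the temporary entry file
```

Parsing the captured output back into a dictionary makes it easy to inspect the new field in a debugger or to compare it against an expected value.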