Spellcheck Pipeline Step

The Spellcheck pipeline step automatically corrects misspelled words in a field that contains either a single word or a longer text. This step helps improve Natural Language Processing (NLP) and Optical Character Recognition (OCR) capabilities.

The Spellcheck step can be added to a pipeline like any other pipeline step. Its main input is the Source Field parameter, which names the document field to be spellchecked. By default, the Source Field is overwritten with the corrected text. Alternatively, set the Destination Field parameter so that the corrected version is written to the Destination Field and the Source Field remains unchanged.
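
The Source Field / Destination Field behavior described above can be illustrated with a minimal Python sketch. It is not the step's actual implementation: the function name apply_spellcheck, its arguments, and the toy corrector are hypothetical and exist only to show the overwrite-versus-copy semantics.

```python
from typing import Callable, Dict, Optional


def apply_spellcheck(
    document: Dict[str, str],
    source_field: str,
    correct: Callable[[str], str],
    destination_field: Optional[str] = None,
) -> Dict[str, str]:
    """Return a copy of `document` with the corrected text written out.

    Hypothetical helper: `correct` stands in for whatever performs the
    actual correction (in the real step, Solr spellcheck).
    """
    result = dict(document)  # leave the caller's document untouched
    # Without a destination field, the source field itself is overwritten.
    target = destination_field or source_field
    result[target] = correct(document[source_field])
    return result


if __name__ == "__main__":
    # Toy corrector used purely for illustration.
    def toy(text: str) -> str:
        return text.replace("recieve", "receive")

    doc = {"body": "please recieve this"}
    print(apply_spellcheck(doc, "body", toy))
    print(apply_spellcheck(doc, "body", toy, destination_field="body_corrected"))
```

In the first call the body field is replaced by the corrected text; in the second, body keeps its original value and the correction is stored in the (hypothetically named) body_corrected field.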

For advanced users, it is possible to fully control what Spellcheck does and how it operates. The Spellcheck step relies on Solr's spellcheck functionality, which enables it to derive a dictionary from an arbitrary index and field (see "Spell Checking" in the Apache Solr Reference Guide 6.6).

Various Solr-related spellchecking parameters are exposed by the step (as advanced parameters). These parameters are:

To test and fine-tune the spellchecker, a Solr query can also be executed directly. There is a new Solr index named ‘Dictionary’ which contains English vocabulary. An example spellcheck query executed via the Solr admin UI against the Dictionary index is depicted below.
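
A comparable query can also be assembled outside the admin UI. The following Python sketch only builds the request URL and sends nothing. The host and port (localhost:8983) and the request handler name (spell) are assumptions that depend on how the Solr instance and its solrconfig.xml are set up; the spellcheck, spellcheck.q, spellcheck.count and spellcheck.collate parameters are standard Solr spellcheck parameters, and the values chosen here are illustrative only.

```python
from urllib.parse import urlencode


def build_spellcheck_url(
    text: str,
    base_url: str = "http://localhost:8983/solr",  # assumed host and port
    core: str = "Dictionary",                      # index named in this document
    handler: str = "spell",                        # assumed request handler name
    count: int = 5,
) -> str:
    """Build a Solr spellcheck query URL for `text` (nothing is sent)."""
    params = {
        "q": text,
        "spellcheck": "true",         # enable the spellcheck component
        "spellcheck.q": text,         # the text to be checked
        "spellcheck.count": count,    # maximum suggestions per term
        "spellcheck.collate": "true", # ask for a corrected version of the query
        "wt": "json",
    }
    return f"{base_url}/{core}/{handler}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_spellcheck_url("recieve"))
```

The resulting URL can be pasted into a browser or passed to any HTTP client, which makes it easy to try different parameter values before copying them into the step's advanced parameters.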