Spellcheck Pipeline Step

The Spellcheck pipeline step automatically corrects misspelled words in a field that contains either a single word or a longer text. It helps improve the results of Natural Language Processing (NLP) and Optical Character Recognition (OCR).

The Spellcheck step can be added to a pipeline like any other pipeline step. The main input is the Source Field parameter, which names the document field to be spellchecked. By default, the Source Field is overwritten with the corrected text. Alternatively, set the Destination Field parameter so that the corrected version is written to the Destination Field and the Source Field remains unchanged.
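The overwrite and destination behaviour described above can be illustrated with a minimal Python sketch. This is not the step's actual implementation: `apply_spellcheck` and `toy_correct` are hypothetical names, and the toy corrector merely stands in for the real Solr-backed spellchecker.

```python
from typing import Callable, Dict, Optional


def apply_spellcheck(
    document: Dict[str, str],
    source_field: str,
    correct: Callable[[str], str],
    destination_field: Optional[str] = None,
) -> Dict[str, str]:
    """Return a copy of `document` with the corrected text written out.

    Hypothetical helper: if `destination_field` is blank or None, the
    source field is overwritten; otherwise the source field is left as-is
    and the corrected text goes into the destination field.
    """
    result = dict(document)
    target = destination_field or source_field
    result[target] = correct(document[source_field])
    return result


def toy_correct(text: str) -> str:
    """Toy corrector standing in for the real spellchecker (assumption)."""
    fixes = {"recieve": "receive"}
    return " ".join(fixes.get(word, word) for word in text.split())


doc = {"body": "please recieve this"}

# Default behaviour: the source field is overwritten.
overwritten = apply_spellcheck(doc, "body", toy_correct)

# With a destination field: the source field stays unchanged.
preserved = apply_spellcheck(
    doc, "body", toy_correct, destination_field="body_corrected"
)
```

In the second call the original `body` value is kept, which can be useful when the uncorrected text must remain available for auditing or comparison.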

Advanced users can fully control what the Spellcheck step checks and how it operates. The step relies on the Solr spellcheck functionality, which can derive a dictionary from an arbitrary index and one of its fields (see the "Spell Checking" page of the Apache Solr Reference Guide 6.6).

The step exposes several Solr-related spellchecking settings as advanced parameters:

  • Destination Field — Field to store the spellchecked result in; if left blank, the source field will be overwritten

  • Solr Index — Solr index containing the dictionary to spellcheck against

  • Request Handler — Solr request handler that handles the spellcheck query

  • Dictionary List — Solr spellcheck dictionaries to use; this is a Solr-specific parameter, since several dictionaries may be defined from one index

  • Accuracy — Accuracy threshold (value between 0 and 1) to be used by the spellchecking engine

  • External URL — External URL of Solr spellchecker to use

  • Spellcheck Parameters — Any additional parameters accepted by the Solr API, as described on the "Spell Checking" page of the Apache Solr Reference Guide 6.6
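To make the relationship between these parameters and a Solr request more concrete, the following Python sketch assembles a spellcheck query URL. The mapping from step parameters to Solr parameters is an assumption made for illustration, not the step's documented behaviour. `build_spellcheck_url`, the host, the `/spell` handler and the `default` dictionary name are hypothetical; the query parameters themselves (`spellcheck`, `spellcheck.q`, `spellcheck.dictionary`, `spellcheck.accuracy`, `spellcheck.collate`) are standard Solr spellcheck parameters.

```python
from typing import Dict, List, Optional
from urllib.parse import urlencode


def build_spellcheck_url(
    base_url: str,
    index: str,
    request_handler: str,
    text: str,
    dictionaries: Optional[List[str]] = None,
    accuracy: Optional[float] = None,
    extra_params: Optional[Dict[str, str]] = None,
) -> str:
    """Build a Solr spellcheck query URL (hypothetical helper)."""
    params = [("spellcheck", "true"), ("spellcheck.q", text), ("wt", "json")]

    # Solr accepts spellcheck.dictionary more than once, one per dictionary.
    for name in dictionaries or []:
        params.append(("spellcheck.dictionary", name))

    if accuracy is not None:
        if not 0.0 <= accuracy <= 1.0:
            raise ValueError("accuracy must be between 0 and 1")
        params.append(("spellcheck.accuracy", str(accuracy)))

    # Any additional Solr parameters are passed through unchanged.
    params.extend((extra_params or {}).items())

    handler = request_handler.strip("/")
    return f"{base_url.rstrip('/')}/{index}/{handler}?{urlencode(params)}"


url = build_spellcheck_url(
    "http://localhost:8983/solr",  # assumed Solr location
    "Dictionary",                  # Solr Index
    "/spell",                      # Request Handler (assumed name)
    "helo wrld",                   # text taken from the Source Field
    dictionaries=["default"],      # Dictionary List (assumed name)
    accuracy=0.7,                  # Accuracy
    extra_params={"spellcheck.collate": "true"},  # Spellcheck Parameters
)
print(url)
```

Keeping the pass-through parameters as a plain dictionary mirrors the idea of the Spellcheck Parameters setting: anything the Solr API understands can be forwarded without the helper needing to know about it.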

To test and fine-tune the spellchecker, you can also execute a Solr query directly. A new Solr index named ‘Dictionary’, which contains English vocabulary, is available for this. An example spellcheck query executed via the Solr admin UI using the Dictionary index is depicted below.
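When experimenting with such queries outside the admin UI, it can help to see how suggestions might be read from a JSON response and applied to the text. The Python sketch below assumes the response lists suggestions as a flat, alternating sequence of misspelled word and details, which is how Solr commonly renders them in JSON; the exact shape can vary with the Solr version and the `json.nl` setting. The sample response is made-up illustrative data, not real output from the Dictionary index, and both function names are hypothetical.

```python
from typing import Any, Dict, List


def extract_suggestions(response: Dict[str, Any]) -> Dict[str, List[str]]:
    """Map each misspelled word to its suggested corrections.

    Assumes `spellcheck.suggestions` is a flat list that alternates
    between a word and a details object (assumption, see lead-in).
    """
    flat = response.get("spellcheck", {}).get("suggestions", [])
    suggestions: Dict[str, List[str]] = {}
    # Walk the list two items at a time: (misspelled word, details).
    for word, details in zip(flat[0::2], flat[1::2]):
        suggestions[word] = list(details.get("suggestion", []))
    return suggestions


def apply_suggestions(text: str, suggestions: Dict[str, List[str]]) -> str:
    """Replace each word with its first suggestion, if there is one."""
    corrected = []
    for word in text.split():
        candidates = suggestions.get(word)
        corrected.append(candidates[0] if candidates else word)
    return " ".join(corrected)


# Illustrative, made-up response in the assumed shape.
sample_response = {
    "spellcheck": {
        "suggestions": [
            "helo",
            {"numFound": 1, "startOffset": 0, "endOffset": 4,
             "suggestion": ["hello"]},
            "wrld",
            {"numFound": 1, "startOffset": 5, "endOffset": 9,
             "suggestion": ["world"]},
        ]
    }
}

found = extract_suggestions(sample_response)
fixed = apply_suggestions("helo wrld", found)
```

Taking only the first suggestion is a deliberate simplification; lowering or raising the Accuracy parameter changes which candidates Solr returns in the first place.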

 
