Natural Language Processing in Voyager

Natural Language Processing (NLP) is a way of using computer systems to interpret text information in the same way that a person would. At its core, NLP tries to understand human language naturally, without needing exhaustive sets of processing rules, and can yield much richer results than a simple keyword search.

How NLP Searches Text

In a basic implementation, NLP uses content, context, and syntax to identify text strings that fall into categories called Default Named Entities. The categories Voyager uses are listed below, along with the corresponding field names. Note that the special field nlp_place is the only field used for geotagging, while all of the fields can be used to refine search results.

Category Name                      Field Name

Administrative Places              nlp_admin_places
Art                                nlp_art
Events                             nlp_events
Facilities                         nlp_facilities
Geographic Places                  nlp_geo_places
Groups                             nlp_groups
Languages                          nlp_languages
Legal References                   nlp_legal
Companies and Organizations        nlp_orgs
People                             nlp_people
Products                           nlp_products
NLP Place (used for geotagging)    nlp_place
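As a rough illustration of refining a search with one of these fields, the Python sketch below builds a query URL that filters on a single NLP field. The field names come from the list above; everything else is an assumption made for the example, including the endpoint URL, the Solr-style q and fq parameters, and the helper name build_nlp_filter_url. Check your own Voyager installation for the actual search endpoint and query syntax.

```python
from urllib.parse import urlencode

# Field names taken from the category list above.
NLP_FIELDS = {
    "nlp_admin_places", "nlp_art", "nlp_events", "nlp_facilities",
    "nlp_geo_places", "nlp_groups", "nlp_languages", "nlp_legal",
    "nlp_orgs", "nlp_people", "nlp_products", "nlp_place",
}


def build_nlp_filter_url(base_url, text, field, value):
    """Return a search URL that restricts results to one NLP field value.

    Hypothetical helper: the Solr-style 'q' and 'fq' parameters are an
    assumption, not a documented Voyager interface.
    """
    if field not in NLP_FIELDS:
        raise ValueError(f"unknown NLP field: {field}")
    # Quote the value so that multi-word entities are matched as a phrase.
    # (Values that themselves contain double quotes are not handled here.)
    params = {"q": text, "fq": f'{field}:"{value}"'}
    return f"{base_url}?{urlencode(params)}"


# Assumed endpoint, for illustration only.
url = build_nlp_filter_url(
    "http://localhost:8888/solr/v0/select", "pipeline", "nlp_orgs", "Acme Corp"
)
print(url)
```

The helper rejects field names that are not in the list, which catches typos such as nlp_org before a request is sent.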


Viewing NLP Results

After you have enabled NLP and rescanned the location, you can find NLP-derived fields on the Detail Page of a record.
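If you work with record data outside the Detail Page, the NLP-derived fields can be picked out by their nlp_ prefix. The sketch below assumes a record is available as a plain Python dictionary; the record shape, the sample values, and the helper name extract_nlp_fields are all hypothetical.

```python
def extract_nlp_fields(record):
    """Return only the NLP-derived fields of a record, sorted by field name."""
    return {
        key: value
        for key, value in sorted(record.items())
        if key.startswith("nlp_")
    }


# Made-up record for illustration; real records will contain other fields.
record = {
    "id": "abc123",
    "name": "site_report.pdf",
    "nlp_people": ["Jane Doe"],
    "nlp_orgs": ["Acme Corp"],
    "nlp_place": ["Denver"],
}

nlp_fields = extract_nlp_fields(record)
for field, values in nlp_fields.items():
    print(field, values)
```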

 

In Voyager, adding an NLP pipeline step can greatly improve accuracy when identifying the most relevant search results. You can also use the results of NLP to improve geotagging results. Note that NLP is a language- and text-based process, so there will be no benefit with datasets that do not contain text information.

Voyager's standard installation includes the NLP extension, but you will need to download a separate Python library before you can use it with your data. See Downloading and Installing the Natural Language Processing Python File for more information.
