Building and Using a Custom Gazetteer

Voyager provides several methods for geotagging non-spatial data.

This tutorial shows how to build a custom gazetteer from external georeferences and use it to geotag data in Voyager. The gazetteer is configured from two geographical references: a world countries shapefile (ne_50m_admin_0_countries) and a US states shapefile.

Creating the Gazetteer

Creating a gazetteer starts with downloading the relevant georeference data and using it to create Voyager locations.

Downloading the Data

Both datasets are distributed as zipped shapefiles. After downloading them, unzip all of the archives and note where the extracted files are on the file system.
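If you have several archives, the unzipping can be scripted. The following is a minimal sketch using only the Python standard library; the function name `unzip_all` and the download and extraction folders are hypothetical, and each archive is extracted into its own subfolder so that files from different datasets do not collide.

```python
import zipfile
from pathlib import Path


def unzip_all(download_dir: str, extract_dir: str) -> list[Path]:
    """Extract every .zip in download_dir into its own subfolder of extract_dir
    and return the paths of all .shp files that were extracted."""
    out = Path(extract_dir)
    out.mkdir(parents=True, exist_ok=True)
    for archive in sorted(Path(download_dir).glob("*.zip")):
        target = out / archive.stem  # one folder per archive, named after it
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(target)
    # These .shp paths are what you will enter as the location's url parameter.
    return sorted(out.rglob("*.shp"))
```

For example, `unzip_all(r"C:\Voyager\downloads", r"C:\Voyager")` would return paths such as `C:\Voyager\ne_50m_admin_0_countries\ne_50m_admin_0_countries.shp`, assuming the archives were saved to that (hypothetical) downloads folder.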

Creating Locations

The next step is to create a location for each downloaded dataset. To set up the locations:

  • Go to Manage Voyager > Discovery > Locations and click New Location

  • Select Vector Content

  • In the Vector Content dialog, choose Shapefile from the dropdown menu

  • Select the url connection parameter

  • Enter the path to the downloaded shapefile, for example C:\Voyager\ne_50m_admin_0_countries.shp

  • Click Add

  • Choose ne_50m_admin_0_countries from the Select Layers to Index list

  • Change the meta_name field to name

  • Click Add

  • Rename the location to World Countries and click Save

Repeat these steps for the states shapefile, naming the location US States. There should now be two locations that provide raw data for the gazetteer: World Countries and US States.
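A shapefile is really a set of files that share one base name, and the .shp, .shx and .dbf components are all mandatory. If a location fails to load, one quick check is whether those companion files sit next to the path you entered. This is a minimal sketch with a hypothetical helper name; it checks lowercase extensions only, so adjust it if your files use uppercase extensions on a case-sensitive file system.

```python
from pathlib import Path

# Mandatory shapefile components; .prj (coordinate system) is optional.
REQUIRED = (".shp", ".shx", ".dbf")


def missing_components(shp_path: str) -> list[str]:
    """Return the names of required component files absent next to shp_path."""
    base = Path(shp_path)
    return [base.with_suffix(ext).name
            for ext in REQUIRED
            if not base.with_suffix(ext).exists()]
```

An empty list means all mandatory files are present; anything else usually indicates an incomplete unzip.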

Creating the FST

After adding and configuring the locations, the next step is to set up an FST (Finite State Transducer), which comes from Voyager's underlying Solr platform. How an FST works is beyond the scope of this document, except to say that it provides extremely efficient name matching, which is crucial for a gazetteer.
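To give a feel for dictionary-based name matching, the sketch below swaps in a much simpler structure than an FST: a word-level trie with greedy longest matching. It is not how Solr implements its FST and is only meant to show why a prefix-sharing structure lets a tagger scan text once and prefer "New York" over "York". All names here are illustrative.

```python
import re

END = "$end"  # sentinel key marking a complete name; \w+ tokens never equal it


def tokenize(text: str) -> list[str]:
    """Lowercase word tokens, dropping punctuation."""
    return re.findall(r"\w+", text.lower())


def build_trie(names: list[str]) -> dict:
    """Nested-dict trie keyed by word tokens; shared prefixes share nodes."""
    root: dict = {}
    for name in names:
        node = root
        for token in tokenize(name):
            node = node.setdefault(token, {})
        node[END] = name
    return root


def tag(text: str, trie: dict) -> list[tuple[int, int, str]]:
    """Greedy longest match: return (start_token, end_token, name) tuples."""
    tokens = tokenize(text)
    tags, i = [], 0
    while i < len(tokens):
        node, j, best = trie, i, None
        # Walk the trie as far as the text allows, remembering the longest hit.
        while j < len(tokens) and tokens[j] in node:
            node = node[tokens[j]]
            j += 1
            if END in node:
                best = (i, j, node[END])
        if best is not None:
            tags.append(best)
            i = best[1]  # resume after the matched name
        else:
            i += 1
    return tags
```

For example, with the names "New York", "New Mexico", "United States" and "York", the text "Storms hit New York and New Mexico, United States." yields three tags, and "York" is not tagged separately inside "New York".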

To create the FST:

  • Go to Manage Voyager > Discovery > Document Transformers

  • Create a new configuration called gazetteer

  • Click Edit

This Document Transformer copies the names from the location data created in the previous section into the FST. Once the FST has been populated, the gazetteer is ready to use.
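Conceptually, populating the FST means turning the indexed records into a dictionary of names. FST builders such as Lucene's require their keys to be added in sorted order, so a sketch of that preparation normalizes, deduplicates and sorts the names. This is an illustration under assumptions, not Voyager's transformer: each record is assumed to be a dict carrying the `name` field mapped during location setup plus a hypothetical `id`, and the accent-stripping normalization is an illustrative choice.

```python
import unicodedata
from collections import defaultdict


def normalize(name: str) -> str:
    """Strip accents and case-fold, so 'Côte d'Ivoire' matches 'cote d'ivoire'."""
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.casefold().strip()


def build_name_dictionary(records: list[dict]) -> list[tuple[str, list[str]]]:
    """Map each normalized name to the ids carrying it, in sorted key order."""
    index: dict[str, set[str]] = defaultdict(set)
    for rec in records:
        index[normalize(rec["name"])].add(rec["id"])
    # Python sorts strings by code point, which matches UTF-8 byte order.
    return [(key, sorted(ids)) for key, ids in sorted(index.items())]
```

Keeping a list of ids per key matters for a gazetteer, because different places can share a name.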

Configuring the Pipeline to use the FST

The next step is to configure the World Countries and US States locations to use the FST transformer.

  • Go to the World Countries location and select the Pipeline tab

  • Change the transformer configuration to use the gazetteer transformer you created in the previous step.

Repeat these steps for the US States location. Once both pipeline configurations are set, the last step is to index the two gazetteer locations:

  1. Go to Manage Voyager > Discovery > Locations

  2. Select Scan from the drop-down menu to the right of each location
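Once the scans finish, you may want to confirm that documents actually reached the index. The sketch below assumes that the Solr core at http://localhost:8888/solr/v0 (the URL this tutorial uses in the feed's geotag settings) exposes the standard Solr `/select` handler; `q`, `rows` and `wt` are standard Solr request parameters, and `rows=0` asks for the hit count only. The helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the Voyager Solr core used elsewhere in this tutorial.
SOLR_URL = "http://localhost:8888/solr/v0"


def count_url(base: str = SOLR_URL, query: str = "*:*") -> str:
    """Build a Solr /select URL that returns only the number of matches."""
    params = urllib.parse.urlencode({"q": query, "rows": 0, "wt": "json"})
    return f"{base}/select?{params}"


def indexed_count(base: str = SOLR_URL, query: str = "*:*") -> int:
    """Return numFound for the query; requires a running Voyager instance."""
    with urllib.request.urlopen(count_url(base, query), timeout=10) as resp:
        return json.load(resp)["response"]["numFound"]
```

A count that stays at zero after scanning suggests the location paths or pipeline settings need another look.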

Using the Custom Gazetteer to Geotag

To use the custom gazetteer, create a new location to be geotagged. The New York Times publishes a number of news feeds whose articles should mention the countries and states in our gazetteer.
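Before indexing, it can help to preview the text that the geotagger will see. This minimal sketch parses an RSS 2.0 document with the standard library's ElementTree and joins each item's title and description; it assumes the feed follows the usual RSS 2.0 `<item>`, `<title>` and `<description>` layout, and the helper names are hypothetical.

```python
import urllib.request
import xml.etree.ElementTree as ET

FEED_URL = "http://www.nytimes.com/services/xml/rss/nyt/US.xml"


def fetch_feed(url: str = FEED_URL) -> bytes:
    """Download the raw feed bytes (network access required)."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read()


def item_texts(rss_xml: bytes) -> list[str]:
    """Return 'title. description' for each <item> in an RSS 2.0 document."""
    root = ET.fromstring(rss_xml)
    texts = []
    for item in root.iter("item"):
        title = (item.findtext("title") or "").strip()
        desc = (item.findtext("description") or "").strip()
        texts.append(f"{title}. {desc}" if desc else title)
    return texts
```

Scanning the output of `item_texts(fetch_feed())` for country and state names gives a rough idea of what the geotagged results should contain.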

To set up the new location:

  1. Go to Manage Voyager > Discovery > Locations

  2. Click New Location

  3. Select Feeds

  4. Enter http://www.nytimes.com/services/xml/rss/nyt/US.xml

  5. Click Add

  6. Set the url to http://localhost:8888/solr/v0

  7. Enter fst_tag_name for the field

  8. Click Save 

After saving the Geotag configuration, return to the Locations page and select Scan from the drop-down menu to the right of the new location to index it. Then:

  1. When indexing has completed, open the Navigo search results page

  2. Filter results to the Feed location

  3. Add a bounding box filter constraining results to the United States
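A bounding box filter keeps documents whose geometry falls in or overlaps a rectangle. The sketch below assumes an "intersects" relation and axis-aligned `(min_lon, min_lat, max_lon, max_lat)` boxes; the spatial relation Navigo actually applies may differ. `US_BBOX` is a rough, illustrative box around the contiguous United States (it excludes Alaska and Hawaii), and boxes crossing the antimeridian are not handled.

```python
# Rough, illustrative extent of the contiguous US: (min_lon, min_lat, max_lon, max_lat).
US_BBOX = (-125.0, 24.0, -66.0, 50.0)


def intersects(a: tuple, b: tuple) -> bool:
    """True if two boxes overlap or touch (no antimeridian handling)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def filter_by_bbox(docs: list[dict], bbox: tuple = US_BBOX) -> list[dict]:
    """Keep documents that have a 'bbox' overlapping the filter box."""
    return [d for d in docs if d.get("bbox") and intersects(d["bbox"], bbox)]
```

Documents that were never geotagged have no geometry and therefore drop out of a spatially filtered search entirely.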

The search should return feed articles that have been geotagged against the gazetteer. Results will vary with the contents of the feed at the time of indexing.