Building and Using a Custom Gazetteer
Voyager has several methods of geotagging non-spatial data.
This tutorial illustrates how to build a Custom Gazetteer using external georeferences and use it for geotagging data in Voyager. The gazetteer will be configured using two geographical references:
Countries of the world, from Natural Earth
States of the USA, from the United States Census Bureau
Creating the Gazetteer
The process of creating a gazetteer starts with downloading the relevant georeference material and using it to create Voyager locations.
Downloading the Data
Both of the datasets are in zipped Shapefile format. After downloading them, unzip all of the archives and note the location of the files on the file system.
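The unzip step can also be scripted. The following is a minimal sketch in Python using only the standard library. The function name extract_shapefiles and the folder layout are hypothetical and not part of Voyager. It extracts every .zip archive found in a download folder and returns the paths of the .shp files, which are needed later when creating locations.

```python
import zipfile
from pathlib import Path


def extract_shapefiles(download_dir, target_dir):
    """Extract every .zip archive in download_dir and return the .shp paths found."""
    download_dir = Path(download_dir)
    target_dir = Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)

    for archive in sorted(download_dir.glob("*.zip")):
        # Each archive gets its own subfolder so that the sidecar files
        # (.shp, .shx, .dbf, .prj) of different datasets do not collide.
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(target_dir / archive.stem)

    # Absolute paths, ready to be noted down for the location setup.
    return sorted(p.resolve() for p in target_dir.rglob("*.shp"))


# Example call (hypothetical folder names; adjust to your system):
# for shp in extract_shapefiles("downloads", "gazetteer_data"):
#     print(shp)
```

Because each archive is extracted into a subfolder named after the archive, the resulting .shp paths include that subfolder name.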
Creating Locations
The next step sets up locations for the downloaded data. To set up the locations:
Go to Manage Voyager > Discovery > Locations and click New Location
Select Vector Content
In the Vector Content dialog, choose Shapefile from the dropdown menu
Select the url Connection Parameter
Enter the location of the shapefile that you downloaded, for example C:\Voyager\ne_50m_admin_0_countries.shp
Click Add
Choose ne_50m_admin_0_countries from the Select Layers to Index list
Change the meta_name field to name
Click Add
Rename the location to World Countries and click Save
Repeat these steps to create a location for the states shapefile and name the location US States. There should now be two locations that provide raw data for the gazetteer: World Countries and US States.
Creating the FST
After adding and configuring the locations, the next step is to set up an FST (Finite State Transducer), which comes from Voyager’s underlying Solr platform. How an FST works is beyond the scope of this document, except to say that it provides the ability to perform extremely efficient name matching, which is crucial for a gazetteer.
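To give a feel for what name matching means in a gazetteer, here is a small Python sketch. It is not an FST and it is not how Voyager or Solr implement tagging. It is a plain dictionary lookup, with hypothetical function names, that scans text for known place names and prefers the longest match. An FST serves the same kind of lookup from a much more compact structure.

```python
def build_gazetteer(names):
    """Map each lowercased, tokenized place name to its original spelling."""
    return {tuple(name.lower().split()): name for name in names}


def tag_places(text, gazetteer):
    """Return gazetteer names found in text, preferring the longest match."""
    tokens = [t.strip(".,;:!?()\"'").lower() for t in text.split()]
    max_len = max((len(key) for key in gazetteer), default=0)
    found = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase first so that "New Mexico"
        # wins over "Mexico" when both are in the gazetteer.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in gazetteer:
                found.append(gazetteer[key])
                i += n
                break
        else:
            # No name starts at this token; move on to the next one.
            i += 1
    return found


# Example call with illustrative names:
# gaz = build_gazetteer(["Mexico", "New Mexico", "France"])
# tag_places("Flights from New Mexico to France, then Mexico.", gaz)
```

The longest-match rule matters for a gazetteer that mixes countries and states, since one name can be contained in another.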
To create the FST:
Go to Manage Voyager > Discovery > Document Transformers
Create a new configuration called gazetteer
Click Edit
This Document Transformer will copy the names from the data set up in the previous section into the FST. Once the FST has been populated, the gazetteer is ready to function.
Configuring the Pipeline to use the FST
The next step is to configure the locations set up in the previous section to use the FST transformer.
Go to the World Countries location and select the Pipeline tab
Change the transformer configuration to use the gazetteer transformer you created in the previous step.
Repeat the above steps for the US States location. Once the pipeline configurations have been set, the last step is to index the two locations. To index the two geotagging locations:
Go to Manage Voyager > Discovery > Locations
Select Scan from the drop-down menu at the right for each location
Using the Custom Gazetteer to Geotag
To use the custom gazetteer, create a new location to be geotagged. The New York Times publishes a number of news feeds that should contain content matching the country and state entries in our gazetteer.
To set up the new location:
Go to Manage Voyager > Discovery > Locations
Click New Location
Select Feeds
Click Add
Set the url to http://localhost:8888/solr/v0
Enter fst_tag_name for the field
Click Save
After saving the Geotag configuration, return to the Locations page and select Scan from the drop-down menu at the right to index the new location
When indexing has completed, open the Navigo search results page
Filter results to the Feed location
Add a bounding box filter constraining results to the United States
The search should yield results corresponding to feed articles that have been geotagged against the gazetteer. Note that results will differ depending on the actual contents of the feed at the time of indexing.
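The effect of the bounding box filter can be pictured with a short Python sketch. It assumes each geotagged result carries a rectangular extent and that the filter keeps results whose extent intersects the box. Whether Navigo tests for intersection or containment is not stated here, so treat that as an assumption. The type and function names are hypothetical, and the sketch ignores boxes that cross the antimeridian.

```python
from typing import NamedTuple


class BBox(NamedTuple):
    """Rectangular extent in decimal degrees."""
    min_lon: float
    min_lat: float
    max_lon: float
    max_lat: float


def intersects(a, b):
    """True if the two boxes overlap or touch."""
    return (a.min_lon <= b.max_lon and b.min_lon <= a.max_lon
            and a.min_lat <= b.max_lat and b.min_lat <= a.max_lat)


def filter_by_bbox(results, box):
    """Keep the titles of (title, extent) pairs whose extent intersects box."""
    return [title for title, extent in results if intersects(extent, box)]


# Example call with rough, illustrative extents (not authoritative values):
# usa = BBox(-125.0, 24.0, -66.0, 50.0)
# results = [("Article tagged Texas", BBox(-106.7, 25.8, -93.5, 36.5)),
#            ("Article tagged France", BBox(-5.2, 41.3, 9.6, 51.1))]
# filter_by_bbox(results, usa)
```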