System Sizing: Understanding and Tuning Voyager's Index, Meta Size, and Disk Use

Overview

Voyager's data discovery framework reads information from content repositories (files on disk, web services, database tables, etc.) and builds an index of what it finds in those locations. Understanding how this information is stored, and how much disk space the searchable index and its associated files require, is essential to sizing large Voyager systems.

Indexing Data

The first thing to note is that Voyager is an indexed search system, not a content management system. Voyager never makes a copy of the data it catalogs, so the index is typically quite small compared to the data that it describes.

Storage

During the indexing process, Voyager stores information about the records that it finds in two separate locations:

  1. The search engine index 

  2. A directory of supporting metadata files

Search Engine Index

Voyager uses Apache Lucene as its core indexing framework. Lucene is a high-performance, highly scalable, fully featured search engine library. It is used in tight conjunction with Apache Solr, which wraps the low-level index structure with web-app functionality. Similar to a relational database system's tablespace, Voyager stores its index as flat files on disk. To ensure maximal performance, the index files must be stored on a fast disk that is readily available to the operating system and that shows little latency for reading and writing files. Voyager performs well on nearly any type of hardware, and is equally performant in virtualized environments. For best performance, we recommend running Voyager on modern hardware, with the index stored on solid-state drives (SSDs) for their high disk I/O rates.

In a default Voyager installation, the index files are stored in <VoyagerInstallPath>\<VoyagerVersion>\data\indexV2\.

The core of the index is stored in the \v0 subdirectory, represented by a series of files with a variety of extensions. The purpose of these file types within the index is beyond the scope of this article, but if you're interested, there's more information here. Files in v0's sibling directory are part of the overall Lucene structure, and are not specific to your index.

Supporting Metadata Files

A fast index isn't enough - We want it to look good too! Complementing the Search Engine Index, Voyager creates its own meta directory structure that contains, roughly for each record in the index, thumbnail and preview images, and (for certain content types) LYR and XML files.

In a default Voyager installation, the meta files are stored in <VoyagerInstallPath>\<VoyagerVersion>\data\meta\.

Meta contains a number of 1-character sub-directories (e.g. meta\a), each of which in turn contains a number of 3-character sub-directories (e.g. meta\a\0d9). The lower-level directories contain many files with 16-character alphanumeric names and a suffix indicating what they are:

  • ~.layer.lyr is the Esri ArcMap layer file that allows Voyager users to open a search result in Esri ArcMap etc.,

  • ~.meta.xml is the Esri ArcGIS metadata file residual from prior Esri geo-processing, and

  • ~.preview.jpg and ~.thumb.jpg are the records' preview and thumbnail images used to decorate search results in Voyager

This is just our way of organizing things. If you're wondering, the directory names come from the 1st character (a) and the 2nd to 4th characters (0d9) of the filenames. The filenames themselves come from an MD5 hash of the indexed record's properties (name, format, etc.), which we use as an ID to keep track of things. Every record in Voyager has one of these IDs. We don't expect you to use them (except for very specific searches or debugging).
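
To make the layout concrete, here is a minimal Python sketch that derives the expected meta file locations from a record ID. It assumes that each filename is the 16-character ID followed by one of the suffixes listed above. The function name meta_paths, the example ID, and the relative data\meta root are hypothetical, and the sketch does not attempt to reproduce how Voyager computes the hash itself.

```python
from pathlib import PureWindowsPath

# Suffixes listed in this article for files in the meta directory.
META_SUFFIXES = (".layer.lyr", ".meta.xml", ".preview.jpg", ".thumb.jpg")


def meta_paths(meta_root: str, record_id: str) -> list[PureWindowsPath]:
    """Return the expected meta file paths for one record ID (assumed layout)."""
    if len(record_id) != 16 or not record_id.isalnum():
        raise ValueError("expected a 16-character alphanumeric ID")
    # 1st character -> top-level directory; 2nd to 4th -> second-level directory.
    folder = PureWindowsPath(meta_root) / record_id[0] / record_id[1:4]
    return [folder / (record_id + suffix) for suffix in META_SUFFIXES]


# Hypothetical ID, chosen to match the meta\a\0d9 example in this article.
for path in meta_paths(r"data\meta", "a0d9c4e17b2f8a36"):
    print(path)
```

Not every record will have all four files (LYR and XML files exist only for certain content types), so treat the returned paths as candidates to check rather than files that are guaranteed to exist.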

Disk Space Required for Indexing

Although they need less space than the source data, the index and its ancillary files do take up space on disk. Naturally, this grows as Voyager catalogs larger collections of data: more spatial files mean more records in the index, and more meta directories and thumbnail (etc.) files. So, exactly how much space should we plan for?

In this exercise, we sampled a collection of 1000 spatial data sources, whose formats are representative of what many organizations will have, and indexed them in Voyager.

When we indexed these sources, we did so completely. This means that we created thumbnail (150 x 150 pixel) and preview (512 x 512 pixel) images for each record. We also retained the default option to Extract Linked Data from the data sources, so that we would pull local copies of the LYR and XML files from MXD and GDB layers where present.

When indexing was complete, we noted the following storage footprints in the \v0 and \meta directories.
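
If you want to take the same measurement on your own system, a short script can total the file sizes under each directory. The following is a sketch only: the function names are hypothetical, the indexV2\v0 and meta sub-paths follow the default layout described above, and the example data directory is a placeholder that you would replace with your own install path.

```python
import os


def directory_size_bytes(root: str) -> int:
    """Sum the sizes of all regular files under root (0 if root is missing)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def footprint_mb(data_dir: str) -> dict[str, float]:
    """Report index and meta footprints in decimal megabytes."""
    targets = {
        "index": os.path.join(data_dir, "indexV2", "v0"),
        "meta": os.path.join(data_dir, "meta"),
    }
    return {label: directory_size_bytes(path) / 1_000_000
            for label, path in targets.items()}


if __name__ == "__main__":
    # Hypothetical data directory; substitute your own install path.
    for label, size in footprint_mb(r"C:\voyager\data").items():
        print(f"{label}: {size:.1f} MB")
```

Dividing each total by the number of records in the index gives a per-record figure for your own mix of formats.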

All told, we're looking at approximately 70 MB per 1000 index records, but let's round that up to 100 MB per 1000 records to be conservative. In other words, set aside 0.1 MB (100 KB) of space for each record in the index. You'll notice that the search index requires much less storage than the meta files.
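
As a worked example of that rule of thumb, the Python sketch below scales the 100 KB per record allowance to larger catalogs. The function name and the record counts are illustrative only. The figures assume full indexing as in this exercise (pre-built thumbnails and previews, and linked data extracted), so your own numbers may be lower.

```python
KB_PER_RECORD = 100  # conservative per-record allowance from this article


def estimated_storage_gb(record_count: int,
                         kb_per_record: float = KB_PER_RECORD) -> float:
    """Estimate index + meta disk space in decimal gigabytes."""
    if record_count < 0:
        raise ValueError("record_count must not be negative")
    # Decimal units, as in the article: 1 GB = 1,000,000 KB.
    return record_count * kb_per_record / 1_000_000


# Illustrative catalog sizes, not measurements.
for records in (1_000, 1_000_000, 25_000_000):
    print(f"{records:>12,} records -> ~{estimated_storage_gb(records):,.1f} GB")
```

To plan with your own measured footprint instead of the rounded figure, pass it as kb_per_record.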

Some Tips for Optimizing your Voyager Index (and Meta Files)

For very large and/or evolving search catalogs, there are a few things you can do to optimize the storage footprint and efficient retrieval of data from both your index(es) and meta files.

Index Size Management

Among your options for managing the size of your Voyager Search index, you can:

  1. Suppress the detailed index debugging information, and

  2. Consolidate the files that Solr uses to store the index

Suppress the Detailed Index Debugging Information

Voyager allows you to collect detailed debugging information about how individual files are being included in the search index. While the detailed information might be useful during the initial phases of a data cataloging project (to confirm for a subset of records that document information is being extracted properly) or later on (to spot-check files that aren’t being indexed properly), we don’t recommend enabling this setting for larger production indexes. While it only amounts to a few kilobytes per record, this compounds over many cataloged documents and bloats the index's footprint on disk.

To disable this setting site-wide, navigate to Manage Voyager > Discovery > Discovery Configuration > Settings tab in the Default Settings section, and uncheck the option to Index Debug Information.

Alternatively, to disable this setting for a specific location, navigate to the Settings tab for an individual location’s settings and uncheck the option to Index Debug Information.

For more information about how to display and use the Index Debug Information, please refer to our documentation.

Consolidate the Files that Solr Uses to Store the Index

Over time, as the Voyager Search index gets updated (manually or automatically), updated index structures are built in new files linked to the core index, rather than appended to the existing core file(s). These new files can overlap redundantly in their content, and many small files simply take up more space on disk than fewer consolidated files would.

Voyager Administrators can use Solr’s <optimize> request to merge internal data structures, which reduces the index’s storage footprint and improves search performance.
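
For administrators who prefer to script this step, Solr's update handler also accepts an optimize request over HTTP. The Python sketch below builds and sends such a request. The core name v0 comes from this article, but the base URL is hypothetical, and the sketch assumes that your Voyager instance exposes its Solr endpoint to direct requests without additional authentication. Check both before relying on it.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def optimize_url(solr_base_url: str, core: str = "v0",
                 max_segments: int = 1) -> str:
    """Build the URL for an optimize request against Solr's update handler."""
    query = urlencode({"optimize": "true",
                       "maxSegments": max_segments,
                       "wt": "json"})
    return f"{solr_base_url.rstrip('/')}/{core}/update?{query}"


def optimize_core(solr_base_url: str, core: str = "v0",
                  timeout: float = 3600.0) -> int:
    """Send the optimize request and return the HTTP status code."""
    # Optimizing a large index can take a long time, hence the generous timeout.
    with urlopen(optimize_url(solr_base_url, core), timeout=timeout) as response:
        return response.status


# Hypothetical base URL; substitute the Solr address of your own instance.
print(optimize_url("http://localhost:8888/solr"))
```

An optimize rewrites the index files, so it temporarily needs extra disk space and is best run during a quiet period.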

To optimize Solr’s core index, navigate to Manage Voyager > Index > Manage Solr.

When the local Solr administrative dashboard opens, navigate to Core Admin > v0.

The Core information page will tell you whether the index structures are optimized. Click the Optimize button to optimize manually.

The following screenshots show the contents and approximate sizes of the index directory before and after optimization. Although the impact of this operation might seem underwhelming on a small, immature sample index, the effects are measurable over a large, frequently changing index.

Relocate Your Index to Faster Hardware

Faster disk I/O translates to faster index reads (scanning and searching) and writes (creation, updates, and optimization). For optimal performance, you can locate (or relocate) your index files on a solid-state drive (SSD) or a high-RPM hard disk drive (HDD).

To relocate your index to a new location, independent of your Voyager Search web-server:

  1. Stop the Voyager Search service / system process

  2. Edit the file <VoyagerInstallPath>\server_<version>\app\Voyager.vmoptions

  3. Enter a new path for the Voyager index directory parameter (-Dindex.dir) - This can be an absolute path (to a mapped drive), a relative path (shown), or a UNC to a network share.

  4. Alternatively, you can enter a new path for the entire Voyager data directory parameter (index, meta, and logs etc.) (-Ddata.dir).
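
The edit in steps 2 to 4 can be sketched in Python as follows. This assumes that Voyager.vmoptions holds one JVM option per line and that the parameters use the standard Java -Dname=value form. The helper name, the sample file contents, and the target path are all hypothetical. Stop the Voyager service first, as in step 1, and keep a backup copy of the file.

```python
def set_jvm_property(vmoptions_text: str, name: str, value: str) -> str:
    """Return vmoptions text with -D<name> set to value.

    Replaces an existing -D<name>=... line, or appends one if none exists.
    """
    prefix = f"-D{name}="
    new_line = prefix + value
    lines = vmoptions_text.splitlines()
    replaced = False
    for i, line in enumerate(lines):
        if line.strip().startswith(prefix):
            lines[i] = new_line
            replaced = True
    if not replaced:
        lines.append(new_line)
    return "\n".join(lines) + "\n"


# Hypothetical file contents and target path, for illustration only.
original = "-Xmx2g\n-Dindex.dir=data\\indexV2\n"
print(set_jvm_property(original, "index.dir", r"E:\voyager\indexV2"))
```

The same helper works for the data.dir parameter from step 4. Restart the Voyager service after saving the file so that the new path takes effect.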

Metadata File Size Management

Among your options for managing the size of your Voyager meta file set and optimizing its delivery, you can:

  1. Suppress the extraction of linked data,

  2. Opt to generate meta images on-demand,

  3. Opt to generate meta images from a service, and

  4. Relocate the meta files to a remote network site.

Suppress the Extraction of Linked Data

The option to extract linked data instructs Voyager to extract local copies of the LYR and XML files from MXD and GDB layers where present. Opt out of this to avoid the extra storage cost needed to host these files in your local Voyager instance.

To disable this setting site-wide, navigate to Manage Voyager > Discovery > Discovery Configuration > Settings tab in the Default Settings section, and uncheck the option to Extract Linked Data.

Alternatively, to disable this setting for a specific location, navigate to the Settings tab for an individual location’s settings and uncheck the option to Extract Linked Data.

Opt to Generate Meta Images On-Demand

The option to create thumbnail and/or preview images instructs Voyager to create 150 x 150 pixel and 512 x 512 pixel JPGs for a large portion of the formats in the index. While most users will appreciate the visual context that the thumbnails and/or previews provide, creating each of these images proactively (at the time of index creation) can result in a large up-front storage cost. In addition, the extra time needed to request, stream, and save each cached image adds to the duration of the indexing process.

Proactive image caching was used in this exercise to estimate meta file sizes. Voyager’s default image strategy (and our recommendation) is to create thumbnail and preview images only when a record first appears in the search results. There might be a slight lag for the first user to retrieve an image, but once it has been created, it will exist for all subsequent requests (unless explicitly cleared). A variety of options are available to Voyager Administrators to suppress or force creation of thumbnails or previews as befits their index storage allocation.

To specify the thumbnail/preview creation strategy setting site-wide, navigate to Manage Voyager > Discovery > Discovery Configuration > Thumbnails tab in the Default Settings section, and select the desired option.

Alternatively, to specify the thumbnail/preview creation strategy for a specific location, navigate to the Thumbnails tab for an individual location’s settings, and select the desired option.

Opt to Generate Meta Images from a Service

Instead of caching thumbnail and preview images locally at the expense of several kilobytes per index record, Voyager Administrators can opt to request these images in-line from remote tile map services (TMS). Normally these TMS will render the actual data described by the index, but since any valid TMS URL will suffice (provided it supports the records’ extent and SRS), it can provide context for the area in question without being related to the record.

To specify a remote TMS for thumbnail/preview creation site-wide, navigate to Manage Voyager > Discovery > Discovery Configuration > Thumbnails tab in the Default Settings section, and select the Draw Thumbnail from Webservice option and provide a suitable URL.

Alternatively, to specify a remote TMS for thumbnail/preview creation for a specific location, navigate to the Thumbnails tab for an individual location’s settings, and select the Draw Thumbnail from Webservice option and provide a suitable URL.

Relocate Meta Files to a Remote Network Site

Meta file retrieval will also benefit from hardware that supports faster disk I/O. For optimal performance, you can locate (or relocate) your meta files on a solid-state drive (SSD) or a high-RPM hard disk drive (HDD).

To relocate your meta files to a new location, independent of your Voyager Search web-server:

  1. Stop the Voyager Search service / system process

  2. Edit the file <VoyagerInstallPath>\server_<version>\app\Voyager.vmoptions

  3. Enter a new path for the Voyager meta directory parameter (-Dmeta.dir) - This can be an absolute path (to a mapped drive), a relative path (shown), or a UNC to a network share.

  4. Alternatively, you can enter a new path for the entire Voyager data directory parameter (index, meta, and logs etc.) (-Ddata.dir).