Overview

Voyager's data discovery framework reads information from content repositories (files on disk, web services, database tables, etc.) and builds an index of what it finds in those locations. Understanding how this information is stored, and how much disk space the searchable index and its associated files require, is essential to sizing large Voyager systems.

Indexing Data

The first thing to note is that Voyager is an indexed search system, not a content management system. Voyager never makes a copy of the data it catalogs, and because of this the index is typically quite small compared to the actual data that it describes.

Storage

During the indexing process, Voyager stores information about the records that it finds in two separate locations:

...

This is just our way of organizing things. If you're wondering, the directory names come from the 1st character (a) and the 2nd to 4th characters (0d9) of the filenames. The filenames themselves come from an MD5 hash of the indexed record's properties (name, format, etc.), which we use as an ID to keep track of things. Every record in Voyager has one of these IDs. We don't expect you to use them (except for very specific searches or debugging).
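As a rough illustration of the naming scheme described above, the sketch below derives the two directory names from a record ID. How Voyager actually serializes a record's properties before hashing is not stated here, so `record_id` uses a hypothetical serialization (name and format joined with `|`), and the sample record values are made up. Only the character slicing in `meta_subdirs` follows directly from the text.

```python
import hashlib


def record_id(name: str, fmt: str) -> str:
    """Hypothetical ID: MD5 hex digest of a few record properties.

    The real property list and serialization used by Voyager are assumptions.
    """
    return hashlib.md5(f"{name}|{fmt}".encode("utf-8")).hexdigest()


def meta_subdirs(rid: str) -> tuple[str, str]:
    """Return directory names taken from the 1st and the 2nd-4th characters of an ID."""
    return rid[0], rid[1:4]


# Made-up record, for illustration only.
rid = record_id("roads.shp", "shapefile")
first, second = meta_subdirs(rid)
print(f"id={rid} -> directories: {first}/{second}")
```

An ID beginning with `a0d9` would therefore land under the directories `a` and `0d9`.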

...

Disk Space Required for Indexing

While they need less space than the source data, the index itself and the ancillary files do take up space on disk. Naturally, this grows as Voyager catalogs larger collections of data: more spatial files mean more records in the index, and more meta directories and thumbnail (etc.) files. So, exactly how much space should we plan for?

...

All told, we're looking at approximately 70 MB per 1000 index records, but let's round that up to 100 MB per 1000 records to be conservative. In other words, set aside 0.1 MB (100 KB) of space for each record in the index. You'll notice that the search index requires much less storage than the meta files.
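The rule of thumb above is simple enough to turn into a quick planning calculation. The following is a minimal sketch: the function name and the sample record counts are illustrative, and gigabytes are computed as 1000 MB for readability.

```python
def estimate_storage_mb(record_count: int, mb_per_thousand: float = 100.0) -> float:
    """Estimate disk space (in MB) for the index plus meta files.

    Defaults to the conservative figure of 100 MB per 1000 records;
    pass 70 to use the measured approximation instead.
    """
    return record_count * mb_per_thousand / 1000


# Illustrative catalog sizes only.
for n in (10_000, 1_000_000, 50_000_000):
    mb = estimate_storage_mb(n)
    print(f"{n:>11,} records -> {mb:>12,.0f} MB (~{mb / 1000:,.1f} GB)")
```

At the conservative rate, a catalog of one million records would call for roughly 100,000 MB (about 100 GB) of disk space.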

Some Tips for Optimizing your Voyager Index (and Meta Files)

For very large and/or evolving search catalogs, there are a few things you can do to reduce the storage footprint of your index(es) and meta files, and to speed up retrieval of data from both.

Index Size Management

Among your options for managing the size of your Voyager Search index, you can:

  1. Suppress the detailed index debugging information, and

  2. Consolidate the files that Solr uses to store the index

Suppress the Detailed Index Debugging Information

Voyager allows you to collect detailed debugging information about how individual files are being included in the search index. While the detailed information might be useful during the initial phases of a data cataloging project (to confirm for a subset of records that document information is being extracted properly) or later on (to spot-check files that aren’t being indexed properly), we don’t recommend enabling this setting for larger production indexes. Although it only amounts to a few kilobytes per record, this adds up over many cataloged documents and bloats the index's footprint on disk.

...

For more information about how to display and use the Index Debug Information, please refer to our documentation.

Consolidate the Files that Solr Uses to Store the Index

Over time, as the Voyager Search index gets updated (manually or automatically), updated index structures are written to new files linked to the core index rather than appended to the existing core file(s). These new files can overlap redundantly in content, and many small files simply take up more space on disk than fewer consolidated files.

...

The following screenshots show the contents and approximate sizes of the index directory before and after Optimization. Again, while the impact of this operation might seem underwhelming on the small, immature sample index, the effects are measurable on a large, frequently changing index.
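To make the same before-and-after comparison on your own system, you can total the file sizes under the index directory. The sketch below uses only the Python standard library; the `data/index` path is a placeholder assumption and should be replaced with your actual index location.

```python
import os


def dir_size_bytes(path: str) -> int:
    """Sum the sizes of all regular files under path (symlinks are skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def file_count(path: str) -> int:
    """Count the files under path, to see how far consolidation reduced them."""
    return sum(len(files) for _root, _dirs, files in os.walk(path))


# Placeholder path: point this at your own index directory.
index_dir = "data/index"
size_mb = dir_size_bytes(index_dir) / (1024 * 1024)
print(f"{index_dir}: {file_count(index_dir)} files, {size_mb:.1f} MB")
```

Running it once before and once after optimizing gives the two figures to compare.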

Relocate Your Index to Faster Hardware

Faster disk I/O translates to faster index reads (scanning and searching) and writes (creation, updates, and optimization). For optimal performance, you can locate (or relocate) your index files on a solid state drive (SSD) or a high-RPM hard disk drive (HDD).

...

  1. Stop the Voyager Search service / system process

  2. Edit the file <VoyagerInstallPath>\server_<version>\app\Voyager.vmoptions

  3. Enter a new path for the Voyager index directory parameter (-Dindex.dir). This can be an absolute path (to a mapped drive), a relative path (shown), or a UNC path to a network share.

  4. Alternatively, you can enter a new path for the entire Voyager data directory (index, meta, logs, etc.) using the -Ddata.dir parameter.
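If you need to apply this edit on several machines, the change in steps 2 to 4 can be scripted. The sketch below is not a Voyager tool. It assumes the vmoptions file holds one JVM option per line in the form `-Dname=value`, and the sample file contents and target path are hypothetical. Stop the service first, as in step 1, and keep a backup of the original file.

```python
def set_jvm_property(vmoptions_text: str, key: str, value: str) -> str:
    """Set -D<key>=<value>, replacing an existing entry or appending a new one.

    Assumes one JVM option per line, which is an assumption about the file format.
    """
    prefix = f"-D{key}="
    new_line = f"{prefix}{value}"
    lines = vmoptions_text.splitlines()
    replaced = False
    for i, line in enumerate(lines):
        if line.strip().startswith(prefix):
            lines[i] = new_line
            replaced = True
    if not replaced:
        lines.append(new_line)
    return "\n".join(lines) + "\n"


# Hypothetical file contents and target path, for illustration only.
sample = "-Xmx2g\n-Dindex.dir=data/index\n"
print(set_jvm_property(sample, "index.dir", r"D:\voyager\index"))
```

The same helper works for the data directory by passing `data.dir` as the key.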

...

Metadata File Size Management

Among your options for managing the size of your Voyager meta file set and optimizing its delivery, you can:

  1. Suppress the extraction of linked data,

  2. Opt to generate meta images on-demand,

  3. Opt to generate meta images from a service, and

  4. Relocate the meta files to a remote network site.

Suppress the Extraction of Linked Data

The option to extract linked data instructs Voyager to extract local copies of the LYR and XML files from MXD and GDB layers where present. Opt out of this to avoid the extra storage cost needed to host these files in your local Voyager instance.

...

Alternatively, to disable this setting for a specific location, navigate to the Settings tab for an individual location’s settings and uncheck the option to Extract Linked Data.

Opt to Generate Meta Images On-Demand

The option to create thumbnail and/or preview images instructs Voyager to create 150 x 150 pixel and 512 x 512 pixel JPGs for a large portion of the formats in the index. While most users will appreciate the visual context that the thumbnails and/or previews provide, creating each of these images proactively (at the time of index creation) can result in a large up-front storage cost. In addition, the extra time needed to request, stream, and save each cached image adds to the duration of the indexing process.

...

Alternatively, to specify the thumbnail/preview creation strategy for a specific location, navigate to the Thumbnails tab for an individual location’s settings, and select the desired option.

Opt to Generate Meta Images from a Service

Instead of caching thumbnail and preview images locally at the expense of several kilobytes per index record, Voyager Administrators can opt to request these images inline from a remote tile map service (TMS). Normally the TMS renders the actual data described by the index, but any valid TMS URL will suffice (provided it supports the records’ extent and SRS), so the service can provide context for the area in question even if it is unrelated to the record.

...

Alternatively, to specify a remote TMS for thumbnail/preview creation for a specific location, navigate to the Thumbnails tab for an individual location’s settings, and select the Draw Thumbnail from Webservice option and provide a suitable URL.

Relocate Meta Files to a Remote Network Site

Meta file retrieval will also benefit from hardware that supports faster reads. For optimal performance, you can locate (or relocate) your meta files on a solid state drive (SSD) or a high-RPM hard disk drive (HDD).

...