System and Software Requirements

There are three factors to take into account when determining system requirements:

  • CPU - The number of processor cores on the machine

  • Memory - The RAM capacity of the machine

  • Disk - The amount of storage available to the machine

HQ

The system requirements for HQ depend directly on its usage and configuration. An HQ instance contains a local Indexing Agent that can perform indexing on its own. If only the local Agent is used, the requirements are the same as those of an Agent. If the HQ instance is not used for indexing, the requirements can be relaxed.

CPU

Same as for an Agent, but a minimum of 2 CPUs when using only remote Agents.

Memory

Same as for an Agent, but 2-4 GB of RAM when using only remote Agents.

Disk

Storage requirements for HQ are modest. Most of the data created by HQ is transient and will be deleted after indexing is complete. Systems that index larger numbers of files (resulting in large extraction queues) will have increased storage requirements.

Indexing Agent

In Vose, the Indexing Agents perform file indexing (although HQ can also index data independently of the Agents). Indexing is highly CPU-dependent, so CPU is the primary consideration for machines running Agents.

CPU

The Indexing Agent is designed to use all of the CPU capacity on a machine to maximize indexing throughput and performance, so, in general, more CPUs are better.

  • At minimum, a machine should have 2 cores dedicated to the Indexing Agent application (i.e., cores that are not doing other work on the system).

  • For high-throughput configurations, 8 cores is optimal.
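As a rough illustration of the two thresholds above, the following Python sketch classifies a machine by the number of cores left over for the Agent. The function name `assess_agent_cores` and the `cores_for_other_work` parameter are hypothetical and not part of Vose; only the 2-core minimum and the 8-core figure come from the list above.

```python
import os

# Thresholds from the list above.
MIN_AGENT_CORES = 2        # minimum cores dedicated to the Indexing Agent
HIGH_THROUGHPUT_CORES = 8  # core count described as optimal for high throughput


def assess_agent_cores(total_cores: int, cores_for_other_work: int = 0) -> str:
    """Classify how well a machine suits an Indexing Agent (hypothetical helper).

    cores_for_other_work is an assumed input: the number of cores you expect
    other applications on the same machine to keep busy.
    """
    dedicated = total_cores - cores_for_other_work
    if dedicated < MIN_AGENT_CORES:
        return "insufficient"
    if dedicated < HIGH_THROUGHPUT_CORES:
        return "adequate"
    return "high-throughput"


if __name__ == "__main__":
    # os.cpu_count() returns None if the count cannot be determined.
    cores = os.cpu_count() or 1
    print(f"{cores} cores: {assess_agent_cores(cores)}")
```

Note that `os.cpu_count()` reports logical CPUs, which can include hyper-threads, so the physical core count may be lower.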

Memory

Indexing data such as Microsoft Office files requires the Indexing Agent to process large amounts of text and other data in memory.

  • For an Agent running in a Java Virtual Machine (JVM) and indexing very large amounts of text, 4-6 GB of RAM is recommended.

  • On a machine that does not index massive amounts of text and does not require high throughput, 1-2 GB of RAM should suffice.
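The two cases above can be expressed as a small lookup. This is a sketch with hypothetical function names. The source does not give a figure for a high-throughput Agent that indexes only modest amounts of text, so treating that case like the large-text case is an assumption.

```python
def agent_ram_range_gb(large_text_volume: bool, high_throughput: bool) -> tuple[int, int]:
    """Return a (low, high) recommended RAM range in GB for an Indexing Agent.

    Hypothetical helper. The 4-6 GB and 1-2 GB figures come from the list above;
    mapping "high throughput but little text" to 4-6 GB is an assumption.
    """
    if large_text_volume or high_throughput:
        return (4, 6)
    return (1, 2)


def meets_recommendation(available_gb: float, large_text_volume: bool,
                         high_throughput: bool) -> bool:
    """True if the available RAM reaches at least the low end of the range."""
    low, _high = agent_ram_range_gb(large_text_volume, high_throughput)
    return available_gb >= low
```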

Disk

Storage requirements for an Indexing Agent are modest. Most of the data created by the Agent is transient and will be deleted after indexing is complete. Systems that index larger numbers of files (resulting in large extraction queues) will have increased storage requirements. 

Flex Index Node

For a Flex Index node, the emphasis is on memory and storage. In a Flex deployment, the index is divided into subsets called shards, which are distributed across multiple servers. While Flex Index storage requirements can be significant, they are not as extreme as those of a Voyager Server running a local index.

CPU

CPU requirements are roughly the same as that of a Voyager Server running a local index.

Memory

Memory requirements are roughly the same as that of a Voyager Server running a local index.

Disk

Disk usage for an index is hard to estimate because it depends heavily on a number of variables, such as the type of data being indexed and file sizes. The best way to determine disk requirements is to:

  1. Index a small subset (1000 documents) of your data and note the disk usage.

  2. Generously estimate the total number of documents in your “final” index and extrapolate from the disk usage measured in step 1.

  3. Double or triple that number to allow for the index to grow.
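The three steps above can be sketched in Python as follows. `directory_size_bytes` measures the sample index on disk (step 1), and `estimate_index_disk_bytes` performs the extrapolation (step 2) and the growth allowance (step 3). Both names are hypothetical, and the index directory you pass in depends on your installation.

```python
import os


def directory_size_bytes(path: str) -> int:
    """Step 1: total size of all regular files under path (symlinks skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            if os.path.islink(file_path):
                continue
            try:
                total += os.path.getsize(file_path)
            except FileNotFoundError:
                # A file may be deleted while the walk is in progress.
                pass
    return total


def estimate_index_disk_bytes(sample_docs: int, sample_bytes: int,
                              expected_total_docs: int,
                              growth_factor: float = 3.0) -> int:
    """Steps 2 and 3: extrapolate from the sample, then allow for growth.

    growth_factor defaults to 3.0 ("triple"); use 2.0 for "double".
    """
    if sample_docs <= 0:
        raise ValueError("sample_docs must be positive")
    bytes_per_doc = sample_bytes / sample_docs
    return int(bytes_per_doc * expected_total_docs * growth_factor)
```

For example, if a 1000-document sample occupies 50 MB (an illustrative figure, not a measurement) and you expect 2,000,000 documents, `estimate_index_disk_bytes(1000, 50_000_000, 2_000_000)` returns 300,000,000,000 bytes, or about 300 GB after tripling.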

Running out of disk space is a catastrophic event for a search index and can result in data loss. It’s important to estimate disk requirements properly up front and to monitor disk usage routinely.
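Routine monitoring can be as simple as a scheduled check of free space on the volume that holds the index. The sketch below uses Python's standard `shutil.disk_usage`; the function names, the placeholder path, and the 25% free-space threshold are hypothetical choices, not a Vose recommendation.

```python
import shutil


def free_fraction(path: str) -> float:
    """Fraction of the filesystem containing path that is currently free."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return usage.free / usage.total


def has_headroom(path: str, min_free_fraction: float = 0.25) -> bool:
    """True if free space is at or above the (assumed) threshold."""
    return free_fraction(path) >= min_free_fraction


if __name__ == "__main__":
    index_path = "."  # hypothetical: replace with your index location
    pct = free_fraction(index_path) * 100
    status = "OK" if has_headroom(index_path) else "LOW"
    print(f"{status}: {pct:.1f}% free at {index_path}")
```

A check like this could be run from cron or another scheduler, with an alert raised whenever it reports LOW.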

When the index contains a mix of data formats (Office, GIS, imagery, etc.), one factor that drastically affects disk space requirements is whether the text field is stored by default. Storing text can increase disk usage by orders of magnitude.

In a Vose installation, the meta folder (thumbnails, metadata) must also be considered when determining disk usage. Although it’s best to estimate the size of the meta folder with the same method used for the search index, as a rule of thumb the meta folder requires approximately 16 MB per 1000 documents.
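Using the rule of thumb above, the meta folder estimate is simple arithmetic. This sketch (hypothetical function name) applies the approximate figure of 16 MB per 1000 documents; a measurement taken from your own sample should take precedence.

```python
META_MB_PER_1000_DOCS = 16  # approximate figure from the paragraph above


def estimate_meta_folder_mb(num_docs: int) -> float:
    """Approximate meta folder size (thumbnails, metadata) in MB."""
    return num_docs / 1000 * META_MB_PER_1000_DOCS
```

For 2,000,000 documents, `estimate_meta_folder_mb(2_000_000)` returns 32000.0, roughly 32 GB, which would be added to the index estimate.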

Voyager Server

CPU

CPU requirements are similar to those of an Agent.

Memory

Solr indexing benefits from access to more memory, so RAM requirements can be high. A typical configuration is 16 GB of total RAM on the system, with 6 GB assigned to the Voyager Server JVM process. The remaining memory is used to cache the index files.
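The typical configuration above works out as follows: with 16 GB of RAM and 6 GB assigned to the JVM, about 10 GB remains for the index files. The sketch below (hypothetical names) also estimates what fraction of an index of a given size could fit in that remaining memory. It ignores memory used by the operating system and other processes, so treat the result as an upper bound.

```python
def remaining_ram_gb(total_ram_gb: float, jvm_heap_gb: float) -> float:
    """RAM left after the Voyager Server JVM allocation."""
    if jvm_heap_gb >= total_ram_gb:
        raise ValueError("JVM allocation must be smaller than total RAM")
    return total_ram_gb - jvm_heap_gb


def cacheable_index_fraction(total_ram_gb: float, jvm_heap_gb: float,
                             index_size_gb: float) -> float:
    """Upper-bound fraction of the index files that fit in remaining RAM."""
    if index_size_gb <= 0:
        return 1.0
    return min(1.0, remaining_ram_gb(total_ram_gb, jvm_heap_gb) / index_size_gb)
```

For example, `cacheable_index_fraction(16, 6, 20)` returns 0.5: at most half of a 20 GB index could be held in the remaining memory.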

Disk

Disk requirements for a Voyager Server are the same as those for a Flex Index Node. See the Flex Index Node section for details.

The size of the meta folder (thumbnails, metadata) is the same in both configurations: approximately 16 MB per 1000 documents.

Vose Software Requirements

The table below describes software requirements for some repository connectors, extractors, pipeline steps and processing tasks in Vose: