Vose System Requirements

System Requirements

These requirements are intended as a starting point, not an ending point, in the discussion about your specific solution. The requirements will vary depending on what is being indexed, where the content is stored, how many users you expect, and how often the data are updated, among other considerations. They are an attempt to provide a minimum baseline, but the particulars of your system and your usage should be discussed with a Voyager Solution Architect.

There are three elements that make up the system recommendations below:

  • CPU - The number of processors (i.e., cores) on the machine

  • Memory - The RAM capacity of the machine

  • Disk - The amount of storage available to the machine

Vose

HQ

The system requirements for Voyager HQ depend directly on usage and configuration. An HQ instance contains a local Agent that can perform indexing, and if only the local Agent is used, the requirements are the same as those of Voyager Agent.

  • CPU: Same as Agent but a minimum of 2 CPUs when using only remote Agents.

  • Memory: Same as Agent but 2-4 GB of RAM when using only remote Agents.

  • Disk: Storage requirements for HQ are modest. Most of the data created by HQ is transient and will be deleted after indexing is complete. Systems that index larger numbers of files (resulting in large extraction queues) will have increased storage requirements.

Agent

In Vose, Voyager Agent performs file indexing (although HQ also has the capacity to index data separately from Agents). The indexing process is highly CPU-dependent, so on machines running Agent the emphasis is primarily on CPU.

  • CPU: Agent is designed to utilize all of the CPU capacity on a machine to maximize indexing throughput and performance, so in general more CPUs are better.

    • At minimum, a machine should have 2 cores dedicated to the Agent application (i.e., not doing other work on the system).

    • For high-throughput configurations, 8 cores would be optimal.

  • Memory: When indexing data such as Microsoft Office files, Agent has to process large amounts of text and other data in memory.

    • For a Java Virtual Machine running Agent that is indexing very large amounts of text, 4-6 GB of RAM is recommended.

    • For a machine that is not indexing massive amounts of text and doesn't require high throughput, 1-2 GB of RAM should suffice.

  • Disk: Storage requirements for Agent are modest. Most of the data created by Agent is transient and will be deleted after indexing is complete. Systems that index larger numbers of files (resulting in large extraction queues) will have increased storage requirements.
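The CPU guidance for Agent can be turned into a quick sanity check. The following is a minimal sketch, not part of Vose: the function name `agent_cpu_profile` and the profile labels are hypothetical, and `os.cpu_count()` reports the logical cores on the whole machine, which is only an upper bound on the cores actually dedicated to Agent.

```python
import os

# Thresholds taken from the Agent CPU guidance above.
MIN_AGENT_CORES = 2        # minimum cores dedicated to Agent
HIGH_THROUGHPUT_CORES = 8  # suggested for high-throughput configurations


def agent_cpu_profile(cores):
    """Classify a core count against the Agent CPU guidance (hypothetical helper)."""
    if cores < MIN_AGENT_CORES:
        return "below minimum"
    if cores >= HIGH_THROUGHPUT_CORES:
        return "high-throughput"
    return "baseline"


# os.cpu_count() can return None, so fall back to 1 in that case.
print(agent_cpu_profile(os.cpu_count() or 1))
```

If other applications share the machine, pass the number of cores you can actually reserve for Agent rather than the machine total.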

Flex Index

For a Flex Index node, the emphasis is on memory and storage. In a flex deployment, the index is divided into subsets called Shards which are distributed between multiple servers. While Flex Index storage requirements can be significant, they are not as extreme as for Voyager Server when it is running a local index.

  • CPU: CPU requirements are roughly the same as those of a Voyager Server running a local index.

  • Memory: Memory requirements are roughly the same as those of a Voyager Server running a local index.

  • Disk: The approximate size of an index containing a mix of data formats (Office, GIS, Imagery, etc.) depends on whether or not the Text field is stored by default.

    • If the Text field is stored: 12 MB per 1000 documents

    • If the Text field is NOT stored: 5 MB per 1000 documents

The approximate size of the meta folder (thumbnails, metadata) is the same for both settings: 16 MB per 1000 documents

In a Flex Index deployment, where the index is split up between multiple servers, a conservative estimate of storage requirements would be to divide the estimated size by the number of shards in the flex index.
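The per-shard arithmetic can be sketched as follows. This is an illustration only: the function name `per_shard_index_mb` is hypothetical, the documents are assumed to be spread evenly across shards, and the meta folder is left out because this page does not say how it is distributed in a flex deployment.

```python
# Index size figures from the guidance above, in MB per 1000 documents.
INDEX_MB_PER_1000_TEXT_STORED = 12
INDEX_MB_PER_1000_TEXT_NOT_STORED = 5


def per_shard_index_mb(doc_count, shard_count, text_stored=True):
    """Estimate index storage per shard in MB (hypothetical helper).

    Assumes documents are distributed evenly across all shards.
    """
    if shard_count < 1:
        raise ValueError("shard_count must be at least 1")
    if text_stored:
        rate = INDEX_MB_PER_1000_TEXT_STORED
    else:
        rate = INDEX_MB_PER_1000_TEXT_NOT_STORED
    total_index_mb = doc_count / 1000 * rate
    return total_index_mb / shard_count


# Example: 10 million documents split across 4 shards, Text field stored.
print(per_shard_index_mb(10_000_000, 4))
```

Real shards are rarely perfectly balanced, so treat the result as a planning figure and leave room for growth on each server.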

Voyager Server 

  • CPU: CPU requirements are similar to those of Agent.

  • Memory: Solr indexing benefits from access to more memory, so RAM requirements can be high. A typical configuration is to have 16 GB of total RAM available on the system with 6 GB assigned to the Voyager Server JVM process. Voyager Server uses the remaining memory to store the index files.

  • Disk: Voyager Server can run a Solr search in either local mode or flex mode. If it is running a local index, storage requirements are significantly increased in order to accommodate storing the entire index. The specific size requirement for an index depends on the number and type of documents indexed as well as other factors such as the amount of text stored in each document.

Having sufficient available storage is critical for a Solr index, as running out of disk space will result in catastrophic failure. It is always best to overestimate storage requirements and to be able to add more storage easily if and when demand increases. The approximate size of an index containing a mix of data formats (Office, GIS, Imagery, etc.) depends on whether or not the Text field is stored by default. In Voyager 2.0 and beyond, the full text is stored by default.

  • If the Text field is stored: 12 MB per 1000 documents

  • If the Text field is NOT stored: 5 MB per 1000 documents

The approximate size of the meta folder (thumbnails, metadata) is the same for both settings: 16 MB per 1000 documents
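As a worked example of these figures for a local index, the sketch below adds the index and meta folder estimates together. The function name `local_storage_mb` and the `headroom` parameter are assumptions made for illustration; this page recommends overestimating but does not prescribe a specific margin.

```python
# Figures from the guidance above, in MB per 1000 documents.
INDEX_RATE_TEXT_STORED = 12
INDEX_RATE_TEXT_NOT_STORED = 5
META_RATE = 16  # meta folder (thumbnails, metadata), same for both settings


def local_storage_mb(doc_count, text_stored=True, headroom=0.0):
    """Estimate index plus meta folder storage in MB (hypothetical helper).

    headroom is an assumed safety margin, e.g. 0.5 adds 50 percent.
    """
    index_rate = INDEX_RATE_TEXT_STORED if text_stored else INDEX_RATE_TEXT_NOT_STORED
    base_mb = doc_count / 1000 * (index_rate + META_RATE)
    return base_mb * (1 + headroom)


# One million documents, with and without the Text field stored.
print(local_storage_mb(1_000_000))
print(local_storage_mb(1_000_000, text_stored=False))
# The same estimate with an assumed 50 percent safety margin.
print(local_storage_mb(1_000_000, headroom=0.5))
```

For one million documents this works out to roughly 28 GB with the Text field stored and 21 GB without, before any safety margin is applied.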

Other Software Requirements

See Vose Software Requirements for a list of software Vose requires for some repository connectors, extractors, pipeline steps and processing tasks.