System and Software Requirements
There are three factors to take into account when determining system recommendations:
CPU - The number of processors (i.e. cores) on the machine
Memory - The RAM capacity of the machine
Disk - The amount of storage available to the machine
HQ
The system requirements for HQ depend directly on usage and configuration. An HQ instance contains a local Indexing Agent that can, on its own, perform indexing. If only the local Agent is used, requirements are the same as those of an Agent. If an HQ instance is not being used for indexing, system requirements can be relaxed.
CPU
Same as for an Agent, but a minimum of 2 CPUs when using only remote Agents.
Memory
Same as for an Agent, but 2-4 GB of RAM when using only remote Agents.
Disk
Storage requirements for HQ are modest. Most of the data created by HQ is transient and will be deleted after indexing is complete. Systems that index larger numbers of files (resulting in large extraction queues) will have increased storage requirements.
Indexing Agent
In Vose, the Indexing Agents perform file indexing (although HQ also has the capacity to index data separately from Agents). The indexing process is highly CPU-dependent, so on machines running the Agent(s) the emphasis is primarily on CPU.
CPU
The Indexing Agent is designed to utilize all of the CPU capacity on a machine to maximize indexing throughput and performance, so in general, more CPUs are always better.
A machine should have at least 2 cores dedicated to the Indexing Agent application (i.e. not doing other work on the system).
For high-throughput configurations, 8 cores is optimal.
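As a rough illustration of the core counts above, the following Python sketch classifies a machine by the number of cores it can dedicate to the Indexing Agent. The function name and the category labels are hypothetical and are not part of Vose; only the thresholds (2 and 8 cores) come from the text. Note that os.cpu_count() reports logical CPUs, which may be more than the cores actually free for the Agent.

```python
import os

# Thresholds taken from the guidance above.
MIN_AGENT_CORES = 2          # minimum cores dedicated to the Agent
HIGH_THROUGHPUT_CORES = 8    # suggested for high-throughput configurations


def classify_agent_cpu(dedicated_cores: int) -> str:
    """Return a hypothetical label for a machine's Agent CPU capacity."""
    if dedicated_cores < MIN_AGENT_CORES:
        return "below minimum"
    if dedicated_cores < HIGH_THROUGHPUT_CORES:
        return "meets minimum"
    return "high-throughput"


if __name__ == "__main__":
    # os.cpu_count() returns the number of logical CPUs, or None if unknown.
    # This assumes every CPU is dedicated to the Agent, which may not hold.
    cores = os.cpu_count() or 0
    print(cores, classify_agent_cpu(cores))
```

If other services share the machine, pass the number of cores actually reserved for the Agent rather than the total.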
Memory
Indexing data such as Microsoft Office files requires the Indexing Agent to process large amounts of text and other data in memory.
For a Java Virtual Machine running an Agent that indexes very large amounts of text, 4-6 GB of RAM is recommended.
On a machine that is not indexing massive amounts of text and doesn't require high throughput, 1-2 GB of RAM should suffice.
Disk
Storage requirements for an Indexing Agent are modest. Most of the data created by the Agent is transient and will be deleted after indexing is complete. Systems that index larger numbers of files (resulting in large extraction queues) will have increased storage requirements.
Flex Index Node
For a Flex Index Node, the emphasis is on memory and storage. In a flex deployment, the index is divided into subsets called Shards, which are distributed across multiple servers. While Flex Index storage requirements can be significant, they are not as extreme as for Voyager Server when it is running a local index.
CPU
CPU requirements are roughly the same as those of a Voyager Server running a local index.
Memory
Memory requirements are roughly the same as those of a Voyager Server running a local index.
Disk
Disk usage for an index is hard to estimate because it is highly dependent on a number of variables such as the type of data being indexed, file sizes, etc. The best way to determine disk requirements is to:
1. Index a small subset (1000 documents) of your data and note the disk usage.
2. Generously estimate the total number of documents in your “final” index and extrapolate out from the disk usage calculated in step 1.
3. Double or triple that number to allow for the index to grow.
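The extrapolation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name and parameters are hypothetical, it assumes disk usage scales linearly with document count, and the sample figures in the usage line are invented for the example rather than measured.

```python
def estimate_index_disk_bytes(sample_bytes, sample_docs, total_docs,
                              growth_factor=3.0):
    """Extrapolate index disk usage from a small sample.

    sample_bytes  -- disk usage measured after indexing the sample
    sample_docs   -- number of documents in the sample (e.g. 1000)
    total_docs    -- generous estimate of documents in the final index
    growth_factor -- 2.0 to 3.0, to leave room for the index to grow
    """
    if sample_docs <= 0:
        raise ValueError("sample_docs must be positive")
    bytes_per_doc = sample_bytes / sample_docs   # assumes linear scaling
    return bytes_per_doc * total_docs * growth_factor


# Made-up example: a 1000-document sample used 50 MB of disk,
# and the final index is expected to hold 2 million documents.
estimate = estimate_index_disk_bytes(50_000_000, 1000, 2_000_000,
                                     growth_factor=2.0)
print(f"{estimate / 1e9:.0f} GB")
```

Because real data is rarely uniform, the sample should be as representative of the full data set as possible.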
Running out of disk space is a catastrophic event for a search index and can result in data loss. It’s important to estimate properly up front and to monitor disk usage routinely.
One factor that drastically affects disk space requirements when the index contains a mix of data formats (Office, GIS, Imagery, etc.) is whether or not the text field is stored by default. Storing text can increase disk usage by orders of magnitude.
In a Vose installation, the meta folder (thumbnails, metadata) must also be considered when determining disk usage. While it’s recommended to estimate the size of the meta folder using the same methodology as the search index, in general the size of the meta folder is approximately 16 MB per 1000 documents.
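As a quick worked example of the meta folder rule of thumb, the sketch below applies the 16 MB per 1000 documents figure to a document count. The function name is hypothetical, and the result is only a rough starting point; measuring a sample, as recommended, will be more reliable.

```python
META_MB_PER_1000_DOCS = 16  # rule-of-thumb figure from the text


def estimate_meta_folder_mb(total_docs):
    """Rough meta folder size in MB for a given number of documents."""
    return total_docs / 1000 * META_MB_PER_1000_DOCS


# One million documents would need roughly 16,000 MB (about 16 GB).
print(estimate_meta_folder_mb(1_000_000))
```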
Voyager Server
CPU
CPU requirements are similar to those of an Agent.
Memory
Solr indexing benefits from access to more memory, so RAM requirements can be high. A typical configuration is to have 16 GB of total RAM available on the system with 6 GB assigned to the Voyager Server JVM process. Voyager Server uses the remaining memory to store the index files.
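To make the arithmetic of that typical configuration explicit, here is a minimal Python sketch. The function name is hypothetical, and the calculation deliberately ignores memory consumed by the operating system and other processes, so the real figure available for index files will be somewhat lower.

```python
def memory_left_for_index_gb(total_ram_gb, jvm_heap_gb):
    """RAM not assigned to the JVM, in GB (ignores OS and other processes)."""
    if jvm_heap_gb > total_ram_gb:
        raise ValueError("JVM heap cannot exceed total RAM")
    return total_ram_gb - jvm_heap_gb


# Typical configuration from the text: 16 GB total, 6 GB for the JVM.
print(memory_left_for_index_gb(16, 6))  # 10
```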
Disk
Disk requirements for a Voyager Server are the same as those of a Flex Index Node. See the previous section for details.
The approximate size of the meta folder (thumbnails, metadata) is the same in both cases: 16 MB per 1000 documents.
Vose Software Requirements
The table below describes software requirements for some repository connectors, extractors, pipeline steps and processing tasks in Vose: