Best Practices for Voyager Deployments

Every implementation of Voyager will vary based on the goals of the project, the nature of the content, available resources, and specific user needs. However, the following general recommendations should help your project go more smoothly.

Reducing Indexing Time

In general, Voyager searches and builds indexes quickly; however, very large indexes can take hours or days to complete. There are a few things you can do to speed up indexing.

Access to data

For the greatest efficiency, Voyager should have fast access to the data it is indexing. This is not a problem when the data reside on a local machine or on a network-attached storage device on the local network. Indexing data on remote disks or across a slow network takes longer.
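One rough way to gauge whether access is fast enough is to time a sequential read of a representative file from a local disk and from the share you plan to index. The Python sketch below is not a Voyager tool; the function name and the paths are hypothetical placeholders, and because the operating system may cache recently read files, the numbers are only indicative.

```python
import time
from pathlib import Path


def read_throughput_mb_per_s(path, chunk_size=1024 * 1024):
    """Read a file end to end and return the observed throughput in MB/s.

    The OS may cache recently read files, which inflates the result, so
    use a file (or copy) that has not been opened lately.
    """
    total_bytes = 0
    start = time.perf_counter()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total_bytes += len(chunk)
    elapsed = time.perf_counter() - start
    if elapsed == 0:
        return float("inf")
    return total_bytes / (1024 * 1024) / elapsed


if __name__ == "__main__":
    # Hypothetical paths: a local copy and a remote copy of similar files.
    for candidate in ["C:/data/sample.tif", "//fileserver/share/sample.tif"]:
        if Path(candidate).exists():
            rate = read_throughput_mb_per_s(candidate)
            print(f"{candidate}: {rate:.1f} MB/s")
        else:
            print(f"{candidate}: not found")
```

A large gap between the two figures suggests that indexing from the remote location will be noticeably slower.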

Thumbnail generation

Voyager generates thumbnails for the files it indexes. Depending on image compression, a thumbnail ranges in size from about 1 KB (vector content) to 250 KB (imagery). By default, Voyager generates thumbnails on the fly, only when a user requests the data, so it does not create thumbnails for data that is never requested.

If on-the-fly generation slows down searching, you can choose to pre-generate thumbnails for all files instead. Keep in mind that doing so may require additional storage capacity. Whichever option you choose, we recommend storing thumbnails on cheaper, ancillary storage. See the voyager.vmoptions file for configuration details.
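Before pre-generating thumbnails, a back-of-the-envelope estimate helps size the extra storage. The sketch below simply multiplies file counts by the approximate sizes quoted above, treating every image as a 250 KB thumbnail (the upper end of the range) and every vector file as 1 KB; the function name and the example counts are hypothetical.

```python
# Approximate per-thumbnail sizes from the text; real sizes depend on compression.
VECTOR_THUMBNAIL_KB = 1
IMAGERY_THUMBNAIL_KB = 250


def estimate_thumbnail_bytes(vector_files, imagery_files):
    """Estimate the bytes needed to pre-generate a thumbnail for every file."""
    total_kb = (vector_files * VECTOR_THUMBNAIL_KB
                + imagery_files * IMAGERY_THUMBNAIL_KB)
    return total_kb * 1024  # 1 KB = 1024 bytes


if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    needed = estimate_thumbnail_bytes(vector_files=100_000, imagery_files=20_000)
    print(f"About {needed / 1024**3:.2f} GiB of thumbnail storage")
```

With 100,000 vector files and 20,000 images, this comes to 5,100,000 KB, or roughly 4.9 GiB. Treat the result as a planning figure rather than a guarantee.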

Indexing Strategies

After you install Voyager, one of the first things you will do is point it at a repository of content. When you start indexing, we recommend dipping your toe into the pool rather than jumping in with both feet. This gives you useful feedback immediately and lets you discover issues before you invest large amounts of time.

  1. Involve the people who know the content best
    We recommend that you survey or interview your key users to find out what you are going to index, where it lives, and how often it is updated. There are several reasons for doing this:

    • While you might not use this information for your initial indexing tasks, it will inform your ultimate architecture. (See Collecting Repository Information for details.)

    • Keeping this team involved throughout the project life cycle helps ensure that the search gives them back useful information.

    • The content owners will be either supporters or blockers of your project. If they fear that you won't be a good steward of the content, or that you will expose it in inappropriate ways, your project will face an uphill battle. It is better to invest time educating them on what your goals are and how their feedback will be incorporated.

  2. You won’t have all the answers before you start
    In many organizations, there is an institutional instinct to define the complete solution before starting. We have seen project teams attempt to define a search schema before any indexing is done, or try to count the records of each type before indexing has started. Invariably, these efforts are a waste of time: when these teams finally do the indexing, what they thought they had is not at all what they actually have, and the schema they thought they needed is not adequate.

  3. Start Small
    Index a representative subset of the data. Too often, users start by trying to index everything they have. It is far better to make sure that a few things work well before investing in indexing everything than to find out after hours or days of work that the job failed.

  4. Get user feedback on your initial pass

    1. Have users who are familiar with the data look at what was indexed.

    2. Ask the users to assess the results based on the following:

      • Does it match what they think should be there?

      • Did Voyager miss something?

      • How does it look? 

      • Did thumbnails work?

      • Is metadata showing up?

      • Are there any errors?

      • Should any changes be made to the indexing process to support filtering, specifically to the metadata extraction or document transformation steps in the pipeline?

  5. Verify file access and permissions
    When you start indexing, work with your system administrators to make sure that you actually have the proper permissions to access all of the files.

  6. Iterate the assessment
    Once you know whether the subset is good enough for beta, you can add more data. Then repeat the assessment; depending on the results, you may or may not want to rescan or re-index the data.

  7. Save some searches
    Create some saved searches for the users to make sure things work smoothly.

  8. Do Beta Testing
    Finally, let the beta users have at it and create a way to capture their feedback.
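Steps 3 and 5 lend themselves to a small pre-flight script run before the first indexing job. The Python sketch below is not part of Voyager; it assumes the repository is reachable as a local, mounted, or UNC path, and the function name plan_initial_subset and the "a few files per extension" sampling rule are illustrative choices. Ideally, run it under the same account that Voyager will use to read the data. It walks the directory tree, records any file or folder the current account cannot open, and picks a few readable files of each type as a starting subset.

```python
import os
import random
from collections import defaultdict
from pathlib import Path


def plan_initial_subset(root, per_type=5, seed=0):
    """Walk `root`, record unreadable paths, and sample readable files by extension.

    Returns (sample, unreadable): `sample` maps each lower-cased file extension
    to at most `per_type` readable paths; `unreadable` lists files and folders
    that the current account could not open.
    """
    by_extension = defaultdict(list)
    unreadable = []

    # os.walk silently skips folders it cannot list unless onerror is given.
    def record_error(err):
        unreadable.append(err.filename)

    for dirpath, _dirnames, filenames in os.walk(root, onerror=record_error):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # Actually opening the file is a more reliable test than
                # os.access, especially on Windows shares.
                with open(path, "rb"):
                    pass
            except OSError:
                unreadable.append(path)
                continue
            ext = Path(name).suffix.lower() or "(none)"
            by_extension[ext].append(path)

    # A fixed seed makes the subset reproducible, so the same files can be
    # re-indexed after configuration changes.
    rng = random.Random(seed)
    sample = {}
    for ext in sorted(by_extension):
        paths = sorted(by_extension[ext])  # os.walk order is not guaranteed
        sample[ext] = rng.sample(paths, min(per_type, len(paths)))
    return sample, unreadable
```

For example, calling `plan_initial_subset("/mnt/gis_share", per_type=10)` (a hypothetical mount point) returns up to ten readable files per extension to index first, plus a list of paths to raise with your system administrators. Opening every file can take a while on a large share, so you may want to point it at a single top-level folder first.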