Selecting File Formats to Index

The Formats step allows you to configure how different file types are handled when the Repository is indexed. Specifically, you can:

  • Include or exclude file types from Indexing and Extraction

  • Choose how Archive files (e.g. .gz, .zip, .rar or .tgz) are handled

  • Choose the method used to determine a file's MIME type

  • Configure additional custom exclusion filters to apply to the Repository


Including or Excluding File Types

You can include or exclude file types from Extraction or Indexing.

To include or exclude files:

  1. Enter a search term in the Search Formats box and choose a file type from the list. For example, the search term GIS brings up a list of file types with GIS in the name.

  2. Click Include or Exclude


NOTE: File types can be either included in or excluded from Extraction or Indexing, but you cannot include some file types and exclude others. When you select file types to include in Indexing or Extraction, the Exclude option is grayed out.
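Because a list is either an include list or an exclude list, the decision for any single file reduces to one membership test. The sketch below is a hypothetical model, not Voyager's implementation: `FilterMode`, `is_processed`, and the assumption that formats are identified by MIME type are illustrative only.

```python
from enum import Enum


class FilterMode(Enum):
    """Hypothetical model: one mode applies to the whole list of formats."""
    INCLUDE = "include"  # only the listed formats are processed
    EXCLUDE = "exclude"  # every format except the listed ones is processed


def is_processed(mime_type: str, mode: FilterMode, formats: set) -> bool:
    """Return True if a file of this type would be indexed or extracted."""
    listed = mime_type in formats
    # A single mode per list means a type can never be both included and excluded.
    return listed if mode is FilterMode.INCLUDE else not listed


if __name__ == "__main__":
    gis = {"application/vnd.google-earth.kml+xml", "application/geo+json"}
    print(is_processed("application/pdf", FilterMode.INCLUDE, gis))  # False
    print(is_processed("application/pdf", FilterMode.EXCLUDE, gis))  # True
```

Modeling the setting as a single mode, rather than two independent lists, is what makes a mixed include/exclude selection impossible, matching the grayed-out Exclude option.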

Indexing Archive Files

You can index the contents of Archive files, but keep in mind that archives must first be unpacked locally. If multiple Archives are being indexed, make sure that sufficient disk space is available for the unpacked files.


To Index the contents of Archive files:

  1. Click the button next to Index the content of archives

  2. Select the Archive file types to Index or click All

  3. Use Skip Archives larger than to set an upper size limit for Archives

  4. To ensure that the unpacked files are removed after indexing, select Delete local files after Indexing
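The archive steps above (skip oversized archives, unpack locally, index each file, then delete the unpacked copy) can be sketched in Python. This is a hypothetical illustration, not how Voyager works internally: `index_archive` and the `index_file` callback are invented names, and only the standard-library zip and tar formats are handled, so .rar and standalone .gz files are out of scope here.

```python
import shutil
import tarfile
import tempfile
import zipfile
from pathlib import Path
from typing import Callable


def index_archive(
    archive: Path,
    index_file: Callable[[Path], None],  # hypothetical per-file indexing hook
    max_bytes: int,                      # mirrors "Skip Archives larger than"
    delete_after: bool = True,           # mirrors "Delete local files after Indexing"
) -> int:
    """Unpack a .zip or tar-based archive locally, index each member file,
    and return how many files were indexed (0 if the archive was skipped)."""
    if archive.stat().st_size > max_bytes:
        return 0  # too large: skip without unpacking
    workdir = Path(tempfile.mkdtemp(prefix="unpacked_"))  # local unpack target
    try:
        if zipfile.is_zipfile(archive):
            with zipfile.ZipFile(archive) as zf:
                zf.extractall(workdir)
        elif tarfile.is_tarfile(archive):  # covers .tar, .tgz, .tar.gz
            with tarfile.open(archive) as tf:
                # Use the safe "data" extraction filter where this Python has it.
                safe = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
                tf.extractall(workdir, **safe)
        else:
            raise ValueError(f"unsupported archive format: {archive}")
        count = 0
        for path in sorted(workdir.rglob("*")):
            if path.is_file():
                index_file(path)
                count += 1
        return count
    finally:
        if delete_after:
            shutil.rmtree(workdir, ignore_errors=True)  # reclaim disk space
```

Removing the unpacked directory in a `finally` clause frees the disk space even if indexing fails partway through an archive.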

Detecting MIME type


Select how to identify the MIME type of files:

  • File Name - Uses only the file name to assess the MIME type of the file.

  • File Content - Uses the actual file contents to determine the MIME type.
    NOTE: The File Content setting can take much longer than File Name because each file must be read; the difference grows with the size of the data set Voyager is indexing.
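The difference between the two options can be illustrated in Python. This sketch is not Voyager's detector: `mime_from_name` uses the standard `mimetypes` module, which looks only at the extension, while `mime_from_content` is a hypothetical, deliberately tiny "magic number" check that must open and read every file, which is why content-based detection costs more on large data sets.

```python
import mimetypes
import tempfile
from pathlib import Path
from typing import Optional

# A few well-known "magic number" prefixes; real detectors know far more.
MAGIC_SIGNATURES = [
    (b"%PDF-", "application/pdf"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"PK\x03\x04", "application/zip"),
    (b"\x1f\x8b", "application/gzip"),
]


def mime_from_name(path: Path) -> Optional[str]:
    """File Name approach: inspect only the name (extension); no file I/O."""
    mime, _encoding = mimetypes.guess_type(path.name)
    return mime


def mime_from_content(path: Path) -> Optional[str]:
    """File Content approach: open the file and compare its leading bytes."""
    with path.open("rb") as f:
        head = f.read(16)
    for signature, mime in MAGIC_SIGNATURES:
        if head.startswith(signature):
            return mime
    return None


if __name__ == "__main__":
    # A PDF saved with a misleading .txt extension.
    sample = Path(tempfile.mkdtemp()) / "report.txt"
    sample.write_bytes(b"%PDF-1.7\n...")
    print(mime_from_name(sample))     # text/plain
    print(mime_from_content(sample))  # application/pdf
```

The mislabeled file shows the trade-off: the name-based guess is fast but trusts the extension, while the content-based check is slower but reports what the file actually contains.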

Adding Custom Filters 

In this section, enter JSON to create rules that exclude files and folders from indexing based on their attributes, such as name or size. Unlike the format settings above, these rules match on the attributes of the files themselves rather than on their MIME type.

Example 1

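Excludes files whose names match any of the listed rules: names that contain SOMETHING, start with START_, end with _END, match the glob pattern *_GLOB_*, or are exactly EXACT_STRING.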

{
  "type": "Folder",
  "file": {
    "type": "Name",
    "match": {
      "type": "Any",
      "vals": [
        { "type": "Contains", "pattern": "SOMETHING" },
        { "type": "Prefix", "pattern": "START_" },
        { "type": "Suffix", "pattern": "_END" },
        { "type": "Glob", "pattern": "*_GLOB_*" },
        { "type": "Equals", "val": "EXACT_STRING" }
      ]
    }
  }
}
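To make the name rules concrete, here is a small Python evaluator for this example. It is a hypothetical sketch, not Voyager's filter engine; it assumes that "Any" means at least one sub-rule must match, that matching is case-sensitive, and that rules apply to the full file name including its extension. `name_matches` and `is_excluded` are invented names.

```python
import fnmatch
import json
from typing import Any

# The Example 1 filter, as entered in the Custom Filters box.
EXAMPLE_1 = json.loads("""
{ "type": "Folder", "file": { "type": "Name", "match": { "type": "Any", "vals": [
    { "type": "Contains", "pattern": "SOMETHING" },
    { "type": "Prefix", "pattern": "START_" },
    { "type": "Suffix", "pattern": "_END" },
    { "type": "Glob", "pattern": "*_GLOB_*" },
    { "type": "Equals", "val": "EXACT_STRING" } ] } } }
""")


def name_matches(rule: dict, name: str) -> bool:
    """Evaluate one (possibly nested) name matcher; case-sensitive by assumption."""
    kind = rule["type"]
    if kind == "Any":  # assumed: true if at least one sub-rule matches
        return any(name_matches(sub, name) for sub in rule["vals"])
    if kind == "Contains":
        return rule["pattern"] in name
    if kind == "Prefix":
        return name.startswith(rule["pattern"])
    if kind == "Suffix":
        return name.endswith(rule["pattern"])
    if kind == "Glob":
        return fnmatch.fnmatchcase(name, rule["pattern"])
    if kind == "Equals":
        return name == rule["val"]
    raise ValueError(f"unknown matcher type: {kind}")


def is_excluded(spec: dict, file_name: str) -> bool:
    """Apply the filter's "file" clause to a full file name."""
    file_rule = spec.get("file", {})
    if file_rule.get("type") != "Name":
        return False
    return name_matches(file_rule["match"], file_name)


if __name__ == "__main__":
    for name in ["START_notes.txt", "data_END", "a_GLOB_b.csv", "report.pdf"]:
        print(name, is_excluded(EXAMPLE_1, name))
```

Under these assumptions, a name such as data_END.txt would not be excluded by the Suffix rule, because the extension follows _END; whether Voyager compares against the full name or the name without its extension is not stated here.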

Example 2

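Excludes files whose size falls within a specified range, here a minimum of 1024 and a maximum of 9223372036854775807 (the largest signed 64-bit integer, which effectively sets no upper limit). The folder clause adds a MaxDepth rule with a depth of 6.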

{
  "type": "Folder",
  "file": { "type": "FileSize", "min": 1024, "max": 9223372036854775807 },
  "folder": { "type": "MaxDepth", "depth": 6 }
}
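One possible reading of this filter is sketched below in Python. The semantics are assumptions, not documented Voyager behavior: sizes are taken to be in bytes with an inclusive range, and MaxDepth is taken to exclude folders nested more than `depth` levels below the Repository root. `file_excluded` and `folder_excluded` are hypothetical names.

```python
import json
from pathlib import PurePosixPath

# The Example 2 filter, as entered in the Custom Filters box.
EXAMPLE_2 = json.loads("""
{ "type": "Folder",
  "file": { "type": "FileSize", "min": 1024, "max": 9223372036854775807 },
  "folder": { "type": "MaxDepth", "depth": 6 } }
""")


def file_excluded(spec: dict, size_bytes: int) -> bool:
    """Assumed semantics: exclude a file whose size lies in [min, max] bytes."""
    rule = spec.get("file")
    if not rule or rule.get("type") != "FileSize":
        return False
    return rule["min"] <= size_bytes <= rule["max"]


def folder_excluded(spec: dict, relative_folder: str) -> bool:
    """Assumed semantics: exclude a folder nested deeper than `depth` levels,
    counting path components relative to the Repository root."""
    rule = spec.get("folder")
    if not rule or rule.get("type") != "MaxDepth":
        return False
    depth = len(PurePosixPath(relative_folder).parts)
    return depth > rule["depth"]


if __name__ == "__main__":
    print(file_excluded(EXAMPLE_2, 512))                 # False
    print(file_excluded(EXAMPLE_2, 2048))                # True
    print(folder_excluded(EXAMPLE_2, "a/b/c/d/e/f/g"))   # True
```

Python integers are unbounded, so the 9223372036854775807 maximum parses and compares without overflow.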