Selecting File Formats to Index
The Formats step lets you configure how different file types are handled when the Repository is indexed. Specifically, you can:
Include or exclude file types from Indexing and Extraction
Choose how Archive files (e.g. .gz, .zip, .rar or .tgz) are handled
Choose the method used to determine a file's MIME type
Configure additional custom exclusion filters to apply to the Repository
Including or Excluding File Types
You can include or exclude file types from Extraction or Indexing.
To include or exclude files:
Enter a search term in the Search Formats box and choose a file type from the list. For example, the search term GIS brings up a list of file types with GIS in the name.
Click Include or Exclude
NOTE: For Extraction or Indexing, you can either include file types or exclude them, but you cannot include some and exclude others at the same time. Once you select files to include in Indexing or Extraction, the Exclude option is grayed out.
Indexing Archive Files
You can index the contents of Archive files, but keep in mind that archives must first be unpacked locally. Make sure that sufficient disk space is available if multiple Archives are being indexed.
To Index the contents of Archive files:
Click the button next to Index the content of archives
Select the Archive file types to Index or click All
Set an upper size limit for Archives with Skip Archives larger than
To ensure that the unpacked files are removed after indexing, select Delete local files after Indexing
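Because archives are unpacked locally before indexing, it can help to estimate the space they will need beforehand. The following is a minimal sketch, not part of Voyager: it assumes the archives are .zip files and uses the hypothetical helper names unpacked_size and has_room_for to compare their total uncompressed size against the free space in a scratch directory.

```python
import shutil
import zipfile


def unpacked_size(zip_path):
    """Return the total uncompressed size, in bytes, of a .zip archive's members."""
    with zipfile.ZipFile(zip_path) as archive:
        return sum(info.file_size for info in archive.infolist())


def has_room_for(zip_paths, scratch_dir):
    """Check whether scratch_dir has enough free space to unpack every archive.

    scratch_dir is a hypothetical local directory; where Voyager actually
    unpacks archives is not stated in this guide.
    """
    needed = sum(unpacked_size(path) for path in zip_paths)
    free = shutil.disk_usage(scratch_dir).free
    return needed <= free
```

Other archive types (.gz, .rar, .tgz) would need their own size checks; this sketch covers only .zip.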
Detecting MIME type
Select how to identify the MIME type of files:
File Name - Uses only the file name to assess the MIME type of the file.
File Content - Uses the actual file contents to determine the MIME type.
NOTE: The File Content setting can take much longer than File Name, depending on the size of the data set Voyager is indexing.
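The difference between the two settings can be illustrated with a short sketch. This is not Voyager's implementation: the function names mime_from_name and mime_from_content and the small SIGNATURES table are assumptions made for illustration. It shows why name-based detection is cheap (no file needs to be opened) while content-based detection has to read data from every file.

```python
import mimetypes

# A few well-known file signatures ("magic numbers"); real detectors know many more.
SIGNATURES = [
    (b"%PDF-", "application/pdf"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"PK\x03\x04", "application/zip"),
    (b"\x1f\x8b", "application/gzip"),
]


def mime_from_name(file_name):
    """Guess the MIME type from the file extension alone; no file I/O needed."""
    guessed, _encoding = mimetypes.guess_type(file_name)
    return guessed


def mime_from_content(data):
    """Guess the MIME type from the leading bytes of the file's contents."""
    for signature, mime in SIGNATURES:
        if data.startswith(signature):
            return mime
    return None  # unknown signature
```

A PDF that was saved as report.txt would be reported as text/plain by mime_from_name, while mime_from_content would identify it as application/pdf from its first bytes.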
Adding Custom Filters
In this section, enter JSON to create rules that exclude files and folders from indexing based on their attributes. Unlike excluding by MIME type, these rules match on attributes of the files themselves, such as name or size.
Example 1
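Excludes files whose names match any of the listed patterns: names that contain the string SOMETHING, start with START_, end with _END, match the glob *_GLOB_*, or are exactly EXACT_STRING.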
{
  "type": "Folder",
  "file": {
    "type": "Name",
    "match": {
      "type": "Any",
      "vals": [
        {
          "type": "Contains",
          "pattern": "SOMETHING"
        },
        {
          "type": "Prefix",
          "pattern": "START_"
        },
        {
          "type": "Suffix",
          "pattern": "_END"
        },
        {
          "type": "Glob",
          "pattern": "*_GLOB_*"
        },
        {
          "type": "Equals",
          "val": "EXACT_STRING"
        }
      ]
    }
  }
}
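To make the matching rules in Example 1 easier to reason about, here is a small sketch of how such a rule could be evaluated against a file name. It is an illustration only: the matches function is hypothetical, and the semantics (case-sensitive comparison, Any meaning "at least one sub-rule matches") are assumptions, not a description of how Voyager evaluates filters.

```python
from fnmatch import fnmatchcase

# The "match" object from Example 1.
EXAMPLE_MATCH = {
    "type": "Any",
    "vals": [
        {"type": "Contains", "pattern": "SOMETHING"},
        {"type": "Prefix", "pattern": "START_"},
        {"type": "Suffix", "pattern": "_END"},
        {"type": "Glob", "pattern": "*_GLOB_*"},
        {"type": "Equals", "val": "EXACT_STRING"},
    ],
}


def matches(rule, name):
    """Evaluate one rule against a file name (assumed, case-sensitive semantics)."""
    kind = rule["type"]
    if kind == "Any":
        return any(matches(sub, name) for sub in rule["vals"])
    if kind == "Contains":
        return rule["pattern"] in name
    if kind == "Prefix":
        return name.startswith(rule["pattern"])
    if kind == "Suffix":
        return name.endswith(rule["pattern"])
    if kind == "Glob":
        return fnmatchcase(name, rule["pattern"])
    if kind == "Equals":
        return name == rule["val"]
    raise ValueError(f"unknown rule type: {kind}")
```

Under these assumptions, names such as START_report.txt or a_GLOB_b.csv would be excluded, while readme.md would not.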
Example 2
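Excludes files within a specified size range. Here the minimum is 1024 and the maximum is 9223372036854775807, the largest signed 64-bit integer, so the range has effectively no upper bound. The rule also sets a folder MaxDepth of 6.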
{
  "type": "Folder",
  "file": {
    "type": "FileSize",
    "min": 1024,
    "max": 9223372036854775807
  },
  "folder": {
    "type": "MaxDepth",
    "depth": 6
  }
}
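If you maintain several size filters, generating the JSON from a script can help avoid typos in long values such as the 64-bit maximum. The sketch below is one possible approach: the size_filter helper is hypothetical, and it only reproduces the structure shown in Example 2 without validating it against Voyager.

```python
import json

INT64_MAX = 2**63 - 1  # 9223372036854775807, the "max" used in Example 2


def size_filter(min_size, max_size=INT64_MAX, max_depth=None):
    """Build a filter dict shaped like Example 2 (structure copied from the example)."""
    if min_size > max_size:
        raise ValueError("min_size must not exceed max_size")
    rule = {
        "type": "Folder",
        "file": {"type": "FileSize", "min": min_size, "max": max_size},
    }
    if max_depth is not None:
        rule["folder"] = {"type": "MaxDepth", "depth": max_depth}
    return rule


# Produce text that can be pasted into the custom filter box.
print(json.dumps(size_filter(1024, max_depth=6), indent=2))
```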