Advanced Filtering in Voyager Server

Voyager’s configuration settings in Server’s Manage UI allow users to fine-tune a range of indexing options. When adding a Repository, admins can limit indexing by format type, file name, and file contents. Filtering on other criteria is also configurable through Voyager Server’s Advanced Filtering option, where users can specify a variety of Exclusion Rules to further tailor their index settings.

To access Advanced Filters:

  1. Hover over Discovery and select Discovery Configuration. Under Default Settings, select the Formats tab and then select the Advanced tab.  

  2. Under the Advanced tab, select Custom under the Filter section. 

 

Use the Custom text box to set up specific filtering conditions for indexing. You can also select from several preset examples of advanced filters (highlighted in red in the image above).

Important Notes and Rules

  • File filters must be placed in the file filter section

  • Folder filters must be placed in the folder filter section

  • All filters must have a valid type property

    • Matcher Types

      Matchers are used within name and pattern filters:

      • prefix: Matches names starting with a string

      • suffix: Matches names ending with a string

      • contains: Matches names containing a string

      • equals: Exact name match

      • regex: Regular expression match

      • in-set: Matches if name is in a set of values

      • not-in-set: Matches if name is not in a set of values

      • exclude: Excludes matching names

  • When combining filters, use the filters property (plural)
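To make the matcher types concrete, here is a minimal Python sketch of what each one tests against a name. It models the documented meanings only and is not Voyager's code; the function `matches`, list values for `in-set`/`not-in-set`, and whole-name matching for `regex` are assumptions, and the `exclude` matcher is left out because its exact behavior is not described here.

```python
import re

# Hypothetical model of the matcher types listed above (not Voyager's code).
# Assumptions: "in-set"/"not-in-set" take a list in "value", and "regex"
# must match the whole name. The "exclude" matcher is not modeled.


def matches(matcher, name):
    """Return True if name satisfies a matcher such as {"type": "prefix", "value": "report"}."""
    kind, value = matcher["type"], matcher["value"]
    if kind == "prefix":
        return name.startswith(value)
    if kind == "suffix":
        return name.endswith(value)
    if kind == "contains":
        return value in name
    if kind == "equals":
        return name == value
    if kind == "regex":
        # Python's re may not accept exactly the same syntax as Voyager's regex engine.
        return re.fullmatch(value, name) is not None
    if kind == "in-set":
        return name in value
    if kind == "not-in-set":
        return name not in value
    raise ValueError(f"unsupported matcher type: {kind!r}")


print(matches({"type": "prefix", "value": "report"}, "report_2024.pdf"))    # True
print(matches({"type": "prefix", "value": "report"}, "annual_report.pdf"))  # False
print(matches({"type": "in-set", "value": ["tmp", "cache"]}, "cache"))      # True
```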

Common Issues

  • Silent Rejection
    The system will silently reject filters that:

    • Are not under the filter property

    • Have incorrect property names

    • Have invalid filter types

    • Have malformed JSON structure

    • Have missing required properties

    • Have invalid date formats

    • Have invalid regex patterns

    • Have invalid glob patterns

  • Rejected Rules

    • As of HQ 1.13, any filter or rule that is rejected will disappear from the JSON editor after clicking SAVE.
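Because rejected filters fail silently, it can help to check a filter locally before pasting it into the Custom box. The hypothetical Python helper below (`precheck` is an illustrative name, not a Voyager tool) catches two of the causes listed above: malformed JSON and regex patterns that do not compile. Python's `re` module may not accept exactly the same syntax as Voyager's regex engine, so treat a clean result as a first pass only.

```python
import json
import re


def precheck(text):
    """Return a list of problems found in a filter definition (empty if none)."""
    try:
        doc = json.loads(text)
    except json.JSONDecodeError as err:
        return [f"malformed JSON at line {err.lineno}, column {err.colno}: {err.msg}"]

    problems = []

    def check_regex(pattern, where):
        try:
            re.compile(pattern)
        except re.error as err:
            problems.append(f"invalid regex at {where}: {err}")

    def walk(node, where):
        if isinstance(node, dict):
            # A matcher of type "regex" keeps its pattern in "value".
            if node.get("type") == "regex" and isinstance(node.get("value"), str):
                check_regex(node["value"], where)
            for key, child in node.items():
                walk(child, f"{where}.{key}")
        elif isinstance(node, list):
            for index, child in enumerate(node):
                walk(child, f"{where}[{index}]")
        elif isinstance(node, str) and node.startswith("regex:"):
            # pattern-list values can carry a "regex:" prefix.
            check_regex(node[len("regex:"):], where)

    walk(doc, "$")
    return problems


print(precheck('{"type": "pattern-list", "value": "regex:.*Archive.*"}'))  # []
print(precheck('{"type": "pattern-list", "value": "regex:(unclosed"}'))
print(precheck('{"vals": [1, 2,]}'))
```

The second call reports an invalid regex and the third reports malformed JSON (the trailing comma).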


These advanced filters let you control exactly which files and folders are excluded from your repository during indexing. A filter can be used on its own or combined with other filters. Use the Filter > Custom option to set up specific conditions for indexing. Below are the most commonly used types of advanced filters:

 

Example 1: Filter Files based on File Size

This option indexes only the files in a folder repository whose size falls within a user-defined range (for example, between 5 MB and 20 MB). Limiting indexing by file size can conserve indexing resources and help prevent indexing agents from crashing.

{"file": {
  "_class": "voyager.api.discovery.path.InverseFilter",
  "wrap": {
    "_class": "voyager.api.discovery.path.FileSizeFilter",
    "max": 20000000,
    "min": 5000000
  }
}}

NOTE: The minimum and maximum values are in bytes. 
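Since the values are bytes, a small helper can do the megabyte arithmetic. This sketch assumes 1 MB = 1,000,000 bytes, which is how the example's 5 MB and 20 MB become 5000000 and 20000000; `size_range_filter` is a hypothetical name, and the helper only builds the JSON shown above.

```python
import json


def size_range_filter(min_mb, max_mb):
    """Build the size-range filter above from megabyte values (1 MB = 1,000,000 bytes assumed)."""
    return {
        "file": {
            # The InverseFilter reverses the wrapped FileSizeFilter, so files
            # outside the range are the ones filtered out.
            "_class": "voyager.api.discovery.path.InverseFilter",
            "wrap": {
                "_class": "voyager.api.discovery.path.FileSizeFilter",
                "max": round(max_mb * 1_000_000),
                "min": round(min_mb * 1_000_000),
            },
        }
    }


print(json.dumps(size_range_filter(5, 20)))
```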

 

Example 2: Filter Files based on Large File Sizes

The file-size filter works with exact byte values. It's important to note:

  • min and max are inclusive

  • Values are in bytes

  • Omitting min means no lower limit

  • Omitting max means no upper limit

{ "type": "file-size", "min": 1000000, "max": 10000000 }
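The inclusive-bound rules above can be expressed as a small predicate. This is a model of the documented behavior, not Voyager's code; `in_size_range` is a hypothetical name, and passing `None` stands in for omitting `min` or `max`.

```python
def in_size_range(size_bytes, min_bytes=None, max_bytes=None):
    """Return True if size_bytes falls within the inclusive file-size range."""
    if min_bytes is not None and size_bytes < min_bytes:
        return False  # below the lower limit
    if max_bytes is not None and size_bytes > max_bytes:
        return False  # above the upper limit
    return True  # an omitted bound imposes no limit


# The range from the example above: 1,000,000 to 10,000,000 bytes.
print(in_size_range(1_000_000, 1_000_000, 10_000_000))     # True: min is inclusive
print(in_size_range(10_000_001, 1_000_000, 10_000_000))    # False: one byte over max
print(in_size_range(50_000_000_000, min_bytes=1_000_000))  # True: no max set
```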

 

Example 3: Filter Files based on File Name

This option allows for filtering based on the name of the file being indexed.

{ "type": "name", "matcher": { "type": "prefix", "value": "report" } }

 

Example 4: Filter Files based on Folder Name

This option allows for filtering based on the name of the folder being indexed.

{ "type": "name", "matcher": { "type": "equals", "value": "documents" } }

 

Example 5: Filter Files based on Folders and Sub-Folders 

This option allows for excluding subfolders when indexing a folder repository.

{
  "type": "Folder",
  "file": {
    "type": "Name",
    "match": {
      "type": "Any",
      "vals": [
        {
          "type": "InSet",
          "val": [
            "NAME OF FOLDER TO EXCLUDE 1",
            "NAME OF FOLDER TO EXCLUDE 2"
          ]
        }
      ]
    }
  }
}

NOTE: This exclusion setting applies to subfolders up to three levels from the root folder.
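Hand-edited JSON like the example above is easy to break with a stray trailing comma, which is one of the malformed-JSON cases that get rejected silently. A minimal Python sketch that generates the same structure from a list of folder names avoids that, since `json.dumps` never emits trailing commas. The folder names and the `subfolder_exclusion` helper are hypothetical.

```python
import json


def subfolder_exclusion(folder_names):
    """Return the subfolder exclusion filter above as a JSON string."""
    rule = {
        "type": "Folder",
        "file": {
            "type": "Name",
            "match": {
                "type": "Any",
                # Each name in "val" is a folder to exclude from indexing.
                "vals": [{"type": "InSet", "val": list(folder_names)}],
            },
        },
    }
    return json.dumps(rule, indent=2)


print(subfolder_exclusion(["Temp", "Old Backups"]))
```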

 

Example 6: Filter Only Selected Subfolders within a Parent Folder

This option indexes only the subfolders you want within a parent folder. Rather than excluding the subfolders that are not wanted, you specify the folders that should be indexed. You can filter by folder name, but the pattern must be wrapped in an "inverse" filter for this to work properly: the pattern matches the folders you want, and the inverse filter excludes everything else.

{ "file": { "filter": { "value": "regex:.*Archive.*", "type": "pattern-list" }, "type": "inverse" } }

 

Example 7: Filter Files based on File Pattern

This option allows for filtering files using glob or regex patterns.

{ "type": "pattern-list", "value": "**/*.pdf" }

 

Example 8: Filter Files by Date

This option allows for filtering files based on their modification date.

{ "type": "file-date", "from": "2024-01-01T00:00:00Z", "to": "2024-03-31T23:59:59Z" }
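Invalid date formats are one of the silent-rejection causes listed earlier, so it can be worth checking the `from` and `to` values first. This Python sketch assumes the format shown in the example (ISO 8601 in UTC with a trailing `Z`) is what Voyager expects; `parse_utc` and `check_date_filter` are hypothetical helpers. It also flags a range whose start is after its end.

```python
from datetime import datetime


def parse_utc(stamp):
    """Parse a timestamp such as "2024-01-01T00:00:00Z"; raises ValueError if malformed."""
    # "%z" accepts a literal "Z" as UTC on Python 3.7+.
    return datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S%z")


def check_date_filter(rule):
    """Validate both bounds of a file-date filter and return them as datetimes."""
    start = parse_utc(rule["from"])
    end = parse_utc(rule["to"])
    if start > end:
        raise ValueError(f"'from' ({rule['from']}) is later than 'to' ({rule['to']})")
    return start, end


rule = {"type": "file-date", "from": "2024-01-01T00:00:00Z", "to": "2024-03-31T23:59:59Z"}
start, end = check_date_filter(rule)
print((end - start).days)  # 90
```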

 

Example 9: Filter Files by Max Depth

This option filters files by maximum folder depth, meaning the number of folder levels below the parent folder. Any folder deeper than that depth is filtered out.

{ "type": "max-depth", "depth": 2 }
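How depth is counted matters when picking a value. The Python sketch below assumes that a file directly in the parent folder is at depth 0 and that each folder level below it adds 1; that convention is an assumption, so confirm it against a test repository. `folder_depth` and `within_max_depth` are hypothetical names.

```python
from pathlib import PurePosixPath


def folder_depth(root, file_path):
    """Number of folder levels between root and the folder holding file_path (assumed convention)."""
    relative = PurePosixPath(file_path).relative_to(root)
    return len(relative.parts) - 1  # the last part is the file name itself


def within_max_depth(root, file_path, depth):
    """True if the file's folder is no deeper than depth; deeper folders are filtered out."""
    return folder_depth(root, file_path) <= depth


root = "/data/repo"
for path in ["/data/repo/a.txt", "/data/repo/x/y/b.txt", "/data/repo/x/y/z/c.txt"]:
    print(path, folder_depth(root, path), within_max_depth(root, path, 2))
```

With a depth of 2, the first two files are kept and the third (depth 3) is filtered out.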

 

Example 10: Filter Files at a Specific Depth

This option filters files at one specific folder depth, counted as the number of folder levels below the parent folder; folders above or below that depth are filtered out. In the example below, the nested name filter matches a folder named "documents" at depth 1.

{ "type": "at-depth", "depth": 1, "filter": { "type": "name", "matcher": { "type": "equals", "value": "documents" } } }

 

Example 11: Filter Files based on Specific File Types

This option uses the pattern-list type with glob patterns to match files by their extensions. The ** pattern is crucial here:

  • **/*.pdf means "match any file ending in .pdf in any folder at any depth"

  • The ** matches zero or more directories

  • The * matches any characters within a single name (for example, *.pdf matches any file name ending in .pdf)

  • Multiple patterns are separated by newlines

{ "type": "pattern-list", "value": "**/*.pdf\n**/*.doc\n**/*.docx" }
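The glob rules in the list above can be sketched by translating each pattern into a regular expression. This Python model handles only `**/`, `**`, and `*` (not `?` or character classes), matches against a forward-slash relative path, and is an illustration rather than Voyager's matcher; `glob_to_regex` and `matches_pattern_list` are hypothetical names.

```python
import re


def glob_to_regex(glob):
    """Translate a glob using **/, ** and * into a compiled regex (simplified model)."""
    out = []
    i = 0
    while i < len(glob):
        if glob.startswith("**/", i):
            out.append("(?:.*/)?")  # zero or more whole folders
            i += 3
        elif glob.startswith("**", i):
            out.append(".*")  # a bare "**" matches anything, including "/"
            i += 2
        elif glob[i] == "*":
            out.append("[^/]*")  # any characters inside a single name
            i += 1
        else:
            out.append(re.escape(glob[i]))
            i += 1
    return re.compile("".join(out))


def matches_pattern_list(value, path):
    """True if path matches any newline-separated glob in value."""
    return any(glob_to_regex(g).fullmatch(path) for g in value.split("\n") if g)


patterns = "**/*.pdf\n**/*.doc\n**/*.docx"
print(matches_pattern_list(patterns, "reports/2024/q1.pdf"))  # True
print(matches_pattern_list(patterns, "q1.docx"))              # True (zero folders)
print(matches_pattern_list(patterns, "q1.pdf.bak"))           # False
```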

 

Example 12: Filter Files based on File Extensions

This option allows users to index only files with a defined file extension, provided the extension is one that Voyager is capable of indexing (.aaa below stands in for your extension).

{"file": {
  "_class": "voyager.api.discovery.path.PatternListFilter",
  "value": "**/*.aaa"
}}

Exclude filenames that contain the string "temp"

This option allows users to exclude temp files from indexing.

{"file": {
  "_class": "voyager.api.discovery.path.NameFilter",
  "match": {
    "_class": "voyager.api.util.ContainsMatcher",
    "pattern": "temp"
  }
}}

 

Example 13: Filter Files based on MIME Type

This option filters files by MIME type, whether audio, video, image, text, application, or other types.

{ "type": "mime-type", "matcher": { "type": "exclude", "value": ["video/*", "audio/*"] } }

 

Example 14: Filter Files Based on MIME Type Exclusion

This option excludes files based on MIME type, whether audio, video, image, text, application, or other types.

  • * in a MIME type matches any subtype (for example, video/* matches video/mp4)

  • exclude means "exclude if matches"

  • Multiple MIME types can be specified

{ "type": "mime-type", "matcher": { "type": "exclude", "value": ["video/*", "audio/*"] } }
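The `*` subtype wildcard can be modeled as below. This Python sketch takes the MIME type as given and does not model how Voyager detects a file's type; `mime_matches` and `excluded_by` are hypothetical names, and comparison is case-insensitive because MIME type and subtype names are case-insensitive.

```python
def mime_matches(mime, pattern):
    """True if mime (e.g. "video/mp4") matches pattern (e.g. "video/*")."""
    main, _, sub = mime.lower().partition("/")
    p_main, _, p_sub = pattern.lower().partition("/")
    # "*" in the subtype position accepts any subtype of the same main type.
    return main == p_main and (p_sub == "*" or sub == p_sub)


def excluded_by(mime, patterns):
    """True if mime matches any listed pattern, i.e. the exclude matcher applies."""
    return any(mime_matches(mime, p) for p in patterns)


rule = ["video/*", "audio/*"]
print(excluded_by("video/mp4", rule))        # True
print(excluded_by("audio/mpeg", rule))       # True
print(excluded_by("application/pdf", rule))  # False
```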

 

Example 15: Filter using Regular Expressions

This option allows users to filter using regex (regular expressions): patterns that match character combinations in strings and are used for searching, manipulating, and validating text. You can filter by folder name, but you must wrap the regex in an "inverse" filter for this to work properly. The expression below indexes only two folders, those matching ntf3 or ntf4.

{ "file": { "filter": { "value": "regex:.*ntf3.*|.*ntf4.*", "type": "pattern-list" }, "type": "inverse" } }

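NOTE: The inverse filter reverses its child filter. If the child filter would match a file, the inverse filter does not match it; if the child filter would not match a file, the inverse filter does. Because matching files are excluded from indexing, wrapping a pattern in an inverse filter leaves only the files that match the pattern to be indexed.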
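The inverse logic can be traced with a short Python model. It assumes the regex is tested against the file's full path (`re.fullmatch`) and that a file matched by the outer filter is excluded from indexing; the paths, the illustrative pattern, and the helper names are hypothetical, and Python's `re` may differ from Voyager's regex dialect.

```python
import re


def child_matches(path, regex):
    """The wrapped pattern-list filter: does the regex match the whole path?"""
    return re.fullmatch(regex, path) is not None


def inverse_matches(path, regex):
    # The inverse filter matches exactly the files its child does not.
    return not child_matches(path, regex)


def is_indexed(path, regex):
    # Matched files are excluded, so only paths the child pattern matches remain.
    return not inverse_matches(path, regex)


pattern = r".*ntf3.*|.*ntf4.*"  # illustrative: paths containing ntf3 or ntf4
for path in ["/data/ntf3/img.tif", "/data/ntf4/a/b.tif", "/data/ntf5/c.tif"]:
    print(path, is_indexed(path, pattern))
```

The first two paths are indexed; the ntf5 path is excluded.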

 

Example 16: Filter using Combined Filters

You can combine multiple filters using all (AND) or any (OR) logic, as in the example below, which filters on both file name pattern and file size:

{ "type": "all", "filters": [ { "type": "pattern-list", "value": "**/*.pdf" }, { "type": "file-size", "min": 1000 } ] }
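The all/any logic can be modeled as ordinary predicate combinators. In this Python sketch the child filters are simplified stand-ins (a suffix test in place of the `**/*.pdf` glob and an inclusive size check with no maximum), and `all_of`/`any_of` are hypothetical names rather than Voyager functions.

```python
def all_of(*filters):
    """AND: match only if every child filter matches."""
    return lambda f: all(child(f) for child in filters)


def any_of(*filters):
    """OR: match if at least one child filter matches."""
    return lambda f: any(child(f) for child in filters)


def is_pdf(f):
    # Simplified stand-in for the "**/*.pdf" pattern-list filter.
    return f["name"].endswith(".pdf")


def at_least_1000_bytes(f):
    # Stand-in for {"type": "file-size", "min": 1000}: inclusive, no max.
    return f["size"] >= 1000


combined = all_of(is_pdf, at_least_1000_bytes)
print(combined({"name": "big.pdf", "size": 250_000}))  # True
print(combined({"name": "tiny.pdf", "size": 200}))     # False: fails the size test
print(any_of(is_pdf, at_least_1000_bytes)({"name": "tiny.pdf", "size": 200}))  # True
```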