Advanced Filtering in Voyager Server
The configuration settings in Voyager Server’s Manage UI allow users to fine-tune a range of indexing options. When adding a Repository, admins can limit indexing based on format types, file names, and file contents. Filtering on other criteria is configurable through Voyager Server’s Advanced Filtering option, where users can specify a variety of Exclusion Rules to further tailor their index settings.
To access Advanced Filters:
Hover over Discovery and select Discovery Configuration. Under Default Settings, select the Formats tab and then select the Advanced tab.
Under the Advanced tab, select Custom under the Filter section.
Use the Custom text box to set up specific filtering conditions for indexing. You can also select from several examples (highlighted in red in the image above), which are presets of advanced filters.
Important Notes and Rules
File filters must be placed in the file filter section
Folder filters must be placed in the folder filter section
All filters must have a valid type property
Matcher Types
Matchers are used within name and pattern filters:
prefix: Matches names starting with a string
suffix: Matches names ending with a string
contains: Matches names containing a string
equals: Exact name match
regex: Regular expression match
in-set: Matches if name is in a set of values
not-in-set: Matches if name is not in a set of values
exclude: Excludes matching names
When combining filters, use the filters property (plural)
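The matcher types listed above can be sketched in Python to make their semantics concrete. This is an illustration of the described behavior only, not Voyager's implementation: the function name `name_matches` is hypothetical, case-sensitive comparison and "regex may match anywhere in the name" are assumptions, and the exclude matcher is omitted because its exact semantics are not spelled out here.

```python
import re

def name_matches(matcher: dict, name: str) -> bool:
    """Return True if `name` satisfies the matcher (illustrative semantics only)."""
    kind = matcher["type"]
    value = matcher["value"]
    if kind == "prefix":
        return name.startswith(value)
    if kind == "suffix":
        return name.endswith(value)
    if kind == "contains":
        return value in name
    if kind == "equals":
        return name == value
    if kind == "regex":
        # Assumption: the pattern may match anywhere in the name.
        return re.search(value, name) is not None
    if kind == "in-set":
        return name in set(value)
    if kind == "not-in-set":
        return name not in set(value)
    raise ValueError(f"unsupported matcher type: {kind!r}")

# The prefix matcher from Example 3, applied to a sample file name.
print(name_matches({"type": "prefix", "value": "report"}, "report_2024.pdf"))
```

Running a candidate name through a sketch like this before saving a rule can help confirm that the matcher type chosen is the one intended.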
Common Issues
Silent Rejection
The system will silently reject filters that:
Are not under the filter property
Have incorrect property names
Have invalid filter types
Have malformed JSON structure
Have missing required properties
Have invalid date formats
Have invalid regex patterns
Have invalid glob patterns
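Because rejection is silent, it can help to check a rule locally before pasting it into the Custom text box. The following is a minimal sketch under stated assumptions: the function `precheck` and the `KNOWN_TYPES` set are hypothetical, the set contains only the type names that appear in this article (the real list may be longer), and only the lowercase type-style rules are covered, not the `_class` style.

```python
import json
import re
from datetime import datetime

# Assumption: only the filter types that appear in this article.
KNOWN_TYPES = {
    "file-size", "name", "pattern-list", "file-date", "max-depth",
    "at-depth", "mime-type", "all", "any", "inverse",
}

def precheck(text: str) -> list:
    """Return a list of problems found in one filter definition (empty if none)."""
    try:
        rule = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"malformed JSON: {exc.msg}"]
    if not isinstance(rule, dict):
        return ["filter must be a JSON object"]
    problems = []
    kind = rule.get("type")
    if kind not in KNOWN_TYPES:
        problems.append(f"missing or unknown type: {kind!r}")
    if kind == "file-date":
        for key in ("from", "to"):
            if key in rule:
                try:
                    # Same shape as the dates in Example 8.
                    datetime.strptime(rule[key], "%Y-%m-%dT%H:%M:%SZ")
                except (ValueError, TypeError):
                    problems.append(f"invalid date in {key!r}: {rule[key]!r}")
    matcher = rule.get("matcher")
    if isinstance(matcher, dict) and matcher.get("type") == "regex":
        try:
            re.compile(matcher.get("value", ""))
        except (re.error, TypeError) as exc:
            problems.append(f"invalid regex: {exc}")
    return problems

print(precheck('{"type": "file-size", "min": 1000, "max": 5000}'))
```

An empty list only means this sketch found nothing wrong; it does not guarantee that Voyager will accept the rule.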
Rejected Rules
As of HQ 1.13, any filter or rule that is rejected will disappear from the JSON editor after clicking SAVE.
These advanced filters allow you to control exactly which files and folders are excluded from your repository when indexing. They can be used on their own or combined with other filters. Use the Filter > Custom option to set up specific conditions for indexing. Below are the main types of commonly used advanced filters:
Example 1: Filter Files based on File Size
This option indexes only the files in a folder repository whose size falls within a user-defined range (Example: between 5 MB and 20 MB). Restricting indexing to files that meet certain size specifications can help conserve indexing resources and prevent indexing agents from crashing.
{"file": {
"_class": "voyager.api.discovery.path.InverseFilter",
"wrap": {
"_class": "voyager.api.discovery.path.FileSizeFilter",
"max": 20000000,
"min": 5000000
}
}}
NOTE: The minimum and maximum values are in bytes.
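Since the limits must be entered in bytes, a small helper can generate the rule from megabyte values. This is a sketch only: the function name `size_range_filter` is hypothetical, and decimal megabytes (1 MB = 1,000,000 bytes) are assumed because that matches the 5 MB and 20 MB figures in the example above.

```python
import json

# Assumption: decimal megabytes, so that 5 MB becomes 5000000 as in the example.
BYTES_PER_MB = 1_000_000

def size_range_filter(min_mb: float, max_mb: float) -> dict:
    """Build the Example 1 rule structure for a size range given in MB."""
    if min_mb > max_mb:
        raise ValueError("min_mb must not exceed max_mb")
    return {
        "file": {
            "_class": "voyager.api.discovery.path.InverseFilter",
            "wrap": {
                "_class": "voyager.api.discovery.path.FileSizeFilter",
                "max": int(max_mb * BYTES_PER_MB),
                "min": int(min_mb * BYTES_PER_MB),
            },
        }
    }

# Print JSON that can be pasted into the Custom text box.
print(json.dumps(size_range_filter(5, 20), indent=2))
```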
Example 2: Filter Files based on Large File Sizes
The file-size filter works with exact byte values. It's important to note:
min and max are inclusive
Values are in bytes
Omitting min means no lower limit
Omitting max means no upper limit
{
"type": "file-size",
"min": 1000000,
"max": 10000000
}
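The boundary rules described for the file-size filter (inclusive limits, optional min and max) can be expressed as a short Python check. The function name `size_in_range` is hypothetical, and the code only illustrates the stated rules rather than Voyager's own logic.

```python
from typing import Optional

def size_in_range(size: int,
                  min_bytes: Optional[int] = None,
                  max_bytes: Optional[int] = None) -> bool:
    """Return True if `size` (in bytes) lies within the inclusive range."""
    if min_bytes is not None and size < min_bytes:
        return False  # below the lower limit
    if max_bytes is not None and size > max_bytes:
        return False  # above the upper limit
    return True       # an omitted limit never rejects

# Both limits are inclusive, so a file of exactly 1000000 bytes is in range.
print(size_in_range(1_000_000, min_bytes=1_000_000, max_bytes=10_000_000))
```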
Example 3: Filter Files based on File Name
This option allows for filtering based on the name of the file being indexed.
{
"type": "name",
"matcher": {
"type": "prefix",
"value": "report"
}
}
Example 4: Filter Files based on Folder Name
This option allows for filtering based on the name of the folder being indexed.
{
"type": "name",
"matcher": {
"type": "equals",
"value": "documents"
}
}
Example 5: Filter Files based on Folders and Sub-Folders
This option allows for excluding subfolders when indexing a folder repository.
{
"type": "Folder",
"file": {
"type": "Name",
"match": {
"type": "Any",
"vals": [
{
"type": "InSet",
"val": [
"NAME OF FOLDER TO EXCLUDE 1",
"NAME OF FOLDER TO EXCLUDE 2"
]
}
]
}
}
}
NOTE: This exclusion setting applies to subfolders up to three levels from the root folder.
Example 6: Filter only a Select Few of Sub Folders within a Parent Folder
This option allows users to index only the desired subfolders within a parent folder. Rather than excluding the subfolders that are not wanted, a user can include the specific folders to be indexed. Filtering by folder name is possible, but the "inverse" filter is required for it to work properly.
{
"file": {
"filter": {
"value": "regex:.Archive.",
"type": "pattern-list"
},
"type": "inverse"
}
}
Example 7: Filter Files based on File Pattern
This option allows for filtering files using glob or regex patterns.
{
"type": "pattern-list",
"value": "**/*.pdf"
}
Example 8: Filter Files by Date
This option allows for filtering files based on their modification date.
{
"type": "file-date",
"from": "2024-01-01T00:00:00Z",
"to": "2024-03-31T23:59:59Z"
}
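Invalid date formats are among the causes of silent rejection, so it can be worth checking the timestamps before saving. The sketch below parses the timestamp shape used in this example and tests whether a modification time falls in the window. The function names are hypothetical, and treating both bounds as inclusive is an assumption that this article does not confirm.

```python
from datetime import datetime, timezone

def parse_utc(stamp: str) -> datetime:
    """Parse a timestamp shaped like 2024-01-01T00:00:00Z; raises ValueError otherwise."""
    return datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)

def modified_in_window(modified: datetime, start: str, end: str) -> bool:
    # Assumption: both ends of the window are inclusive.
    return parse_utc(start) <= modified <= parse_utc(end)

mid_february = datetime(2024, 2, 15, tzinfo=timezone.utc)
print(modified_in_window(mid_february, "2024-01-01T00:00:00Z", "2024-03-31T23:59:59Z"))
```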
Example 9: Filter Files by Max Depth
This option allows for filtering by maximum folder depth, measured as the number of folder levels below the parent folder. Any folder beyond that depth is filtered out.
{
"type": "max-depth",
"depth": 2
}
Example 10: Filter Files at a Specific Depth
This option allows for filtering at a specific folder depth, measured as the number of folder levels below the parent folder. Any folder before or after that depth is filtered out.
{
"type": "at-depth",
"depth": 1,
"filter": {
"type": "name",
"matcher": {
"type": "equals",
"value": "documents"
}
}
}
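To make the notion of depth in the max-depth and at-depth examples concrete, here is a rough Python sketch. It assumes the repository root counts as depth 0 and each nested folder adds 1; this article does not state how Voyager counts, so treat the numbering and the function names as hypothetical.

```python
from pathlib import PurePosixPath

def folder_depth(root: str, folder: str) -> int:
    """Number of folder levels between the repository root and `folder`."""
    # Assumption: the root itself is depth 0.
    return len(PurePosixPath(folder).relative_to(PurePosixPath(root)).parts)

def within_max_depth(root: str, folder: str, depth: int) -> bool:
    return folder_depth(root, folder) <= depth

def is_at_depth(root: str, folder: str, depth: int) -> bool:
    return folder_depth(root, folder) == depth

for folder in ["/data", "/data/documents", "/data/documents/2024/q1"]:
    print(folder, folder_depth("/data", folder))
```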
Example 11: Filter Files based on Specific File Types
This option uses the pattern-list type with glob patterns to match files by their extensions. The ** pattern is crucial here:
**/*.pdf means "match any file ending in .pdf in any folder at any depth"
The ** matches zero or more directories
The * matches any filename
Multiple patterns are separated by newlines
{
"type": "pattern-list",
"value": "**/*.pdf\n**/*.doc\n**/*.docx"
}
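The glob rules described for this example can be tried out with a small Python translator that turns each pattern into a regular expression. This is a sketch of the rules as stated here ("**/" matches zero or more directories, "*" matches within a single name), not Voyager's matcher; the function names are hypothetical, and other glob features such as "?" or character classes are not handled.

```python
import re

def glob_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a glob using only '**/' and '*' into a compiled regex."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            parts.append(r"(?:[^/]+/)*")   # zero or more directories
            i += 3
        elif pattern[i] == "*":
            parts.append(r"[^/]*")         # any run of characters within one name
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))

def matches_any(pattern_list: str, path: str) -> bool:
    """True if `path` fully matches at least one newline-separated pattern."""
    return any(glob_to_regex(p).fullmatch(path)
               for p in pattern_list.split("\n") if p)

patterns = "**/*.pdf\n**/*.doc\n**/*.docx"
for path in ["a.pdf", "reports/q1/summary.docx", "notes.txt"]:
    print(path, matches_any(patterns, path))
```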
Example 12: Filter Files based on File Extensions
This option allows users to index only files that are of a defined file extension. These files can be indexed if they fall within the listed extensions that Voyager is capable of indexing (Example: .aaa).
{"file": {
"_class": "voyager.api.discovery.path.PatternListFilter",
"value": "**/*.aaa"
}}
Exclude filenames that contain the string "temp"
This option allows users to exclude temp files from indexing.
{"file": {
"_class": "voyager.api.discovery.path.NameFilter",
"match": {
"_class": "voyager.api.util.ContainsMatcher",
"pattern": "temp"
}
}
}
Example 13: Filter Files based on MIME Type
This option allows for filtering files based on their MIME type, whether audio, video, image, text, application, or other.
{
"type": "mime-type",
"matcher": {
"type": "exclude",
"value": ["video/*", "audio/*"]
}
}
Example 14: Filter Files Based on MIME Type Exclusion
This option allows for excluding files based on their MIME type, whether audio, video, image, text, application, or other.
* in MIME types matches any subtype
exclude means "exclude if matches"
Multiple MIME types can be specified
{
"type": "mime-type",
"matcher": {
"type": "exclude",
"value": ["video/*", "audio/*"]
}
}
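The wildcard behavior described for MIME types can be illustrated with Python's standard `fnmatch` module. This is a sketch of the stated rule ("*" matches any subtype, a file is excluded if any pattern matches); the function name `mime_excluded` is hypothetical, and lowercasing both sides before comparing is an assumption.

```python
from fnmatch import fnmatchcase

def mime_excluded(mime_type: str, patterns: list) -> bool:
    """True if the MIME type matches any exclusion pattern such as 'video/*'."""
    # Assumption: comparison is case-insensitive.
    return any(fnmatchcase(mime_type.lower(), p.lower()) for p in patterns)

exclusions = ["video/*", "audio/*"]
for mime in ["video/mp4", "audio/mpeg", "application/pdf"]:
    print(mime, mime_excluded(mime, exclusions))
```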
Example 15: Filter using Regular Expressions
This option allows users to filter using regular expressions (regex), which define patterns used to match character combinations in strings when searching, manipulating, and validating text. Filtering by folder name is possible, but the "inverse" filter is required for it to work properly. In the expression below, only 2 folders are filtered for indexing.
{
"file": {
"filter": {
"value": "regex:.ntf3.|.ntf4.",
"type": "pattern-list"
},
"type": "inverse"
}
}
NOTE: If the child filter would match a file, the inverse filter excludes it. If the child filter would not match a file, the inverse filter includes it.
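The inverse behavior in the NOTE amounts to negating the child filter's result. A minimal Python sketch, using the regex from the example above as the child filter, could look like this. The helper names are hypothetical, and searching for the pattern anywhere in the path is an assumption about how the child pattern is applied.

```python
import re
from typing import Callable

Predicate = Callable[[str], bool]

def regex_filter(pattern: str) -> Predicate:
    """Child filter: True when the pattern is found anywhere in the path."""
    compiled = re.compile(pattern)
    return lambda path: compiled.search(path) is not None

def inverse(child: Predicate) -> Predicate:
    """Inverse filter: flips whatever the child filter decides."""
    return lambda path: not child(path)

child = regex_filter(".ntf3.|.ntf4.")
flipped = inverse(child)
for path in ["/data/ntf3/a.txt", "/data/other/a.txt"]:
    print(path, "child:", child(path), "inverse:", flipped(path))
```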
Example 16: Filter using Combined Filters
You can combine multiple filters using all (AND) or any (OR) logic like in the example below for filtering based on file name and file size:
{
"type": "all",
"filters": [
{
"type": "pattern-list",
"value": "**/*.pdf"
},
{
"type": "file-size",
"min": 1000
}
]
}
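The all (AND) and any (OR) logic can be sketched as a small recursive evaluator in Python. This is an illustration under stated assumptions rather than Voyager's implementation: the function `rule_matches` is hypothetical, only the filter types used in this example are handled, and the pattern-list branch is simplified to compare the file name against patterns of the form `**/<name glob>`.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

def rule_matches(rule: dict, path: str, size: int) -> bool:
    """Evaluate a (possibly nested) rule against a file path and size in bytes."""
    kind = rule["type"]
    if kind == "all":   # AND: every nested filter must match
        return all(rule_matches(r, path, size) for r in rule["filters"])
    if kind == "any":   # OR: at least one nested filter must match
        return any(rule_matches(r, path, size) for r in rule["filters"])
    if kind == "file-size":
        lo, hi = rule.get("min"), rule.get("max")
        return (lo is None or size >= lo) and (hi is None or size <= hi)
    if kind == "pattern-list":
        # Simplification: strip a leading "**/" and match the file name only.
        name = PurePosixPath(path).name
        globs = [p[3:] if p.startswith("**/") else p
                 for p in rule["value"].split("\n") if p]
        return any(fnmatchcase(name, g) for g in globs)
    raise ValueError(f"unsupported filter type: {kind!r}")

combined = {
    "type": "all",
    "filters": [
        {"type": "pattern-list", "value": "**/*.pdf"},
        {"type": "file-size", "min": 1000},
    ],
}
print(rule_matches(combined, "docs/a.pdf", 5000))
```

Swapping "all" for "any" in the rule makes the same file match when either condition holds instead of both.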