HQ - File Formats

HQ can index thousands of file formats. Each Format has specific Connectors and Extractors, which correspond to the two phases of Indexing: Scanning and Extraction. An Extractor can often operate on multiple related file formats, and a particular type of file may have more than one associated Extractor. This article covers the list of file Formats and their associated Extractors in HQ.

In addition, you can learn more about specific Extractors and our support for specific formats within this help system.

About Formats, Extractors and Indexing

To put Formats and Extractors in context, it helps to describe where they fit in the two phases of Indexing: Scanning and Extraction.

  • Scanning is done by Connectors, which crawl the repository and gather easily accessible information about each document, such as file name, size, and last modification date. Scanning does not open any files. All readable data types have Connectors, and the Connector framework can easily be extended to support new formats, for example the new Web Connector.

  • Extraction is done by Extractors, which open data files and extract information to add to the data gathered during the Scanning phase. This might involve reading text from a Word document or reading metadata tags from an image file. HQ chooses the Extractor based on the format (MIME type) of each file or document.
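The two phases can be illustrated with a minimal Python sketch. This is not HQ's implementation: the function names, the record fields, and the extractor names in the registry are all hypothetical, and the format is guessed from the file extension only. It shows a scan that collects name, size, and modification time without opening any file, followed by a lookup that picks Extractors by MIME type.

```python
import mimetypes
import os

# Hypothetical registry: MIME type -> names of Extractors that can handle it.
# The names are illustrative only and do not reflect HQ's configuration.
EXTRACTORS_BY_FORMAT = {
    "text/plain": ["text"],
    "application/pdf": ["pdf", "tika"],
    "image/jpeg": ["image_metadata"],
}


def scan(root):
    """Phase 1: walk the tree and record metadata without opening any file."""
    records = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            info = os.stat(path)  # file system metadata; the file is not opened
            mime, _encoding = mimetypes.guess_type(name)  # guessed from the extension
            records.append({
                "path": path,
                "name": name,
                "size": info.st_size,
                "modified": info.st_mtime,
                "format": mime,
            })
    return records


def choose_extractors(record):
    """Phase 2 dispatch: look up candidate Extractors by the record's format."""
    return EXTRACTORS_BY_FORMAT.get(record["format"], [])
```

Keeping the registry as a mapping from one format to a list of Extractors mirrors the point made earlier: a format may have more than one Extractor, and the same Extractor name may appear under several formats.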

Viewing the Format List

To view the available file formats:

  1. Open HQ

  2. Click Indexing

  3. Click Formats

 

 

The window displays file formats sorted by name. For each file format, HQ displays the file extension(s) and the available Extractors.

Viewing Format and Extractor Information

Formats

  • Click a Format name to view more information about the format, for example WMS (Web Map Service)

  • Click the arrow next to Format Categories to show format categories and the number of formats in each

  • Click a Category to see the Formats it includes

 

 

Extractors

  • Click an Extractor to view more information about it, for example tika

  • Click the arrow next to Extractors to see Extractor Categories and the number of extractors in each

  • Click a Category to see the Extractors it includes

 

 

Searching Formats

  • To search the format list, enter a keyword in the search box. For example, entering arc will display a list of all file formats with arc in the name, including application/x-webarchive and ArcGIS Feature Service Layer.
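The matching behavior in the example above is consistent with a case-insensitive substring search, since arc matches both webarchive and ArcGIS. The Python sketch below illustrates that idea under this assumption. The function name and the sample list are hypothetical, and HQ's actual search may work differently.

```python
def search_formats(formats, keyword):
    """Return the format names that contain the keyword, ignoring case."""
    needle = keyword.casefold()
    return [name for name in formats if needle in name.casefold()]


# Illustrative sample; the real format list is far longer.
formats = [
    "application/x-webarchive",
    "ArcGIS Feature Service Layer",
    "WMS (Web Map Service)",
    "image/jpeg",
]

matches = search_formats(formats, "arc")
print(matches)
```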