Enable the POI Extractor for Office Documents

For customers who do not use LibreOffice, the java/poi extractor can be used to extract content from the most common office document formats. The supported formats are doc, docx, ppt, pptx, xls, and xlsx.
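As a quick local check before changing any settings, the short sketch below tests whether a file name ends in one of the extensions listed above. It is only a convenience helper: the names POI_EXTENSIONS and is_poi_format are made up for this example, and HQ itself decides which extractor to use from its own format settings, not from this code.

```python
from pathlib import Path

# Extensions listed above as supported by the java/poi extractor.
POI_EXTENSIONS = {"doc", "docx", "ppt", "pptx", "xls", "xlsx"}


def is_poi_format(filename):
    """Return True if the file extension is one of the listed formats."""
    # suffix includes the leading dot (".docx"); compare case-insensitively.
    return Path(filename).suffix.lower().lstrip(".") in POI_EXTENSIONS
```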

 

Enable the java/poi extractor for office documents:

  1. In HQ, select the Formats menu on the left

  2. Search for doc

  3. Select Microsoft Word Document

    [Screenshot: Formats.png]
  4. Select the Extractors tab

    [Screenshot: Extractors.png]
  5. For java/poi, select the move up option repeatedly until java/poi is at the top of the list (if java/poi is missing, see the NOTE below)

  6. Repeat steps 2 to 5 for each of the remaining formats: docx, ppt, pptx, xls, xlsx

  7. Select the System menu on the left

  8. Select Restart/Shutdown

  9. Restart HQ

 

 

NOTE: If the java/poi option is missing from the menu, use the following procedure:

  1. On the file system, navigate to HQ_HOME/config

  2. With HQ stopped, delete or rename mimes.json

  3. Start HQ and enable java/poi using the steps at the top of this article
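The rename in step 2 can be scripted so that the original file is always kept. The following is a minimal sketch, not an official HQ tool: the function name set_aside_mimes and the backup naming scheme are assumptions made for this example, and the script must only be run while HQ is stopped.

```python
import time
from pathlib import Path


def set_aside_mimes(config_dir):
    """Rename mimes.json to a timestamped backup in the same folder.

    Returns the backup path, or None if config_dir has no mimes.json.
    Run this only while HQ is stopped.
    """
    source = Path(config_dir) / "mimes.json"
    if not source.is_file():
        return None
    stamp = time.strftime("%Y%m%d-%H%M%S")
    # Hypothetical naming scheme: mimes.json.<timestamp>.bak
    backup = source.with_name(f"mimes.json.{stamp}.bak")
    source.rename(backup)  # the original name no longer exists afterwards
    return backup
```

For example, call set_aside_mimes with the full path of the HQ_HOME/config folder on your system, then start HQ. The returned path is the copy to keep as a reference.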

 

Additional information / troubleshooting:

Deleting mimes.json discards any changes previously made to the format settings. Before deleting the file, it is recommended to copy the original mimes.json to a new location. Once HQ has been restarted, use the old mimes.json file as a reference and reapply the settings in HQ using the UI as described above. If losing those changes is not an option, the java/poi extractor can instead be added to the mimes.json file manually. Please note that this is an advanced procedure: if done improperly, it can prevent HQ from loading.
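When using an old mimes.json as a reference, it can help to confirm whether the file mentions java/poi at all. The layout of mimes.json is not described in this article, so the sketch below makes no assumption about it: it walks the whole parsed JSON structure and counts every key or string value that is exactly "java/poi". The function names are made up for this example, and the script only reads the file.

```python
import json


def count_mentions(node, needle="java/poi"):
    """Count keys and string values equal to needle anywhere in parsed JSON."""
    if isinstance(node, str):
        return 1 if node == needle else 0
    if isinstance(node, dict):
        return sum(
            (1 if key == needle else 0) + count_mentions(value, needle)
            for key, value in node.items()
        )
    if isinstance(node, list):
        return sum(count_mentions(item, needle) for item in node)
    return 0  # numbers, booleans and null cannot match


def mentions_in_file(path, needle="java/poi"):
    """Load a JSON file and count exact mentions of needle in it."""
    with open(path, encoding="utf-8") as handle:
        return count_mentions(json.load(handle), needle)
```

A count of zero for a backup copy suggests that the extractor was never registered in that file, which matches the case where java/poi is missing from the menu.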

In some cases, a rebuild of the mimes index may be required. This can be done using the mimes API REST endpoint.