For customers who do not use LibreOffice, the java/poi extractor can be used to extract content from the most common office document formats. The supported formats are: doc, docx, ppt, pptx, xls, xlsx.
Enable the java/poi extractor for office documents:
In HQ, select the Formats menu on the left
Search for doc
Select Microsoft Word Document
Select the Extractors tab
For java/poi, select the move up option until it is at the top (if java/poi is missing, see below)
Repeat this process for the following formats: docx, ppt, pptx, xls, xlsx
Select the System menu on the left
Select Restart/Shutdown
Restart HQ
NOTE: If the java/poi option is missing from the menu, use the following procedure:
On the file system, navigate to HQ_HOME/config
With HQ stopped, delete or rename mimes.json
Start HQ and enable java/poi using the steps above
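The rename step in this procedure can be scripted so that the original file is kept as a backup rather than deleted. The following is a minimal Python sketch, not part of HQ: the function name backup_mimes and the timestamped backup file name are illustrative assumptions, and hq_home must point at your own HQ_HOME directory. Run it only while HQ is stopped.

```python
import time
from pathlib import Path
from typing import Optional


def backup_mimes(hq_home: str) -> Optional[Path]:
    """Rename HQ_HOME/config/mimes.json to a timestamped backup file.

    Hypothetical helper (not shipped with HQ). Returns the backup path,
    or None if mimes.json does not exist. Run only while HQ is stopped.
    """
    mimes = Path(hq_home) / "config" / "mimes.json"
    if not mimes.is_file():
        return None
    # Timestamped name so repeated runs do not reuse the same backup name.
    stamp = time.strftime("%Y%m%d-%H%M%S")
    backup = mimes.with_name(f"mimes.json.{stamp}.bak")
    mimes.rename(backup)
    return backup
```

Renaming instead of deleting keeps the previous format settings available if they need to be consulted or restored later.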
Additional information / troubleshooting:
If the format settings have been changed previously, deleting mimes.json will discard those changes. If losing them is not an option, the java/poi extractor can instead be added to mimes.json manually. Please note that this is an advanced procedure and can prevent HQ from loading.
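After a manual edit, it can help to confirm that mimes.json is still well-formed JSON before starting HQ. The sketch below is an assumption-laden convenience, not an HQ tool: check_mimes_json is a hypothetical name, and the check covers JSON syntax only. It does not verify the structure HQ expects, which is not described here.

```python
import json
from pathlib import Path


def check_mimes_json(path: str) -> bool:
    """Return True if the file at `path` parses as JSON, else False.

    Hypothetical helper: a syntax check only, it does not validate
    the contents against what HQ expects.
    """
    try:
        with Path(path).open(encoding="utf-8") as fh:
            json.load(fh)
    except (OSError, ValueError) as exc:
        # ValueError covers json.JSONDecodeError and bad encodings.
        print(f"{path}: not usable ({exc})")
        return False
    return True
```

Keeping a copy of the file from before the edit makes it possible to revert if HQ fails to load.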
In some cases the mimes index may need to be rebuilt. This can be done using the mimes API REST endpoint.