Metadata Extraction
Voyager supports metadata extraction from standard XML documents using XPath queries. It supports many standard metadata specifications out of the box, and it also lets you enter your own XPath queries for specific metadata elements and map them to searchable field names within Voyager's index. These field names can already exist or be created on the fly. This topic provides an overview of the Voyager Metadata Extraction page and explains how to define XPath queries for metadata elements and how to specify field mapping parameters.
To open the Metadata Extraction page, go to Manage Voyager > Discovery > Metadata in Voyager Server's Manage UI.
Testing Your Mapping
To map the fields, configure these parameters:
Choose the Selector: This specifies the XPath query that selects a specific metadata record element.
Enter the Field Name: This is the target field in Voyager that gets mapped to the specified metadata output.
Confirm the Type: This is the data type of the field. For example, if the field name is set to "name", the data type is automatically set to "text".
Choose an Action: Users can select from five different functions:
Set Field: Assigns a value to the specified field
Append Field: Adds to or modifies an existing field
Set Geo: Sets a geographic bounding box based on coordinates specified in the metadata
Expand Geo: Expands the geographic bounding box from previously set coordinates
Add Link: Points the field to a URL
Choose a Converter: Converter settings are optional. If you do not specify one, Voyager assigns an appropriate converter to the field by default.
Bbox: Converts bounding box values contained in the XML document
Gml_Geometry: Converts geometry coordinates (line, circle, etc.) from the XML document
Date: Converts a date represented as a string value in the XML document into a standard date format
String256: Finds a string within the element with a maximum length of 256 characters
String512: Finds a string within the element with a maximum length of 512 characters
StringValue: Finds a string of any length within the element
Set the Properties:
Required: Checking this box validates the field being extracted from the XML document.
Skip if Exists: If a field has been previously added, checking this box ensures that a duplicate field does not get added to your list.
Warn On Replace: If a destination field already exists, checking this box flags the newly set field value.
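As a rough mental model of how the actions and the Skip if Exists property might interact, the sketch below applies mapping rules to a plain Python dictionary standing in for an index record. It is not Voyager's implementation: the apply_action function, the action strings, the list-based behavior of Append Field, the field names, and the (xmin, ymin, xmax, ymax) bounding-box layout are all assumptions made for illustration. Add Link is left out.

```python
from typing import Any, Dict, Tuple

# Hypothetical bounding-box layout: (xmin, ymin, xmax, ymax).
BBox = Tuple[float, float, float, float]


def expand_bbox(a: BBox, b: BBox) -> BBox:
    """Return the smallest box that covers both a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def apply_action(record: Dict[str, Any], field: str, value: Any,
                 action: str, skip_if_exists: bool = False) -> None:
    """Apply one mapping rule to `record` in place (illustrative only)."""
    if skip_if_exists and field in record:
        return  # assumed meaning of Skip if Exists: keep the earlier value
    if action == "set_field":
        record[field] = value
    elif action == "append_field":
        # Assumption: appending accumulates values in a list.
        existing = record.get(field)
        if existing is None:
            record[field] = [value]
        elif isinstance(existing, list):
            existing.append(value)
        else:
            record[field] = [existing, value]
    elif action == "set_geo":
        record[field] = tuple(value)
    elif action == "expand_geo":
        current = record.get(field)
        # With no box set yet, expanding behaves like setting.
        record[field] = (tuple(value) if current is None
                         else expand_bbox(current, tuple(value)))
    else:
        raise ValueError(f"unknown action: {action}")


record: Dict[str, Any] = {}
apply_action(record, "meta_city", "Washington D.C.", "set_field")
apply_action(record, "meta_city", "Arlington", "set_field", skip_if_exists=True)
apply_action(record, "meta_keyword", "roads", "append_field")
apply_action(record, "meta_keyword", "bridges", "append_field")
apply_action(record, "bbox", (-77.2, 38.8, -76.9, 39.0), "set_geo")
apply_action(record, "bbox", (-77.5, 38.7, -77.0, 38.9), "expand_geo")
print(record)
```

In this sketch the second meta_city rule is skipped because the field already exists, the two keywords accumulate in a list, and the expanded box covers both sets of coordinates.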
Using the XML Box
The XML box allows you to enter an XML document and test your XPath queries against its elements.
Step 1: Click the XML tab and paste the contents of a valid XML document here. Click Save to save the XML contents.
In this case, the element we want to extract from the XML document is City.
Step 2: Specify values for Selector, Field Name and Action.
Because we want to extract the City element, we enter the XPath query for that element in the Selector box: "/metadata/metainfo/metc/cntinfo/cntaddr/city"
Step 3: Specify the corresponding Field Name that the queried element is mapped to. Voyager automatically detects the (Data) Type for the Field Name.
For example, here the Field Name is City, whose data type is String.
Note: For the Field Name, either select an existing field name or enter a custom field name that uses a prefix of "meta_" or "id_".
Step 4: Click Test. The extractor searches the XML document for the queried metadata element, and retrieves the value for the field City. The results are presented in the Output tab.
In this example, "Washington D.C.", the value matched by the City query, is retrieved from the XML tab and displayed in the Output tab. Once this value is included in the index, users can search on it to find XML documents through Voyager's search UI.
Step 5: Click Save to add the XPath query to the list.
Click the Edit link to make changes to an existing Selector.
Use the up or down arrows to change the order of a Selector.
Select the [X] to delete an existing Selector.
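The test workflow in Steps 1 through 4 can be approximated outside Voyager to check a selector before entering it. The sketch below uses Python's standard xml.etree.ElementTree module, which supports only a subset of XPath, so it handles simple absolute paths such as the City selector and nothing more elaborate. The sample XML, the extract helper, and the custom field name meta_city are made up for illustration and are not part of Voyager.

```python
import xml.etree.ElementTree as ET

# Made-up sample document whose structure matches the City selector.
XML_TEXT = """<metadata>
  <metainfo>
    <metc>
      <cntinfo>
        <cntaddr>
          <city>Washington D.C.</city>
        </cntaddr>
      </cntinfo>
    </metc>
  </metainfo>
</metadata>"""


def extract(xml_text: str, selector: str):
    """Return the text of the first element matching a simple absolute path."""
    root = ET.fromstring(xml_text)
    steps = selector.strip("/").split("/")
    # ElementTree rejects absolute paths on an element, so the root step is
    # checked by hand and the remainder is searched relative to the root.
    if steps[0] != root.tag:
        return None
    if len(steps) == 1:
        return root.text
    node = root.find("/".join(steps[1:]))
    return None if node is None else node.text


selector = "/metadata/metainfo/metc/cntinfo/cntaddr/city"
# Map the extracted value to a hypothetical custom field name.
fields = {"meta_city": extract(XML_TEXT, selector)}
print(fields)
```

A selector that matches nothing returns None here, which is a convenient signal that the path should be rechecked against the document.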