Extraction rules

The Extraction Rules tool manages the extraction of metadata from Microsoft Office Outlook msg files, file properties of any file type, PDF forms, and Microsoft Office Word forms. The information contained in the emails, file properties, or forms can then automatically populate the metadata fields in a document schema.

When mapping the source fields to the metadata fields in the schema, ensure that the values entered in the source fields can be accepted by the metadata fields. For example, if a PDF form has a drop-down list and the metadata field it is mapped to is also a drop-down list, then the values of both must match exactly. Similarly, if the source field is a text field and the metadata field it is mapped to is a numeric field, then the value may not populate the metadata field if it contains alphabetical characters. To avoid these types of issues, make the metadata field a text type so that it can accept any value from the source field.
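The compatibility rules described above can be sketched as a small check. This is an illustration only, not part of the product: the function name `can_populate` and the field type labels `"text"`, `"number"`, and `"dropdown"` are hypothetical, and the actual validation performed by the server may differ.

```python
def can_populate(value, field_type, allowed_values=None):
    """Return True if a source value would fit a metadata field of the given type.

    Hypothetical helper for illustration; field_type is one of
    "text", "number", or "dropdown".
    """
    if field_type == "text":
        # A text field accepts anything from the source field.
        return True
    if field_type == "number":
        # A numeric field rejects values that contain alphabetical characters.
        try:
            float(value)
        except ValueError:
            return False
        return True
    if field_type == "dropdown":
        # A drop-down field accepts only an exact match with one of its values.
        return value in (allowed_values or [])
    raise ValueError(f"unknown field type: {field_type}")
```

For example, `can_populate("12a", "number")` is false because of the letter, while the same value fits a text field. A drop-down match is case-sensitive in this sketch, which reflects the requirement that the values match exactly.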

Extraction rules can be used in conjunction with Import Jobs (Automatic Document Importation). The extraction rules are automatically applied when an import job processes documents on the server. Any extracted metadata values take precedence over the metadata values defined in the import job.
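The precedence rule can be pictured as a simple merge in which extracted values overwrite the import job defaults. This is a conceptual sketch, not the server's implementation: the function name `merge_metadata` and the field names are hypothetical, and it assumes that only fields for which a value was actually extracted appear in `extracted_values`.

```python
def merge_metadata(import_job_values, extracted_values):
    """Combine metadata so that extracted values win over import job values.

    Hypothetical helper for illustration; both arguments map
    metadata field names to values.
    """
    merged = dict(import_job_values)  # start from the import job defaults
    merged.update(extracted_values)   # extracted values take precedence
    return merged


# Field names below are made up for the example.
job_defaults = {"Author": "Unknown", "Department": "HR"}
extracted = {"Author": "J. Smith"}
result = merge_metadata(job_defaults, extracted)
```

In this example the extracted "Author" value replaces the import job value, while "Department" keeps the value defined in the import job because nothing was extracted for it.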

Semicolons and commas cannot be used in the values of drop-down fields that are populated by extraction rules.

Extraction rules work only when documents are added through the FileHold Desktop Application (FDA) or through Automatic Document Importation.

Extraction rules are accessible only to users with Library Administrator or higher permissions.

To access the extraction rules

  1. Do one of the following:
  • In the FDA, log in as a library administrator and go to Tools > Extraction Rules.
  • In the Web Client, go to Administration Panel > Library configuration > Extraction Rules.

There are four types of extraction rules that can be created:

  1. Email Headers - Values contained in the headers of Microsoft Outlook msg files.
  2. File Properties - File properties of any file type.
  3. XML Nodes - Values entered into Microsoft Word content controls.
  4. PDF Forms - Values entered in an Adobe PDF form.

When the extraction rules are properly configured, the values from emails, file properties, XML nodes, or PDF forms can be automatically extracted into the metadata fields of a schema.

To see the extraction rules in action, watch our video tours.