Metadata extraction from PDF forms

PDF forms contain fillable fields that users can complete using the free Adobe Acrobat Reader software. The values entered in these fields can be extracted automatically into the metadata fields of a schema, reducing the time it takes to index or “tag” a document.

To create PDF forms, you need software such as Adobe Acrobat Pro. You cannot create PDF forms using the FileHold document management software.

The PDF form extraction rule is created in the FileHold Desktop Application (FDA). Each rule is based on the PDF form template it was created from, and you can have as many PDF form extraction rules as needed. Both “classic” forms and Adobe XML Forms Architecture (XFA) forms are supported.

When mapping the fields on the PDF form to the metadata fields in the schema, ensure that the values entered in the PDF form can be accepted by the metadata fields. For example, if a drop-down list on the PDF form is mapped to a drop-down list metadata field, the values in both lists must match exactly. Similarly, if a text field on the PDF form is mapped to a numeric metadata field, the value may not populate the metadata field if it contains alphabetical characters. To avoid these issues, make the metadata field a text type so that it can accept any value from the PDF form.
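The compatibility checks described above can be sketched in a few lines of Python. This is only an illustration of the reasoning, not FileHold code: the `MetadataField` class, the `accepts` function, the field kinds, and the sample field names are all hypothetical, and treating drop-down matches as case-sensitive is an assumption.

```python
import re
from dataclasses import dataclass


@dataclass
class MetadataField:
    """Hypothetical description of a schema metadata field (not a FileHold type)."""
    name: str
    kind: str            # "text", "number", or "dropdown"
    choices: tuple = ()  # allowed values, used only when kind is "dropdown"


def accepts(target: MetadataField, value: str) -> bool:
    """Return True if a value typed into the PDF form could populate the target field."""
    if target.kind == "text":
        # A text field accepts any string.
        return True
    if target.kind == "number":
        # Digits with an optional sign and decimal part; any letters cause a rejection.
        return re.fullmatch(r"[+-]?\d+(\.\d+)?", value.strip()) is not None
    if target.kind == "dropdown":
        # Exact match against the list (case sensitivity is an assumption here).
        return value in target.choices
    raise ValueError(f"unknown field kind: {target.kind}")
```

For example, `accepts(MetadataField("Amount", "number"), "42 USD")` returns `False`, while the same value is accepted once the metadata field is declared with kind `"text"`.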

Extraction rules can be used in conjunction with import jobs (Automatic Document Importation). The extraction rules are applied automatically when an import job processes documents on the server.

Watch the PDF Forms Extraction Rule training video.

To create a PDF form extraction rule

  1. Do one of the following:
     • In the FDA, log in as a library administrator and go to Tools > Extraction Rules.
     • In the Web Client, go to Administration Panel > Library configuration > Extraction Rules.
  2. In the Select Template File window, select the PDF form "template" file from your computer and click OK.
  3. In the PDF Forms Rule window, enter a name for the rule.
  4. The Extensions field is automatically filled in with the type of PDF.
  5. Enter a description for the rule (optional).
  6. To enable the rule, ensure the Rule is Enabled check box is selected.
  7. In the Document Schema list, select the schema that is to be used for this rule. You may need to create a new schema for the document type.
  8. Map the metadata fields to the fields on the PDF form. Click ... to select the PDF form field.
Image: PDF form extraction rule
  9. Click OK.
  10. **Log off the FDA** and log back in for the rule to take effect.
  11. Test the PDF form extraction rule using the PDF form that was used as a template. Fill out the form, save it, and add it to the document management system.
  12. The values entered in the form and mapped in the extraction rule will appear in the metadata pane.
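
As a conceptual aid, the effect of a rule created with the steps above can be modelled as a simple mapping from metadata field names to PDF form field names. The sketch below is not FileHold's implementation or API; `ExtractionRule`, `apply_rule`, and every field and schema name in it are hypothetical and exist only to show how mapped values end up as metadata while unmapped form fields are ignored.

```python
from dataclasses import dataclass


@dataclass
class ExtractionRule:
    """Hypothetical stand-in for a PDF form extraction rule."""
    name: str
    schema: str
    enabled: bool
    mapping: dict  # metadata field name -> PDF form field name


def apply_rule(rule: ExtractionRule, form_values: dict) -> dict:
    """Copy mapped form values into metadata; unmapped form fields are ignored."""
    if not rule.enabled:
        # A disabled rule extracts nothing.
        return {}
    return {
        meta_name: form_values[pdf_name]
        for meta_name, pdf_name in rule.mapping.items()
        if pdf_name in form_values
    }


# Hypothetical rule and filled-out form used for a quick check.
rule = ExtractionRule(
    name="Expense claim",
    schema="Expense Claims",
    enabled=True,
    mapping={"Employee": "employee_name", "Amount": "total"},
)
filled_form = {"employee_name": "J. Smith", "total": "42.50", "signature": "JS"}
metadata = apply_rule(rule, filled_form)
print(metadata)  # {'Employee': 'J. Smith', 'Amount': '42.50'}
```

The `signature` form field is not mapped in the rule, so it does not appear in the resulting metadata. This mirrors the test in the last two steps, where only the values that were both entered in the form and mapped in the rule are expected in the metadata pane.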