Automatic extraction of metadata values from file properties

The properties of a file can be extracted automatically into the metadata fields of a document schema when an extraction rule is configured for that file type. Because every file type has file properties, metadata can be extracted from any type of file. This is especially useful for images, where details such as the picture size, camera type, exposure time, and resolution can be taken directly from the file.

The file properties that can be extracted are those shown on the Details tab of the file's Properties dialog in Microsoft Windows Explorer. The available properties vary by file type and by operating system version (for example, Windows 7 versus Windows 10). The example below shows some of the file properties of an image file in Windows Explorer on Windows 7.

Figure: File properties

When creating extraction rules for files, you can create an extraction rule for each type of file that you want to extract data from. For example, you can set a separate rule for a docx, xlsx, pdf, jpg, tiff, and so on. You can create several extraction rules per file extension; however, only one extraction rule per file extension can be enabled at a time.
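The constraint that several rules may exist for an extension but only one may be enabled can be modelled as a simple lookup. The sketch below is a hypothetical Python model for illustration only: the `ExtractionRule` class and `active_rule` function are invented names and do not reflect how FileHold stores rules internally.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ExtractionRule:
    """Hypothetical stand-in for a file properties extraction rule."""
    name: str
    extension: str  # file extension without the dot, e.g. "jpg"
    enabled: bool


def active_rule(rules: List[ExtractionRule], extension: str) -> Optional[ExtractionRule]:
    """Return the single enabled rule for an extension, or None if none is enabled."""
    ext = extension.lower().lstrip(".")
    enabled = [r for r in rules if r.extension == ext and r.enabled]
    if len(enabled) > 1:
        # Several rules may exist per extension, but only one may be enabled.
        raise ValueError(f"more than one enabled rule for .{ext}")
    return enabled[0] if enabled else None


rules = [
    ExtractionRule("Photos (old)", "jpg", enabled=False),
    ExtractionRule("Photos", "jpg", enabled=True),
    ExtractionRule("Word documents", "docx", enabled=True),
]
print(active_rule(rules, ".JPG").name)  # Photos
print(active_rule(rules, "pdf"))        # None
```

Keeping disabled rules in the list mirrors the idea that an older rule can be kept for reference and switched back on later.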

A document template is any file of the type you want to extract metadata from. The file type of the template (docx, xlsx, pdf, jpg, and so on) determines the type of file properties extraction rule that is created. For example, to create a jpg extraction rule, select a jpg file as the template.

A document schema is also assigned to the rule, and the schema's metadata fields are mapped to the file properties. When a document of that file type is added to FileHold using that schema, its file properties are automatically extracted into the mapped metadata fields.

When setting up a file properties extraction rule, file dates can be interpreted either as UTC or as local time. If the file type stores its dates in UTC, select the Assume UTC dates check box in the rule configuration.
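The effect of this setting can be illustrated with a short Python sketch: the same raw date value yields a different local time depending on whether it is treated as UTC. The local time zone (UTC-05:00), the raw date string, and the `interpret_file_date` helper are all hypothetical and are not part of FileHold.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical local time zone of the workstation: UTC-05:00.
LOCAL_TZ = timezone(timedelta(hours=-5))


def interpret_file_date(raw: str, assume_utc: bool) -> datetime:
    """Return a raw file date as a time-zone-aware datetime in local time."""
    naive = datetime.strptime(raw, "%Y-%m-%d %H:%M:%S")
    if assume_utc:
        # The stored value is UTC: tag it as UTC, then convert to local time.
        return naive.replace(tzinfo=timezone.utc).astimezone(LOCAL_TZ)
    # The stored value is already local time: just attach the local offset.
    return naive.replace(tzinfo=LOCAL_TZ)


raw_value = "2021-03-15 14:30:00"
print(interpret_file_date(raw_value, assume_utc=True).strftime("%Y-%m-%d %H:%M"))   # 2021-03-15 09:30
print(interpret_file_date(raw_value, assume_utc=False).strftime("%Y-%m-%d %H:%M"))  # 2021-03-15 14:30
```

Choosing the wrong option shifts every extracted date by the difference between UTC and local time, which is a useful symptom to look for when testing a rule.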

Only users with Library Administrator or higher permission can create extraction rules.

Extraction rules can be used together with import jobs (Automatic Document Importation). The extraction rules are applied automatically when an import job processes documents on the server.


TIP: Microsoft Office saves documents to a temporary file while they are being edited. As a result, file properties related to the file name, location, or file type cannot be extracted when a document is added to FileHold using the FileHold toolbar. To extract any of these values, save the document, close it in the Microsoft Office application, and then add it from the FileHold Desktop Application (FDA).

To create a File Properties extraction rule

  1. Do one of the following:

     • In the FDA, log in as a library administrator and go to Tools > Extraction Rules.
     • In the Web Client, go to Administration Panel > Library configuration > Extraction Rules.

  2. Select the "template" file from your computer and click OK. The template file determines the type of file properties extraction rule that is created. For example, to create a rule for jpg files, select a jpg file as the template.

  3. In the File Properties Rule window, enter a name for the rule.

  4. The Extensions field is filled in automatically from the type of template file selected. For example, if the template file is a jpg file, the extension is jpg.

  5. Enter a description for the rule (optional).

  6. To enable the rule, ensure the Rule is Enabled check box is selected.

  7. Select the Assume UTC dates check box if the file stores its dates and times in UTC. If the file uses local date and time, leave the check box cleared.

  8. In the Document Schema list, select the schema to use for this rule. You may need to create a new schema for the document type.

  9. Map the metadata fields to the File Properties. Click ... to select the File Property in the Select File Property window. In the example below, an extraction rule was created for the jpg image file type using the Photographs schema. The metadata fields in the Photographs schema are mapped to the File Properties of the jpg template file.

     Figure: Configuration of file properties extraction rule

  10. When you have finished mapping the metadata fields to the File Properties fields, click OK.

  11. The new extraction rule appears in the List of Extraction Rules.
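Conceptually, the field-to-property mapping pairs each metadata field in the schema with one file property, and the property value is copied into the field unchanged when a document is added. A minimal Python sketch of that idea, assuming the file properties have already been read into a dictionary; the field names, property names, values, and the `extract_metadata` helper are hypothetical examples, not FileHold's internal format:

```python
from typing import Dict

# Hypothetical mapping: metadata field in a Photographs schema -> file property name.
FIELD_TO_PROPERTY: Dict[str, str] = {
    "Date Taken": "Date taken",
    "Camera Model": "Camera model",
    "Dimensions": "Dimensions",
}


def extract_metadata(file_properties: Dict[str, str],
                     mapping: Dict[str, str]) -> Dict[str, str]:
    """Copy each mapped file property value into its metadata field, as-is.

    Properties that the file does not have are skipped, leaving the field empty.
    """
    return {
        field: file_properties[prop]
        for field, prop in mapping.items()
        if prop in file_properties
    }


# Hypothetical properties read from a jpg file (no "Camera model" recorded).
props = {"Date taken": "2021-03-15 14:30", "Dimensions": "4000 x 3000", "Exposure time": "0.008"}
print(extract_metadata(props, FIELD_TO_PROPERTY))
# {'Date Taken': '2021-03-15 14:30', 'Dimensions': '4000 x 3000'}
```

Unmapped properties such as the exposure time in this example are simply ignored, which matches the idea that only the mapped fields are filled in.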

To test the file properties extraction rule

  1. Log off and then log back in to FileHold.

  2. Add a document of that file type to FileHold. For example, if you created a rule for a jpg file, add a jpg file to the system.

  3. Verify that the file properties were extracted into the metadata fields. In the example below, a jpg file was added to the system using the Photographs schema and the mapped metadata was extracted automatically.

    Figure: Metadata extracted from file properties in metadata pane

The values in a file properties extraction rule are taken directly from Windows; FileHold does not interpret them beyond mapping them to a metadata field. As a result, some values may not appear as they do in File Explorer, because File Explorer may interpret the raw values recorded in the file before displaying them.
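As an illustration of this difference, an exposure time might be recorded as a plain number of seconds while File Explorer displays it as a fraction. The Python sketch below assumes a raw value in seconds and shows one way a viewer could format it. `display_exposure` is a hypothetical helper, not something FileHold or Windows provides, and the exact formatting File Explorer uses may differ.

```python
from fractions import Fraction


def display_exposure(raw_seconds: float) -> str:
    """Format a raw exposure time in seconds as a viewer-style string (illustrative only)."""
    if raw_seconds >= 1:
        return f"{raw_seconds:g} sec."
    # Find the nearest simple fraction, e.g. 0.008 -> 1/125.
    frac = Fraction(raw_seconds).limit_denominator(10000)
    return f"{frac.numerator}/{frac.denominator} sec."


raw_value = 0.008  # what a mapped metadata field might receive as-is
print(raw_value)                    # 0.008
print(display_exposure(raw_value))  # 1/125 sec.
```

If a metadata field shows a value like 0.008 where File Explorer shows 1/125 sec., the rule is working as intended; only the presentation differs.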