Document plug-in part 1: automated metadata modification

In 2022, we released the document plug-in for FileHold. Like our workflow plug-in, this new tool allows for customization and automation to be added to standard product features in a simple way by non-programmers. The document plug-in has two functions: automated metadata modification and document validation.

In part 1 of this article, I will present three use cases for automated metadata modification including the necessary plug-in configuration. Like the workflow plug-in, this tool uses our hybrid SQL concept to configure the plug-in. This technology combines our merge tag feature with standard SQL to provide a powerful mechanism for customization. A part 2 on document validation is also available.

Use case: Calculate contract dates

Contracts typically have a start date and a term. If they are evergreen, they probably have a minimum period prior to the end of the term when the contract can be cancelled or changed. Companies with a contract review process probably have a period before the end of term when they would like to start reviewing the contract to determine if changes or cancellation are needed.

These requirements imply that four dates are necessary in the documents schema to be able to handle the contract properly in searches and event processing:

  • Contract effective date
  • Contract expiration date
  • Contract term notice date
  • Contract term review date

In the most straightforward circumstances, the user would simply enter these dates as needed for the contract. However, this exercise is likely prone to error and undue effort when there is likely a more effective method to enter the information if the dates can be automatically calculated. This is where the document plug-in comes in. It operates after the user has completed the metadata panel and pressed add or save.

Let’s consider three new metadata fields:

  • Contract term length
  • Contract term notice length
  • Contract renewal lead time

These fields can be dropdown menus with a fixed list of values, or they can simply be numbers. We will assume that the term length is in months and the other two values are in days. The plug-in requires us to return a list of field ids and values so we need to both calculate the correct dates and find the field IDs for the contract expiration date, term notice date and term review date fields as we will use the effective date as the starting point for all calculations.

The hybrid SQL will be placed in the document schemas description, so we will probably want the bulk of the computation to happen in a stored procedure to keep things compact and easy to debug. The hybrid SQL might look like this:

### update-metadata #
EXEC zfh_CalculateReviewDates
     @TermLengthMonths = [[md|Contract Term Length Months]]
   , @RenewalLeadTime = [[md|Contract Renewal Lead Time]]
   , @TermNoticeLength = [[md|Contract Term Notice Length]]
   , @EffectiveDate = [[md|Contract Effective Date|yyyy-MM-dd]]

We are passing in four metadata fields in merge tags and we will expect three fields in return. The following stored procedure will access the FileHold database table DocumentSchemaFieldDefinitionId to get the internal IDs for the metadata fields that will be returned and use three SQL SELECT statements joined with UNION ALL to produce the three records needed using SQL’s DATEADD function to help with the date calculations. The name zfh_CalculateReviewDates is arbitrary, but designed to make it easy to identify as a custom stored procedure along with its purpose.

USE [ch_librarymanager]
CREATE PROCEDURE [dbo].[zfh_CalculateReviewDates]
   @TermLengthMonths nvarchar(max), -- Length of the contract in months
   @RenewalLeadTime nvarchar(max), -- Number of days before term notice required to review contract
   @TermNoticeLength nvarchar(max), -- Days before end of term to change contract
   @EffectiveDate nvarchar(max) -- Start of contract
   declare @ContractExpirationDateId as int = ( select DocumentSchemaFieldDefinitionId from DocumentSchemaFieldDefinitions where Name = 'Contract Expiration Date') declare @ContractTermNoticeDateId as int = ( select DocumentSchemaFieldDefinitionId from DocumentSchemaFieldDefinitions where Name = 'Contract Term Notice Date')
   declare @ContractTermReviewDateId as int = ( select DocumentSchemaFieldDefinitionId from DocumentSchemaFieldDefinitions where Name = 'Contract Term Review Date')
   declare @EndDateOffset as int = cast( @TermLengthMonths as int )
   declare @TermDateOffset as int = cast( @TermNoticeLength as int )
   declare @ReviewDateOffset as int = cast( @RenewalLeadTime as int )
   select @ContractExpirationDateId FieldId, convert( nvarchar, dateadd( month, @EndDateOffset, cast( @EffectiveDate as datetime )), 120 ) FieldValue -- 120 = ODBC canonical
   union all
   select @ContractTermNoticeDateId FieldId, convert( nvarchar, dateadd( day, -@TermDateOffset, dateadd( month, @EndDateOffset, cast( @EffectiveDate as datetime ))), 120) FieldValue
   union all
   select @ContractTermReviewDateId FieldId, convert( nvarchar, dateadd( day, -1 * (@ReviewDateOffset + @TermDateOffset), dateadd( month, @EndDateOffset, cast( @EffectiveDate as datetime ))), 120 ) FieldValue

Each time a document with this configuration in its schema is added or updated, this code will execute, and the dates will be updated. For user convenience, the date fields can be set as read only as the plug-in has the power to write to read-only fields on behalf of the user as an administrator without given the user any administration permissions. Using a stored procedure makes the main logic easy to test from the SQL management interface or similar tool. It is important to note that merge tags are always passed as NVARCHAR type, so a test might appear as follows:

EXEC zfh_CalculateReviewDates N’36’, N’60’, N’90’, N’2023-06-17’

Use case: Extract information from a document name

It is common for filenames to come from other systems with structured naming. For example, the name may include an account number or a person’s name. While this information can be left in the name and searched in FileHold, it is often more effective for the information to be in a metadata field. This way it can more easily be used in auto-filing or schema lookups.

Let’s assume a financial system is producing statements that are being stored in FileHold for use by a customer service team. When someone contacts the customer service team, they provide there name and expect the service person to find their statements. Let’s also assume that the PDF statements produced by the financial system are named as “STMT-<account-number>.PDF” where <account-number> is the 8 digit account number of the person associated with the statement.

We can use a schema lookup to find the person’s name from the account number, but first the account number must be in the account number field. Out of the box, the user adding the statement would need to look at the name and type or copy paste the account number to the correct metadata field. If there are a lot of these statements, automation would be more effective.

This situation has a more straightforward solution then the previous case as extracting the account number can be performed with a simple SQL function. The hybrid SQL to add to the schema description would look like this:

### update-metadata #
SELECT (select DocumentSchemaFieldDefinitionId from DocumentSchemaFieldDefinitions where Name = 'Account number'), SUBSTRING( [[sys|document-name]], 6, 8 )

The merge tag [[sys|document-name]] contains the filename without extension and we will use the SUBSTRING function to retrieve the 8 character account number starting from the 6th character in the filename.

Use case: Change document schema according to classification

Document lifecycle events are processed based on the events defined in the document schema. There are cases where the lifecycle requirements for a specific document are the only difference from another document. For example, all administration documents might share most of the same requirements for the document schema, but policies and procedures documents have a different lifecycle than the minutes of staff meetings. This might lead to two documents schemas: ADMIN-PP and ADMIN-SM.

Users could be asked to correctly select the schema necessary for the document, but the names are slightly obscure, and the selection could be prone to error. A less error prone approach would be to select the classification from a dropdown menu with options like “Policies and procedures” and “Staff meetings”. The document plug-in can step in here to make sure the schema is correctly selected according to the dropdown value chosen.

For simplicity of the example we will assume that we know that the dropdown menu is named “Classification” for “Policies and procedures” and “Staff meetings” and the IDs for ADMIN-PP and ADMIN-SM are 10 and 20 respectively. Live values can be found in the DocumentSchemaFieldChoices and DocumentSchemas tables of the library manager database. We will add the same hybrid SQL to both as the action will be ignored if the schema is unchanged. It is also important to know that document schema has a special field ID, as a system field, of -6.

### update-metadata #
   CASE [[md|Classification]]
      WHEN 'Policies and procedures' THEN 10
      WHEN 'Staff meetings' THEN 20

If we assume all the metadata fields in both ADMIN-PP and ADMIN-SM are they same, they will be unchanged in the new schema. In a real production implementation, it would be a good idea to have some error handling that makes sense (hint: see part 2). For example, an administrator added a third choice to the classification dropdown but did not update this configuration.

The document plug-in is available by contacting your account manager for a quotation if you have FileHold version 16.3 or higher. The FileHold professional services team can help you configure the plug-in if you do not want to tackle the hybrid SQL on your own.

Russ Beinder

Russ Beinder is the Chief Technology Officer at FileHold. He is an entrepreneur, a seasoned business analyst, computer technologist and a certified Project Management Professional (PMP). For over 35 years he has used computer technology to help organizations solve business problems.