Schema lookup part 2 – Interchangeable lookup keys
Often, data lookup requirements for metadata values are simple. There is a single key value the user provides that can be used to find all other values. This is the easiest schema lookup to configure and requires minimal or no knowledge of database programming. When used with a key that is a dropdown menu where that dropdown menu is configured with search fields, the single key value can turn into up-to five key ways of completing the lookup.
The single key value with search fields is great when the user will choose the value manually, but there are many cases were the lookup values come from document scans or other pre-processing. If the pre-processing has provided one of two values, both of which will equally satisfy the lookup key, there is a more efficient approach.
FileHold allows up to five metadata field values to be used in conjunction with each other or separately for a schema lookup when the Custom SQL Query option is chosen. The power of SQL can be leveraged to find the single row that is the best fit with whatever input values have been provided.
We recently had an example of an Australian customer who adds tax and other financial documents for their clients into FileHold from emails or scans. They have their own client reference number generated by their accounting system. They also have a tax file number (TFN). They may have an Australian business number (ABN) and or an Australian company number (ACN). Any one or more of these numbers may appear on one of their documents and all of them uniquely point to the same client.
The goal is to index the scan or email using Smart Soft Capture to automatically capture whatever numbers are available, feed them to the FileScan Bridge (manage imports) and have manage imports automatically lookup any missing information before adding them to the inbox for final verification and addition the library. The solution is to create a schema lookup that passes all four values, blank or not, to an SQL query that can return a single row of data with all the related values.
As a bonus, and for a bit of extra complexity, the customer wanted to include the document name and abbreviation to the document. However, they did not want to manually select two fields as one of the fields is just a different version of the other. Since we have only one lookup button, we will use our fifth value parameter (if you were counting, we already used four) and build some multi-lookup magic into the SQL query.
If it is a sunny day, and you are not an expert with SQL, now would be a good time to hop on your bike or leash up the dogs and take the rest of the afternoon off. You can always pass this article on to your in-house SQL guru or hit up FileHold professional services to help you with what is below.
Our customer was able to provide us with a view to the client data in their accounting system that included their internal client reference number, the client name, TFN, ABN and ACN where we could provide any one of the values, other than the name, to give us a unique record for the client. Further we were told that the importance of the identifiers was hierarchical starting from the TFN which every Australian client of theirs has, then the ABN, the ACN and if none of those would do, the internal client reference from the accounting system.
We started by creating a database dropdown menu metadata field for each client value. Dropdown menus are not strictly required for this application as only the client name is likely to change in the future but having dropdown values simplifies any cases were a document is added manually.
The following example illustrates the pattern used for all fields. The view into the customers accounting system is zvhs_client. The field ID or lookup by value is the client reference field. It will be the key that ties everything together. For each field we can add search by fields from the related data values as desired.
The bonus lookup will use yet another couple of dropdown menus of document titles and abbreviations. These are coming from a provided table zvabbreviations containing a record id, title and abbreviation field. There is no relationship between these values and the client values. They are only related when the metadata is attached to the document.
With the four client and two document title fields created we can define the schema lookup. Note that each of the four fields is a parameter to the custom query and a destination metadata field. As they are database dropdown menus, the source column is the same value we have used when they were created, the client reference.
It is important to know that when a dropdown menu value is passed to the custom query, the actual value that is passed is the lookup by (ID), not the retrieve from (caption). It produces a slightly interesting condition if a user was to fill out two or more of the customer fields, they could all be the same value (or different if the user made a mistake). However, it does not matter. Each of the client reference fields has been named as Ref1, Ref2, Ref3 and Ref4 in our solution to imply an order of priority that we will decode at the front of the query and ignore any lower priority values.
Now might be a good time to pass along a bit of debugging wisdom. It may be obvious to test your query in a tool like SQL Management before adding it to the schema configuration, but how do you test the values that go into the parameters and come out of the results? There is a little trick where you create a one liner SQL query that simply returns the parameters. That way you can assign the results to text metadata fields and inspect the details. It can be very helpful if you are unclear what FileHold is doing with the parameters and results. For example, SELECT @Ref1 as 'TFN test'.
The processing for the results should be clear. As for all database dropdown menus, the lookup assignments must use the lookup by (ID) value. The magic here is that we will assign all the client input values back to their respective metadata fields. So, if the user selects a client name John Smith, after the lookup, we will assign a value back to the client name even though one is already there. If the only input value was the client name, we will expect there to be no change. If the user also entered a TFN for a client other than John Smith, we expect the customer name to change accordingly as the TFN is a higher priority field than the client name.
With all this in place, the SQL query is fairly straight forward once we realize there are three scenarios to handle:
- At least one client field has a value,
- The document title has a value and
- Both a client field and the document title field have values.
This will allow the user to use the lookup button to find the document abbreviation without knowing the client or getting the client details without knowing the document title. If we want to make sure the document cannot be added until all the fields are present, we can just make the metadata fields required in the schema configuration. The user can press lookup as many times as they want.
The SQL query has four sections:
- Isolating the priority client field,
- Selecting a record with only client information,
- Selecting a record with only document title information and
- Selecting both client and document title information.
We use a CASE statement to look for the first non-blank client field in priority order. We use a UNION ALL to combine the three select scenarios to produce exactly zero or one records as a result. Note that this is not intended to be an instruction on the most efficient or best possible SQL, feel free to adapt the concept as needed in your environment.
DECLARE @ClientRef AS CHAR(8);
-- Isolating the priority client field
SET @ClientRef = CASE
WHEN ( @Ref1 <> ''
AND @Ref1 IS NOT NULL ) THEN @Ref1
WHEN ( @Ref2 <> ''
AND @Ref2 IS NOT NULL ) THEN @Ref2
WHEN ( @Ref3 <> ''
AND @Ref3 IS NOT NULL ) THEN @Ref3
WHEN ( @Ref4 <> ''
AND @Ref4 IS NOT NULL ) THEN @Ref4
ELSE NULL
END;
-- Selecting a record with only client information
SELECT *,
NULL AS Id,
NULL AS Abbreviation,
NULL AS Title
FROM dbo.zvhs_client
WHERE ref = @ClientRef
AND ( @TitleId = ''
OR @TitleId IS NULL
OR @TitleId NOT IN (SELECT id
FROM dbo.zvabbreviations) )
UNION ALL
-- Selecting a record with only document title information
SELECT NULL AS Ref,
NULL AS ClientName,
NULL AS LastName,
NULL AS FirstName,
NULL AS BizName,
NULL AS Tfn,
NULL AS Abn,
NULL AS Acn,
*
FROM dbo.zvabbreviations
WHERE id = @TitleId
AND ( @ClientRef = ''
OR @ClientRef IS NULL
OR @ClientRef NOT IN (SELECT ref
FROM dbo.zvhs_client) )
UNION ALL
-- Selecting both client and document title information
SELECT *
FROM dbo.zvhs_client,
dbo.zvabbreviations
WHERE id = @TitleId
AND @ClientRef = ref
That’s it. This SQL query pattern can be expanded to several other scenarios. The trick is to determine what cases you will allow to coexist and add as may unions as needed to describe them all. Just make sure there is no chance your query will return more than one record as that will cause the lookup button to give the user an error message and not return any results.
Russ Beinder is the Chief Technology Officer at FileHold. He is an entrepreneur, a seasoned business analyst, computer technologist and a certified Project Management Professional (PMP). For over 35 years he has used computer technology to help organizations solve business problems.