Custom reporting using workflow data
FileHold workflows generally operate without management intervention, but things happen in real life and the standard workflow reporting and dashboard provide methods to deal with departments that are overworked or employees that unexpectedly went on leave. Beyond that, organizations may have a variety of custom analytics they want to run on workflow data to identify trends in performance, etc.
Workflow’s automatic reminders and due date monitoring let users know when they are behind, but do you have processes in place to make sure they are able to complete their tasks when you need them to? Monitoring the start and end dates of a workflow can help with this process, but what happens when you allow users to restart workflows to correct issues? The start date of the completed workflow may not be the real start date when you account for corrections that users were given the flexibility to solve.
The “allow postpone” attribute of a workflow is the starting point to handle errors. For example, a workflow participant tries to complete their task, but notices the workflow initiator or prior participant failed to include an important piece of information. They may send a user a one-time task, they may send the workflow back to an initiator or a prior activity. In all cases, when the problem is corrected, the workflow is restarted from the beginning or from the current or earlier workflow activity.
Fair warning, computer code follows. It is best read by individuals that have seen SQL database code before.
The Workflows table in the ch_librarymanager database includes a record for every workflow in the system. An automatically or manually restarted workflow adds a new record. For tracking the true start and end dates, this poses a problem. The true start date is not in the “current” or last instance of the workflow, but the first. Of course, the end date in is the first. To further complicate things, depending how often the workflow was restarted, there may be many related records in between the first and last.
The good news is that there is a path from the current workflow to the first workflow via the RestartedWorkflowInstanceGuid column. When a workflow is restarted, its status is changed to restarted and the new instance that is created is given a copy of its instance GUID. This means the current workflow has the instance GUID of the workflow it was restarted from. Now I just need a way to query through the RestartedWorkflowInstanceGuid column until I find the first related workflow record.
SQL common table expressions (CTE) have been around for a while and they have a nice way to handle recursion. You just need to create an anchor case and the general case and join them with something like a UNION ALL. The following query returns all related records for the given current workflow. I have added a column so we can easily distinguish which workflows were restarted. We do not need to check the workflow status in this case as anything that is not the anchor is restarted by definition.
with FullWorkflows (IsRestarted, WorkflowId, ParentWorkflowId, AnchorWorkflowId, WorkflowName, StartDate, EndDate)
as
(
select 0
,Anchor.WorkflowInstanceGuid
,Anchor.RestartedWorkflowInstanceGuid
,Anchor.WorkflowInstanceGuid
,Anchor.WorkflowName
,Anchor.DateCreated
,Anchor.DateCompleted
from ch_librarymanager.dbo.Workflows Anchor
where Anchor.WorkflowInstanceGuid = 'DB8C26AE-5BF7-47E7-84BA-00296E91CC9D' -- pretend this is a workflow in your database
union all
select 1
,Child.WorkflowInstanceGuid
,Child.RestartedWorkflowInstanceGuid
,Parent.AnchorWorkflowId
,Child.WorkflowName, Child.DateCreated, Child.DateCompleted
from FullWorkflows Parent, ch_librarymanager.dbo.Workflows Child
where Parent.ParentWorkflowId = Child.WorkflowInstanceGuid
)
select *
from FullWorkflows
Now that we know how to find all the related records, we need to extract the specific details we need to find out our true start and end dates; we simply need the first and last workflow instance records if we assume they are sorted by date. TOP 1 and a couple of opposite ORDER clauses will do the trick.
select FirstInstance.AnchorWorkflowId, FirstInstance.WorkflowName, FirstInstance.StartDate, LastInstance.EndDate, FirstInstance.IsRestarted
from
( select top 1 * from FullWorkflows order by StartDate) FirstInstance,
( select top 1 * from FullWorkflows order by StartDate desc ) LastInstance
What we have so far answers our question for a specific case, but we will need a bit more code to handle the general case as we most likely would like this reporting to cover a range of current workflows. To find these workflows we will need to look for instances that do not have the WorkflowStatus equal to 5 (restarted). Note that we could also look for ones that are specifically canceled, in progress, competed, etc. This is a more straight forward query than above.
with RootWorkflows (WorkflowId, WorkflowName, RestartedId )
as
(
select w.WorkflowInstanceGuid, WorkflowName, RestartedWorkflowInstanceGuid
from ch_librarymanager.dbo.Workflows w
where WorkflowStatus <> 5 -- restarted
and IsOldRestarted = 0
)
select *
from RootWorkflows
Note that we have excluded “old” restarted workflows. These workflows were created with early versions of FileHold 14, most of which are probably no longer relevant and require a different method to connect the dots.
Now we just need a way to connect the list of current workflows with our method to find the first and last instance for each workflow. One method is to convert the first query to a table function and join it with our list of current workflows. We will use the @WorkflowGuid variable in place of our fixed GUID.
CREATE function [dbo].[zftFullWorkflow]
(
@WorkflowGuid uniqueidentifier
)
returns table
as return
with FullWorkflows (IsRestarted, WorkflowId, ParentWorkflowId, AnchorWorkflowId, WorkflowName, StartDate, EndDate)
as
(
select 0
,Anchor.WorkflowInstanceGuid
,Anchor.RestartedWorkflowInstanceGuid
,Anchor.WorkflowInstanceGuid
,Anchor.WorkflowName
,Anchor.DateCreated
,Anchor.DateCompleted
from Workflows Anchor
where Anchor.WorkflowInstanceGuid = @WorkflowGuid
union all
select 1
,Child.WorkflowInstanceGuid
,Child.RestartedWorkflowInstanceGuid
,Parent.AnchorWorkflowId
,Child.WorkflowName, Child.DateCreated, Child.DateCompleted
from FullWorkflows Parent, Workflows Child
where Parent.ParentWorkflowId = Child.WorkflowInstanceGuid
)
select FirstInstance.AnchorWorkflowId, FirstInstance.WorkflowName, FirstInstance.StartDate, LastInstance.EndDate, FirstInstance.IsRestarted
from
( select top 1 * from FullWorkflows order by StartDate) FirstInstance,
( select top 1 * from FullWorkflows order by StartDate desc ) LastInstance
With this function in place, we can update our list of current workflows to include the start dates from the first (or only) workflow instance and combine it with the current.
with RootWorkflows (WorkflowId, WorkflowName, RestartedId)
as
(
select w.WorkflowInstanceGuid, WorkflowName, RestartedWorkflowInstanceGuid
from ch_librarymanager.dbo.Workflows w
where WorkflowStatus <> 5 -- restarted
and IsOldRestarted = 0
)
select FullWorkflow.AnchorWorkflowId, FullWorkflow.WorkflowName, FullWorkflow.StartDate, FullWorkflow.EndDate, FullWorkflow.IsRestarted
from RootWorkflows
cross apply zftFullWorkflow( RootWorkflows.WorkflowId ) FullWorkflow
That’s it. Of course, there may be a variety of other scenarios where you need one or all of the restarted workflows in a single workflow chain and this technique can be expanded as needed. Beware there is a limit to how deep your recursion can go, however, the SQL default is 100 rows and that is almost certain to cover all FileHold workflow cases.
Russ Beinder is the Chief Technology Officer at FileHold. He is an entrepreneur, a seasoned business analyst, computer technologist and a certified Project Management Professional (PMP). For over 35 years he has used computer technology to help organizations solve business problems.