Designing a FileHold library

Thursday, June 4, 2015

There are four basic considerations when designing a FileHold library. 

  1. How users will search for documents.
  2. How users will add documents.
  3. What sorts of access controls are needed on documents.
  4. Performance.

Searching

In most cases FileHold is not simply a dumping ground for documents. It is a place where you go to find the important documents of your organization. Sometimes these documents will be found using the document contents, but most often they will be found using the information that was included with the documents when they were added to the system: the metadata. You want this information to be complete and easy to collect. If the information is readily available and unambiguous when users add documents, they are more likely to enter it correctly. Where possible, use automation such as property extraction rules to capture information with no extra user effort.

Come up with a scheme for classifying documents. A good scheme ensures that the information the user is asked to enter is specific to each class of documents and that no irrelevant information is requested. These document classifications might be mapped directly to document schemas. If several classifications collect the same metadata and differ only in the classification itself, it might be best to group them under a single document schema and include a document classification metadata field to tell them apart.

Adding

When a document is added to FileHold it is necessary both to enter metadata and to select a destination in the library. How the documents are added is important in determining the best method to select the destination. Where volume is low and the risk of error is small, it might make sense to drag and drop the documents or directly select the library location. In this scenario, auto-tagging may be used to set certain metadata values. When there is greater volume or risk, the documents should be filed automatically based on their metadata. This way the user is only required to enter the metadata, and auto-filing rules handle selecting the destination. The rules can be configured in one of the provided auto-filing scripts, or a custom auto-filing script can be created to handle complex filing rules. The rules should take into account access control and performance considerations. When documents are scanned into the system using Quick Scan Pro or similar software, metadata entry and auto-filing are normally part of the scanning workflow.
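To make the idea of metadata-driven filing concrete, here is a minimal sketch in Python. It is not FileHold's auto-filing script interface; the field names (department, document_date, client) and the rule "cabinet by department, drawer by year, folder by client" are assumptions chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Destination:
    """A library location: cabinet > drawer > folder."""
    cabinet: str
    drawer: str
    folder: str


def auto_file(metadata: dict[str, str]) -> Destination:
    """Hypothetical filing rule: the user enters metadata only,
    and the destination is derived from it."""
    department = metadata["department"]
    # Assumes dates are entered in ISO form, e.g. "2015-06-04".
    year = metadata["document_date"][:4]
    # Documents without a client go to a catch-all folder (an assumption).
    client = metadata.get("client", "General")
    return Destination(cabinet=department, drawer=year, folder=client)


print(auto_file({
    "department": "Finance",
    "document_date": "2015-06-04",
    "client": "Acme",
}))
```

Because the destination is computed from the metadata, a rule like this can also be shaped around access control and performance, for example by keeping each client in its own folder and each year in its own drawer.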

Access controls

In FileHold, documents are always stored inside folders. Folders are contained in drawers, and drawers are contained in cabinets. There is also a method to group folders together within a drawer. Visually this is represented as a three- or four-level library hierarchy. Access can be controlled at the cabinet level and further controlled at the folder level. A user must first be able to access a cabinet before they will be able to access a folder in it. Drawers and folder groups are open to every user who has access to the cabinet.

Many customers have very simple access requirements. Documents are often available to everyone in a single organizational unit. For example, a human resources department may have documents that are hidden from most users while the engineering department documents can be read by anyone. In this case access controls can often be defined at the cabinet level.

There are more complex scenarios where a folder may represent specific client data that should only be visible to a client manager and possibly the client themselves. In this context all client managers and all clients would be members of the cabinet, but the documents in their folder or folders would only be accessible to a much more limited group. There are many additional variations on these scenarios according to user roles in the system. For example, a client manager may be able to add new documents to a client folder, the client could only read the documents in their folder, and a supervisor could delete documents in a client folder.
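The cabinet-then-folder rule can be modelled in a few lines of Python. This is only an illustrative model of the client scenario, not FileHold's permission engine; the class names, user names and the explicit member sets are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Folder:
    name: str
    members: set[str] = field(default_factory=set)


@dataclass
class Cabinet:
    name: str
    members: set[str] = field(default_factory=set)
    folders: dict[str, Folder] = field(default_factory=dict)


def can_open_folder(user: str, cabinet: Cabinet, folder_name: str) -> bool:
    """A user must be a cabinet member before folder membership matters."""
    if user not in cabinet.members:
        return False
    folder = cabinet.folders.get(folder_name)
    return folder is not None and user in folder.members


# All client managers and clients are cabinet members; each client
# folder is limited to a much smaller group.
clients = Cabinet(
    name="Clients",
    members={"manager_ann", "client_bob", "client_cy"},
    folders={
        "Bob Ltd": Folder("Bob Ltd", {"manager_ann", "client_bob", "auditor_dee"}),
    },
)

print(can_open_folder("client_bob", clients, "Bob Ltd"))   # cabinet and folder member
print(can_open_folder("client_cy", clients, "Bob Ltd"))    # cabinet member only
print(can_open_folder("auditor_dee", clients, "Bob Ltd"))  # folder member only
```

Only the first check succeeds: the second user is not a member of the folder, and the third is not a member of the cabinet, so folder membership alone is not enough.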

The key to being successful with access controls is to include the minimum segregation required to meet your goals. With the segregation in place the user roles can be applied for the different types of access possible.

There is one additional dimension to access control: the document schema. Users will only see documents for schemas of which they are a member. This makes it possible to have a single human resources folder that contains both internal and external process documents. Human resources users would be members of both document schemas, so they would be able to add or view both internal and external process documents. The rest of the organization would be members of only the external process document schema, and they would never be able to see the internal process documents even when they are assigned access to the process documents folder. Further, the external process document schema could hide those documents until they are approved. This allows the human resources team to prepare the documents without exposing them to the rest of the organization until they are ready and approved through a document workflow.
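As a rough illustration of schema membership acting as a second filter, the following Python sketch lists what two users would see in the same folder. The schema names, document names and the function itself are hypothetical and do not reflect how FileHold implements this internally.

```python
def visible_documents(
    folder_documents: list[tuple[str, str]],
    user_schemas: set[str],
) -> list[str]:
    """Return the names of documents whose schema the user is a member of.

    Each document is a (name, schema) pair. Folder access is assumed
    to have been checked already.
    """
    return [name for name, schema in folder_documents if schema in user_schemas]


process_folder = [
    ("Hiring checklist", "Internal Process"),
    ("Vacation request procedure", "External Process"),
]

hr_user = {"Internal Process", "External Process"}
other_user = {"External Process"}

print(visible_documents(process_folder, hr_user))
print(visible_documents(process_folder, other_user))
```

Both users have access to the same folder, but the second user never sees the internal document because they are not a member of its schema.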

Performance

There are a few basic aspects of performance that should be considered when determining the library structure.

The simplest implementation of a library structure for FileHold is a single cabinet containing a single drawer containing a single folder. If documents in the system were equally accessible to all users, this would be all that was required. From a performance perspective this may not be desirable, because each time someone opens the folder, every document in the system whose schema they are a member of would be retrieved. Opening a folder is essentially a search for all documents in the folder. Unlike a typical search in FileHold, there is no limit on the maximum number of documents that can be returned. This would not be a significant issue if the total documents in the folder numbered in the low thousands, but a folder with a million documents would put a tremendous, and almost certainly unnecessary, load on SQL Server.
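A quick calculation shows how many folders a large collection needs in order to stay within a comfortable size. The target of 5,000 documents per folder below is an assumed figure for illustration only, not a FileHold limit or recommendation.

```python
def folders_needed(total_documents: int, max_per_folder: int) -> int:
    """Smallest number of folders that keeps every folder at or
    below max_per_folder documents (ceiling division)."""
    if total_documents < 0 or max_per_folder <= 0:
        raise ValueError("counts must be non-negative and the limit positive")
    return (total_documents + max_per_folder - 1) // max_per_folder


# One million documents at an assumed 5,000 documents per folder.
print(folders_needed(1_000_000, 5_000))  # 200
```

In practice the split would follow something meaningful to users, such as client, year or project, rather than an arbitrary count.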

A second consideration is the calculation of permissions as users expand lists of cabinets or folders. The system must determine whether the user should be able to see each folder according to the permissions that were defined. This is not a large burden when the folders number in the hundreds, but ten thousand folders would place a significant burden on SQL Server. Breaking the folders into drawers or folder groups can help to reduce the number of folders that will be visible at one time and thus reduce the cost of unnecessarily displaying folders. Additionally, there is some control over the maximum number of folders that will remain open after a user logs off. Keeping this number low can help to improve the login speed.
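One simple way to break a long list of folders into drawers is to group them by a key that users already know. The sketch below groups folder names by first letter. This is a hypothetical planning aid, not a FileHold feature, and it assumes folder names are non-empty.

```python
from collections import defaultdict


def group_into_drawers(folder_names: list[str]) -> dict[str, list[str]]:
    """Group folder names into drawers keyed by upper-cased first letter."""
    drawers: dict[str, list[str]] = defaultdict(list)
    for name in folder_names:
        drawers[name[0].upper()].append(name)
    return dict(drawers)


drawers = group_into_drawers(["Acme", "apex", "Beta", "Birch", "Cobalt"])
for drawer, folders in sorted(drawers.items()):
    print(drawer, len(folders), folders)
```

Expanding one drawer then requires permission checks for only the folders in that drawer instead of the whole list.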

Finally, there is a question of network bandwidth. Each element of the library structure must be transferred across the network between the application server and the client. There is a significant amount of data contained in a single library object, and thousands of library objects can amount to a large data transfer. This situation most impacts the desktop client, as there is typically a slow network link between the application server and the client. For the web client this connection is usually within the same server, so it is less of an issue.

Advanced library design

The four considerations above will cover most solution designs, but more complex systems might also consider topics like custom naming and document lifecycle.

Russ Beinder is the Director of Product Development and Professional Services at FileHold. He is an entrepreneur, a seasoned business analyst, computer technologist and a certified Project Management Professional (PMP). For the last 25 years he has used computer technology to help organizations solve business problems.