Document Scanning: Assessing your Needs

FileHold has helped thousands of users get away from their paper documents and into a fully digital platform to organize, search, and retrieve documents while ensuring they are under retention and backup. FileHold offers different ways to scan, process, and add those documents to the repository, and we want to ensure the best fit for each organization. So, how do you get started assessing scanning needs?

This article suggests a set of evaluation tools that you can use to determine your overall scanning requirements. With these, FileHold can understand your needs and suggest the best possible strategy with the greatest efficiency.

Scanning converts physical media into data, so the “Five V’s of Big Data” offer a useful approach to identifying the scope of your scanning needs: Volume, Velocity, Variety, Veracity, and Value.

[Image: Five V's of scanning]

1. Volume

Consider the amount of material that needs to be scanned. Limit this to only the quantity of paper you need to process.

  • What is the scale of the scanning?

Does your organization need to digitize a filing cabinet or a warehouse of paper? Large scanning projects require more consideration due to the overwhelming volume.

  • What resources are available?

Will you be using existing staff or will you be bringing in a special team? The process of scanning a document (removing it from its folder, removing staples and clips, running it through the scanner, and dealing with the remaining paper) should not be underestimated. Do you have the people to scan these documents? Are they properly equipped and trained? Is there enough time?

Volume is more than the quantity of paper to be scanned; it is also the amount of data each document holds and how important proper cataloging will be. We will revisit this with the other V’s, so keep quantity in mind. The process for handling a one-page invoice differs from that for a 500-page report because the amount of data to process differs.

2. Velocity

Scanning is not just about how many documents there are, but also about how consistently they arrive.

  • What is the flow of documents to be scanned?

Is this a one-time project clearing paper archives, or will this be an ongoing process with incoming documents? Consider this when selecting a scanning solution: a high-cost, high-volume scanner might be rented to address a short-term project. Some software offers highly effective online versions that are “pay-per-click”, potentially a greater value than purchased software.

  • How stable will the flow of documents be?

The more predictable the flow of documents, the more stable the solution will need to be. Organizations that receive regular deluges of documents will need to address them quickly and efficiently or they will be backlogged. Think of an accounting firm at tax season, when it receives all of its clients’ documents at once: a lot of work, but highly predictable, arriving at the same time each year.

Volume and velocity are connected. A high-volume, low-velocity scenario, such as monthly scanning of all documents for a department, would require a higher-speed solution employed each month. If the organization were to move to more frequent scanning, it could reduce the volume of each batch and increase the velocity. In this case, a more standard-speed scanner, like a multi-function workstation, might be ideal. The specific operational needs of the organization must be evaluated thoroughly and adjusted to adapt to more efficient or economical solutions.
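The trade-off between batch size and scanning frequency can be sketched with simple arithmetic. The page count, session lengths, and function name below are hypothetical figures chosen for illustration, not FileHold recommendations; the sketch only shows how the scanner speed needed changes when the same monthly volume is split into more frequent sessions.

```python
def required_pages_per_minute(pages_per_month, sessions_per_month, minutes_per_session):
    """Scanner speed needed to clear each batch within its scanning session."""
    pages_per_session = pages_per_month / sessions_per_month
    return pages_per_session / minutes_per_session


# Hypothetical department producing 6,600 pages a month.
monthly = required_pages_per_minute(6600, sessions_per_month=1, minutes_per_session=120)
daily = required_pages_per_minute(6600, sessions_per_month=22, minutes_per_session=15)

print(f"one monthly session: {monthly:.0f} pages/minute")
print(f"daily sessions: {daily:.0f} pages/minute")
```

Under these assumptions, the single monthly session needs a scanner rated for 55 pages per minute, while daily sessions need only 20. The daily schedule uses more operator time in total, so the right balance depends on staffing as well as equipment.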

3. Variety

You are not just scanning paper: you are scanning documents filled with data and information essential to your operations. Document classifications should be identified before scanning starts. FileHold offers near-limitless document types, and the types you define need to align with your scanning process.

  • What kinds of documents are being scanned?

The type of information, or metadata, that can be extracted from invoices is hugely different from what can be extracted from contracts or the minutes of meetings. Processing needs to respect these differences to ensure that data extraction offers the greatest value to the organization. The greater the detail of data being taken in, the more complex the processing will be.

  • How consistent are your document categories?

Scanning one document type can be a different process than scanning mixed types at once. If you are always running different document types into batch-scanned outputs, these may need to be separated into discrete documents. Sometimes, this is easier to do at the physical scanning stage; other times, it is best done with the scanner output and processing software.

Variety does not necessarily mean each group of documents needs to be broken down into discrete elements. It comes back to the most efficient way to store and retrieve your documents. If you are scanning archives of documents, a granular breakdown of document varieties may not be necessary. Folders could be scanned into single documents which the user could find with ease using an identification number, then browse with the Viewer. The effort to scan and classify the documents should not exceed the value in retrieving that information.

4. Veracity

A scanned document’s accuracy or correctness needs to be considered when processing. There are different ways to capture the information from the scan, such as optical character recognition (OCR) to extract text, or metadata to catalog, index, and organize document information. If the veracity of document metadata is not important, or if there is no information to index, less sophisticated options are available.

  • How detailed does document information need to be?

Scanning can capture a tremendous amount of text or metadata information, but that may not be necessary. Over-capture of extraneous data can be a burden on your scanning process and should be avoided where possible. Capturing too little, on the other hand, means you cannot take advantage of functions like text search in FileHold. A balance needs to be found between an efficient process and an optimized document data set.

  • How accurate does the data need to be?

The level of accuracy required will call for different approaches in scanning. Metadata is used to organize and classify documents and therefore must be accurate. The richer the metadata is for your document, the more intensive the scanning process. A verification step is recommended for metadata-rich documents, such as invoices, so that they are not accidentally misclassified. Other documents might be data-rich but metadata-sparse: a large report will have some minimal metadata to define it, and then could largely be processed through OCR. If an invoice’s metadata for the date is incorrect, it can be problematic; if a field buried in a report has a transposition error in OCR, it might never be found, let alone be an issue. Accuracy comes at a cost, and each organization will need to weigh the specific importance to determine the best approach (see “Value” below).

FileHold can connect with other information resources. For instance, the metadata may already exist in another resource, such as a software database. FileHold can link to that and perform a lookup: the user then only needs a single unique identifier, like an invoice number or a transaction record, which is matched against that database to complete the metadata. This makes the accuracy of that unique identifier more important, which can increase the time to process that one field. It also eliminates the need for scanner processing to find the other values in the document, which can be a net reduction in processing time.
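The idea of completing metadata from one verified identifier can be illustrated with a generic sketch. This is not FileHold’s lookup feature or API: it uses Python’s built-in sqlite3 module with an in-memory database standing in for the external system, and the table, column names, and sample invoice are all hypothetical.

```python
import sqlite3

# Stand-in for an external accounting database (hypothetical schema and data).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE invoices (invoice_number TEXT PRIMARY KEY, vendor TEXT, "
    "invoice_date TEXT, amount REAL)"
)
conn.execute(
    "INSERT INTO invoices VALUES (?, ?, ?, ?)",
    ("INV-1001", "Acme Supplies", "2024-03-15", 1250.00),
)


def complete_metadata(invoice_number):
    """Fill in the remaining metadata fields from the one verified identifier."""
    row = conn.execute(
        "SELECT vendor, invoice_date, amount FROM invoices WHERE invoice_number = ?",
        (invoice_number,),
    ).fetchone()
    if row is None:
        return None  # unknown identifier: route the scan to manual review
    vendor, invoice_date, amount = row
    return {
        "invoice_number": invoice_number,
        "vendor": vendor,
        "invoice_date": invoice_date,
        "amount": amount,
    }


print(complete_metadata("INV-1001"))
print(complete_metadata("INV-1010"))  # a mis-keyed identifier finds nothing
```

Only the invoice number has to be captured and verified at scan time. The second call shows why that one field matters so much: a mis-keyed identifier returns no metadata at all.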

5. Value

Only your organization knows the value of your physical records.

  • Is this document essential?

Regulatory obligations, operational efficiencies, and internal policies will determine what needs to be retained and what does not. This will factor into any scanning project: does the document need to be digitized at all? If so, what level of data extraction needs to be performed (scale of veracity) to maintain the organizational value? Documents are often kept long past regulatory obligation or operational use: is the best use of organizational resources to digitize them, or should they just be shredded? Document hoarding is a real problem, so don’t convert useless paper into useless data.

  • What is the cost of paper for this document?

The storage and maintenance of paper archives are a direct organizational cost, and there are savings in eliminating paper via a digital solution. This, in turn, can offset the cost of scanning and processing. Therefore, ask what value there is for these documents in the future. A granular metadata breakdown of archived materials is not cost-effective if they are not likely to be accessed in the future.

For those who are wondering about the cost of paper and paper-based processes, here are a few thoughts:

  • The typical office worker spends an estimated 30% to 40% of their day searching for printed documents. Organizations on average spend $20 in labor to file a document, $120 in labor to find a misfiled document, and $220 in labor to reproduce a lost document.
  • Companies lose thousands of documents each year as they’re misplaced or simply filed under the wrong name in a cabinet. 
  • Studies show that each data breach can potentially cost $148 per compromised record.
  • The filing process can cost companies tens of thousands of dollars per year both in terms of salary costs and in terms of employee productivity.
  • One report found that physical document storage ate up roughly 15 percent of total office space. On average, office space in the U.S. runs $285 per square foot.
  • A fire, flood, earthquake, or another disaster may destroy years’ worth of valuable information and take a serious toll on your business.
  • According to a Gartner Inc. study, the right document management system reduces potential paper-related costs by 40%.
  • Let’s say your firm has 30 staff with an average wage of $40 per hour and that each person spends 10 minutes per day, a modest estimate, looking for documents. With 30 staff working 22 days per month, the firm is wasting 6,600 minutes, or 110 hours, per month on this highly administrative task. At $40 per hour, that is $4,400 in labor every month.
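The arithmetic in the last example can be reproduced with a short sketch, which makes it easy to substitute your own figures. The function name and parameters are illustrative only; the inputs are the numbers from the example, and the labor cost simply applies the stated $40 hourly rate to the hours lost.

```python
def monthly_search_cost(staff, minutes_per_day, workdays_per_month, hourly_rate):
    """Return (minutes, hours, labor cost) spent searching for documents each month."""
    minutes = staff * minutes_per_day * workdays_per_month
    hours = minutes / 60
    cost = hours * hourly_rate
    return minutes, hours, cost


minutes, hours, cost = monthly_search_cost(
    staff=30, minutes_per_day=10, workdays_per_month=22, hourly_rate=40
)
print(f"{minutes:,} minutes = {hours:.0f} hours, about ${cost:,.0f} per month")
```

With the example’s inputs this prints 6,600 minutes, 110 hours, and about $4,400 per month.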

In the process of finding a physical document, the user leaves their workstation, goes to the storage space, identifies the correct container, and thumbs through folders to locate what is needed. This step of browsing a folder can be easily replicated in digital storage: batch scanning the folder as a single document saves the time needed to break it down into granular documents with meticulous metadata. The user finds the document via the physical filing system metadata, and then browses it with the FileHold Viewer from their workstation. The document can be searched for and retrieved quickly with no need to refile. This is more efficient, and therefore more cost-effective, than retrieving physical documents, while minimizing the processing investment in digitization.

The Complete Scanning Solution

As the Five V’s help to illuminate, there is no one-size-fits-all solution to document scanning and storage. FileHold works with organizations to help refine their objectives for scanning and propose different physical and software solutions to address those challenges. If you would like to learn more about how FileHold can assist, contact us at [email protected].

 

Chris Oliver

Chris Oliver brings his twenty years of experience in management in the entertainment industry to FileHold Systems as the Client Training and Retention Advocate. To learn more about how FileHold DMS can work for you, contact him at [email protected].