OCR and FileHold – Transforming digital documents part 3 of 3

Optical Character Recognition, or OCR, is a powerful and newly affordable tool for converting scanned images into fully searchable text. In the previous articles, we introduced you to OCR and to FileHold’s Server-Side OCR module. In this article, we will explore Zonal OCR and how it can transform your digital documents.

What is Zonal OCR?

As the name suggests, Zonal OCR applies the OCR process to specific areas of the document. Either the entire document is run through OCR and the text of interest is then selected, or OCR is applied only to targeted zones. Zones can be selected manually or suggested by the software. Unlike Server-Side OCR, this happens before the document is added to FileHold, which means the OCR text can be used as document tags.

What is document tagging?

Document tagging is a vital part of any document management system (DMS). It allows critical information, or metadata, to be attached to any kind of document. Where a document has no searchable text layer, such as a video, audio file, or drawing, metadata allows it to be searched and retrieved.

Documents in FileHold are indexed first by their schema, or use type, and then with metadata related to that schema. For example, an invoice would be filed under the Invoice schema, and then have the Vendor Name, Date, Total, and Terms of Payment used as fields for the document metadata. In FileHold, these are configurable based on your use case. Are the Terms of Payment not important, but a PO Number is? You can easily create your schema and metadata suited to your needs.
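To make the idea of a schema and its metadata fields concrete, here is a minimal Python sketch. It is an illustration only and does not use FileHold's actual configuration or API; the Schema class, its missing_fields method, and the sample values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Schema:
    """A hypothetical document schema: a name plus the metadata fields it expects."""
    name: str
    fields: list[str]

    def missing_fields(self, metadata: dict[str, str]) -> list[str]:
        """Return the schema fields that have no value in the given metadata."""
        return [f for f in self.fields if not metadata.get(f)]


# An invoice schema where a PO Number is tracked instead of Terms of Payment.
invoice = Schema("Invoice", ["Vendor Name", "Date", "Total", "PO Number"])

# Sample metadata for one document (made-up values).
metadata = {"Vendor Name": "Vendor A", "Date": "2024-01-31", "Total": "1250.00"}
print(invoice.missing_fields(metadata))  # ['PO Number']
```

Checking for missing fields before filing is one simple way to catch incomplete tagging early.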

On electronic forms, this information is easily found; the vendor name, invoice number, date, etc. are all clearly laid out, and the document can be indexed by the user with simple copy-and-paste commands. Note that FileHold offers the ability to process e-forms and automatically extract these text fields into the assigned metadata; but when the form originates from outside the organization, e-forms are not likely to be an option. Forms that start as paper and are scanned into a digital document offer a new challenge – exact data entry. Since there is no text to select, accurate metadata depends entirely on the user’s data entry skills. This opens the possibility of transposition errors and incorrect metadata.

How does FileHold use zonal OCR for document tagging?

Most forms that arrive regularly are of a standardized format. Consider an invoice: the vendor logo, date, invoice number, and other information remain in the same place on every form, month after month. Zonal OCR can look for this information in specific areas – and more importantly, can be taught to increase accuracy with each new invoice processed. This makes processing scanned forms more efficient. For instance, you have a scanned invoice from Vendor A, with their standardized formatting. Zonal OCR software knows to look here for the invoice number, there for the total, and at the top for the date, and populates the metadata fields as needed. Vendor B’s invoices have different formatting, but the OCR software knows these different locations and grabs the metadata information just as easily.
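The idea of knowing where to look on each vendor's form can be sketched in a few lines of Python. This is a simplified illustration, not how any particular OCR product is implemented: it assumes the OCR step has already produced words with page coordinates, and the Word and Zone types, the TEMPLATES table, and all coordinates are hypothetical.

```python
from typing import NamedTuple


class Word(NamedTuple):
    text: str
    x: float  # left edge of the word on the page
    y: float  # top edge of the word on the page


class Zone(NamedTuple):
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, word: Word) -> bool:
        """True if the word's top-left corner falls inside this zone."""
        return self.left <= word.x <= self.right and self.top <= word.y <= self.bottom


# Hypothetical per-vendor templates: each metadata field maps to a page zone.
TEMPLATES = {
    "Vendor A": {"Invoice Number": Zone(400, 40, 600, 70), "Total": Zone(400, 700, 600, 730)},
    "Vendor B": {"Invoice Number": Zone(50, 100, 250, 130), "Total": Zone(50, 650, 250, 680)},
}


def read_zones(vendor: str, words: list[Word]) -> dict[str, str]:
    """Collect the text found inside each zone of the vendor's template."""
    return {
        field: " ".join(w.text for w in words if zone.contains(w))
        for field, zone in TEMPLATES[vendor].items()
    }


# Made-up OCR output for one Vendor A invoice.
words = [Word("INV-1001", 420, 50), Word("Total", 300, 710), Word("1250.00", 450, 710)]
print(read_zones("Vendor A", words))
# {'Invoice Number': 'INV-1001', 'Total': '1250.00'}
```

Because the zones are stored per vendor, the same scanned words read against another vendor's template would yield different (here, empty) values, which is why matching the form to the right template matters.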

FileHold offers our partner software, SmartSoft Capture, for use as a zonal OCR tool. The first step when a document is added is for SmartSoft to apply an OCR “crawl” of the entire document. Then, it uses clues to search for text beside known text – i.e., if it reads “Total” from the text, and the values immediately beside that are numbers, this is likely to be the required field for Total. It offers a zone for verification by the user to confirm that the assumed value is not only from the correct location but has also been read accurately by the OCR. If so, the user validates the entry; if not, the user can select the area where the information is to be found, or can correct the OCR values. Once the fields are confirmed to be correct, the document can be exported into FileHold. The validated fields become the document metadata. In one step, this saves time locating information and minimizes data entry and transposition errors.
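The "text beside known text" clue described above can be illustrated with a short Python sketch. This is not SmartSoft Capture's algorithm, which is not documented here; it is an assumed, simplified version that works on plain OCR text lines, and the suggest_value function and the AMOUNT pattern are hypothetical names.

```python
import re

# Matches amounts such as 1250, 1,250.00 or $1,250.00.
AMOUNT = re.compile(r"\$?\d[\d,]*(?:\.\d{2})?")


def suggest_value(ocr_lines, label):
    """Return the first amount found directly after `label`, or None."""
    for line in ocr_lines:
        # Find the label as a whole word, ignoring case ("Subtotal" does not count).
        match = re.search(rf"\b{re.escape(label)}\b", line, re.IGNORECASE)
        if not match:
            continue
        # Skip spaces and colons between the label and its value.
        rest = line[match.end():].lstrip(" :")
        amount = AMOUNT.match(rest)
        if amount:
            return amount.group()
    return None


# Made-up OCR output from a scanned invoice.
lines = ["Invoice INV-1001", "Subtotal 1,150.00", "Total: $1,250.00"]
print(suggest_value(lines, "Total"))  # $1,250.00
```

The result is only a suggestion. As in the workflow above, a user would still confirm or correct the value before it becomes metadata.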

As you use SmartSoft, it becomes more accurate. It can be taught to remember locations for information on forms from regular vendors so when it sees a new form from them, it knows where to find the information. If these field locations are refined by the user, SmartSoft will keep that in mind as well. Pages can even be re-ordered, grouped, or removed to ensure high-quality documents are added to FileHold. It can also read barcodes – either as page separators in batches of documents, or as readable values for metadata.

Zonal OCR in many ways acts like an eye, looking for information and making choices as to what is best – and gets better and more accurate with use.

Is there a standard feature that uses zonal OCR for occasional metadata tagging?

If you have a “one-off” document, FileHold offers the Click to Tag feature. This lets you use your mouse to select an area of the document and use the text found there as the value for a metadata field. Click to Tag can be used in the native application or through the FileHold viewer, where you will be able to select an area to match to a metadata value. The text from the selected area is applied to that metadata field. There is a confirmation step as the final part of completing the field, to ensure OCR accuracy prior to tagging the document. You can then exit Click to Tag, or select the next field from the document if required.

What are the use cases for Zonal OCR?

Zonal OCR is a very efficient way to perform two services at the same time when processing batches of documents into FileHold: metadata tagging and making flat images into searchable text. This is a great asset for any organization that regularly deals with unwanted paper.

  • Paper documents that arrive on a regular basis, like invoices or bank statements, can be scanned and processed into searchable documents with correct metadata in batches using Zonal OCR. Because they are verified for accuracy prior to being added to FileHold, they can then be entered into workflow for internal approvals without concern of misinformation.
  • Batches of documents can be scanned and run through at the same time. Instead of needing to deal with paper conversion to digital every time a new document arrives, they can be stacked, filed with a separator page, and batch processed to allow users to focus on the task and file documents efficiently into FileHold for company-wide use.
  • Scans from third parties, such as scans of forms, can be processed with an easy-to-use tool to make sure the information is correct prior to filing and the metadata is ready to be searched.
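The separator-page approach to batch scanning mentioned in the list above can be sketched as follows. This is a minimal Python illustration under stated assumptions: each scanned page is represented by a simple string, and the SEPARATOR marker stands in for whatever value a real capture tool would read from a separator page or its barcode. The split_batch function is hypothetical.

```python
SEPARATOR = "SEPARATOR"  # hypothetical marker read from a separator page


def split_batch(pages):
    """Group a scanned stack of pages into documents, split at separator pages."""
    documents, current = [], []
    for page in pages:
        if page == SEPARATOR:
            # A separator closes the current document, if it has any pages.
            if current:
                documents.append(current)
            current = []
        else:
            current.append(page)
    # Keep the final document, which has no trailing separator.
    if current:
        documents.append(current)
    return documents


# Made-up stack: a two-page invoice, a one-page invoice, and a two-page statement.
batch = ["inv-A p1", "inv-A p2", SEPARATOR, "inv-B p1", SEPARATOR, "stmt p1", "stmt p2"]
print(split_batch(batch))
# [['inv-A p1', 'inv-A p2'], ['inv-B p1'], ['stmt p1', 'stmt p2']]
```

The separator pages themselves are dropped, so only real content pages end up in the filed documents.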

With all these tools, it is no wonder OCR has been a vital tool in the transformation of offices into truly paperless environments. To begin your discovery process about how FileHold’s OCR modules can help make your documents more efficient and accurate, contact your sales consultant at [email protected] to get started.

Chris Oliver

Chris Oliver brings his twenty years of experience in management in the entertainment industry to FileHold Systems as the Client Training and Retention Advocate. To learn more about how FileHold DMS can work for you, contact him at [email protected].