Zonal OCR - What's it good for?

Friday, August 14, 2015

What is zonal OCR?

Zonal OCR is a type of optical character recognition employed by scanning software that allows the software to read specific areas or "zones" of a document. These zones are determined by setting up scanning templates or "batches" inside the scanning software. Zonal OCR works differently across the various brands of scanning software. Some, such as EMC Captiva QuickScan Pro, require the zones to be in exactly the same location for each scanning job, while others, such as ABBYY FlexiCapture, can be taught to look for the zones in different areas of a page.

How does this differ from full OCR?

Full OCR is a type of optical character recognition employed by scanning software that allows the software to read the entire document and place a text layer on top of the resulting PDF document. This text layer allows the content of the document to be searched. Full OCR is best for documents such as reports or contracts, where meaningful words and phrases can be searched for inside the document management system.

How is zonal OCR useful?

If you have set forms or templates, zonal OCR can help automate the process of populating indexing or "metadata" fields in FileHold. When you set up your batches in the scanning software, the zones that have been defined are read, converted into text, and then used to automatically populate the defined indexing fields. When the scanned documents are imported into FileHold, these OCR'd values populate the metadata fields of the schema. Both steps reduce the amount of manual labour needed to fill out required values and process documents.
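The idea of reading fixed zones into indexing fields can be sketched in a few lines of Python. This is only an illustration and not how QuickScan Pro or FileHold work internally: it assumes an OCR engine has already produced words with pixel coordinates, and the names `Box`, `Word`, `read_zones`, the field names, and all coordinates are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """A rectangle in page pixels (hypothetical coordinate system)."""
    left: int
    top: int
    right: int
    bottom: int

    def contains(self, other: "Box") -> bool:
        # True when `other` lies entirely inside this rectangle.
        return (self.left <= other.left and self.top <= other.top
                and self.right >= other.right and self.bottom >= other.bottom)


@dataclass(frozen=True)
class Word:
    """One recognized word and where it was found on the page."""
    text: str
    box: Box


def read_zones(words, zones):
    """Return {field name: text} using only the words inside each zone."""
    fields = {}
    for name, zone in zones.items():
        inside = [w for w in words if zone.contains(w.box)]
        # Simple reading order: top to bottom, then left to right.
        inside.sort(key=lambda w: (w.box.top, w.box.left))
        fields[name] = " ".join(w.text for w in inside)
    return fields


# Assumed template: each indexing field is tied to one fixed zone.
ZONES = {
    "Work Order Number": Box(400, 40, 600, 80),
    "Customer": Box(40, 120, 300, 160),
}

# Assumed OCR output for one scanned page.
PAGE_WORDS = [
    Word("WO-1042", Box(410, 50, 500, 70)),
    Word("Acme", Box(50, 130, 100, 150)),
    Word("Ltd", Box(110, 130, 150, 150)),
    Word("Replace", Box(50, 300, 130, 320)),  # body text, outside every zone
]

fields = read_zones(PAGE_WORDS, ZONES)
```

Because the zones are fixed rectangles, a word that falls outside them (such as "Replace" above) never reaches an indexing field, which is also why a shifted or skewed page can leave a field empty.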

How is Zonal OCR configured in the scanning software?

In this example, EMC Captiva QuickScan Pro scanning software is being used to set up zonal OCR for Work Order documents. The following diagram depicts the process for performing zonal OCR on a work order document and then populating the metadata fields of a work order schema in FileHold.

Zonal OCR for work orders

First, a batch for Work Orders is configured in QuickScan Pro to perform zonal OCR on certain areas of the document. The screen shot below shows all of the indexing fields that are set to be populated through <Zonal OCR> in the batch. Each zone is defined by drawing a yellow box around the area that is to be read.

Zonal OCR batch definition in QuickScan Pro

Once the batch has been configured in QuickScan Pro, scanning of the Work Orders can begin. The batch definition will OCR the configured zones and populate the indexing fields automatically. In the diagram below, a work order was scanned and the indexing fields were automatically read and populated. The areas where zonal OCR was performed are highlighted on the page.

Scanned document with zonal OCR results

Once the indexed values have been given a quick check to ensure they were captured correctly, the batch can be closed. With zonal OCR, a certain amount of error can always occur in the process. For example, if the page is skewed, the zone may be off, or the original paper document may be in poor condition, making the text unreadable. It is always recommended that you have a quality control process to verify that the text that was OCR'd is correct.
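Part of such a quality control check can be automated before a person reviews the batch. The sketch below is an assumption made for illustration, not a QuickScan Pro or FileHold feature: it compares each captured value against an expected pattern and lists the fields that need a manual look. The field names, the `WO-` number format, and the function `fields_needing_review` are hypothetical.

```python
import re

# Assumed formats for two indexing fields (hypothetical examples).
RULES = {
    "Work Order Number": re.compile(r"WO-\d{4}"),
    "Date": re.compile(r"\d{4}-\d{2}-\d{2}"),
}


def fields_needing_review(fields, rules):
    """Return the names of fields whose value does not match its pattern."""
    flagged = []
    for name, pattern in rules.items():
        # A missing field is treated as an empty value, which fails the check.
        value = fields.get(name, "").strip()
        if not pattern.fullmatch(value):
            flagged.append(name)
    return flagged


# A typical OCR slip: the letter "O" was read as the digit "0".
scanned = {"Work Order Number": "W0-1042", "Date": "2015-08-14"}
flagged = fields_needing_review(scanned, RULES)
```

A pattern check only catches values that are malformed. A value that is well formed but wrong, such as a misread digit, still needs to be compared against the page by a person.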

Once the scanning is completed, the import of the PDF documents and the metadata field values can occur in FileHold. This is set up in the Manage Imports feature in the FileHold Desktop Application. In the import definition, FileHold can continually watch and import the work orders and the metadata field values into the system.

Zonal OCR metadata pane

If you're interested in learning more about scanning with QuickScan Pro, contact sales@filehold.com. One license of level 3 QuickScan Pro software is provided to our customers at no additional cost with the purchase of FileHold document management software. A 30-launch trial can also be downloaded from the EMC Captiva website.