Creating scan profiles enables users to switch quickly from scanning one type of document to another. Each profile represents one type of document and has all of the settings unique to that document type.
Facts about QuickScan Pro Scan Profiles
Once configured, the profile settings control the OCR (optical character recognition) settings and the export functions for the files that EMC Captiva QuickScan Pro (QSP) processes for the document management system.
- When a profile is selected, its description appears in the box below the list. A description appears only if the user provided one when creating the profile.
- Export profiles send the file metadata to an XML folder on the local drive, alongside the PDF output folder that was set for file output on the General tab.
- OCR and indexing provide for the automatic capture of searchable metadata from the files as they are processed by QSP.
For OCR to reach its best accuracy of around 80%, the documents to be OCR'd must be standardized forms with no handwriting.
- Anomalies such as documents being printed from different printers, the type of print (e.g. dot matrix) and overall print quality can affect the accuracy of the OCR.
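To make an accuracy figure such as 80% concrete, the sketch below compares OCR output against text that a person has verified. It is only an illustration: the function name, the sample strings and the choice of Python's difflib for matching are our assumptions, and QSP itself may measure accuracy differently.

```python
from difflib import SequenceMatcher


def character_accuracy(expected: str, recognized: str) -> float:
    """Return the fraction of expected characters that the OCR output matched."""
    if not expected:
        # Nothing was expected: perfect only if nothing was recognized either.
        return 1.0 if not recognized else 0.0
    matcher = SequenceMatcher(None, expected, recognized, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(expected)


# Hypothetical sample: the OCR engine read two zeros as the letter O.
expected_text = "INV-10023"
recognized_text = "INV-1OO23"
print(f"Character accuracy: {character_accuracy(expected_text, recognized_text):.0%}")
```

Running a check like this on a few sample pages from each printer or form type can show which documents are good candidates for OCR.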
Batch Scan Settings
Click on the different menu items to access the settings.
General: Contains the profile name, description, profile type (public or private) and whether to allow profile deletion. It also shows where the document images are saved. Finally, it contains a summary of the settings for the entire profile.
Scan: Sets the options for how the software will scan the document. The settings include simplex/duplex, dpi, paper size and the scanner settings.
Image Format and Naming: Configure file naming scheme, file type, colour format and compression. File names can be formatted to include information like the date and also to follow a certain formula.
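As an illustration of a file name that includes the date and follows a formula, the sketch below builds names from a prefix, the scan date and a page counter. The pattern and the function name are hypothetical and are not QSP's own naming syntax; the point is only to show the kind of result such a scheme produces.

```python
from datetime import date


def build_file_name(prefix: str, scan_date: date, sequence: int, extension: str = "pdf") -> str:
    """Combine a prefix, the scan date (YYYYMMDD) and a zero-padded counter."""
    return f"{prefix}_{scan_date:%Y%m%d}_{sequence:04d}.{extension}"


# Hypothetical values for one scanned invoice.
print(build_file_name("invoice", date(2024, 3, 5), 7))
```

A date in year-month-day order keeps the files sorted chronologically when they are listed by name.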
Image Processing: Sets up image cleanup. As the pages are scanned, actions such as deskewing, noise removal, and blank page detection and removal can be performed automatically.
OCR: You can set up Zonal and Full Text OCR on this tab. Optical Character Recognition is the process of converting the letters and numbers in an electronic image (a scanned document) into electronic data so that the text becomes searchable in FileHold. The QSP scanning software reads the black and white pixels of an image and attempts to recognize the correct letter or number at each position. Full Text OCR, Zonal OCR, bar codes, manual index fields, etc. can all be imported into FileHold using the Manage Imports tool via an XML output file from QSP. If you wish to set up full text OCR for a given QSP batch, we recommend as a best practice that you create a folder where both the XML export file and the OCR'd PDF are placed upon completion of each batch.
IMPORTANT: XML files must be in the same directory as the PDF file(s) being imported from QuickScan Pro, Kofax Capture, Kofax Express or Kodak Capture Pro!
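If you want to check this rule before running an import, a small script can list any folder that holds PDFs but no XML file. The sketch below is one possible way to do it and is not part of QSP or FileHold; the function name is our own, and it checks only that an XML file is present, not what it contains.

```python
from pathlib import Path


def folders_missing_xml(export_root: Path) -> list[Path]:
    """Return folders under export_root that contain PDFs but no XML file."""
    pdf_folders = set()
    xml_folders = set()
    for path in export_root.rglob("*"):
        if not path.is_file():
            continue
        suffix = path.suffix.lower()  # treat .PDF and .pdf the same way
        if suffix == ".pdf":
            pdf_folders.add(path.parent)
        elif suffix == ".xml":
            xml_folders.add(path.parent)
    return sorted(pdf_folders - xml_folders)
```

Call folders_missing_xml with the folder that your export profile writes to. An empty list means every folder containing PDFs also contains at least one XML file.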
Use the table at the bottom of the page to determine when to use Zonal or Full Text OCR.
Index: This tab sets up zonal OCR capture of index fields using a reference file (TIFF format). This is great for structured forms where the information for each index field (Invoice #, Customer Name) is in the same location on the form, so that you can scan in batches and automate the capture of these fixed location fields.
Invoices are a perfect example. Zonal OCR can be set to capture information such as the Invoice Number, the Customer Name and the date automatically. Users can create multiple index profiles for each type of document/form so that specific things happen when the batch is launched in QuickScan Pro.
- Stamp scanned image - leaves a stamp on the image to indicate scanned status.
- Auto Index - we recommend that you check "before the batch is closed". This allows the individual pages / files to be reviewed to make sure that they are being scanned correctly.
- Profile - these profiles control the fields that will be indexed as the files are scanned.
Export: Set the export path for the image files once they are scanned. Users can create multiple export profiles.
- Auto Export - like indexing, this should happen before the batch is closed.
- Profile - this profile controls where the files are exported to on the local drive* and what metadata (if any) from indexing is associated with them. *Files are exported to a set of folders created on a local drive. From there, they are moved automatically into the document management system by an Import Tool that is configured in FDA. See Importing QSP processed files into the document management system for more details.
To export documents without full text OCR, select Page: FileName from the <Page Values> drop-down, and then include the various index/metadata fields from the index profile you created for this batch profile.
REMINDER: XML files must be in the same directory as the PDF file(s) being imported from QuickScan Pro, Kofax Capture, Kofax Express or Kodak Capture Pro!
To export full text OCR'd documents, do not include Page: FileName as you do for non full text OCR results. Instead, check the option to include OCR results, with an OCR count of 1.
Batch: We recommend a basic configuration in the Batch tab so that batches are saved automatically, without prompting the user and slowing down your work.
Make sure the XML export file setting for "If output file exists" is set to:
- Generate new output file name
- Reminder - make sure the XML file is always in the same folder as the PDF files that will be imported into FileHold.
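The idea behind generating a new output file name is that an existing XML file is never overwritten by a later batch. The sketch below shows one common way to do this, by adding a numeric suffix. The suffix format is an assumption for illustration only; the names QSP actually generates may look different.

```python
from pathlib import Path


def next_free_name(target: Path) -> Path:
    """Return target if it is unused, otherwise the first 'stem_N.ext' that does not exist."""
    if not target.exists():
        return target
    counter = 1
    while True:
        # Hypothetical scheme: export.xml -> export_1.xml -> export_2.xml ...
        candidate = target.with_name(f"{target.stem}_{counter}{target.suffix}")
        if not candidate.exists():
            return candidate
        counter += 1
```

Because the new name stays in the same folder as the original, the XML file remains next to the PDF files it describes.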
When to Use Zonal OCR or Full Text OCR for Document Scanning
Review the following table to determine when you should use the different types of OCR capabilities.
|Zonal OCR||Full Text OCR|
|Zonal OCR (Optical Character Recognition) also captures information, but in this case the software is programmed to look in the same location or “zone” on every scan. This is helpful when scanning documents where the information is in exactly the same location on every page, such as an invoice number. Once the document has been scanned, the operator can validate the information captured from the zone to confirm that it is correct.||Full Text OCR (Optical Character Recognition) captures every character in the document being scanned and processes it into a fully searchable PDF. One effective use of this technology is when users need to search hundreds of pages of an article or book for a certain topic. Full Text OCR can significantly slow down the scanning process depending on the scanner, RAM, hard disk space, and video card being used.|
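The difference between the two columns can be sketched in a few lines of code. The example below is a simplification and does not reflect how QSP works internally: it assumes the words on a page have already been recognized along with their pixel positions, and all names, coordinates and sample words are made up. Full text keeps every word, while zonal keeps only the words that fall inside a fixed rectangle.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    text: str
    x: int  # left edge of the word, in pixels
    y: int  # top edge of the word, in pixels


@dataclass(frozen=True)
class Zone:
    left: int
    top: int
    right: int
    bottom: int

    def contains(self, word: Word) -> bool:
        """True when the word's top-left corner lies inside the zone."""
        return self.left <= word.x < self.right and self.top <= word.y < self.bottom


def full_text(words: list[Word]) -> str:
    """Full text: keep every recognized word on the page."""
    return " ".join(word.text for word in words)


def zonal_text(words: list[Word], zone: Zone) -> str:
    """Zonal: keep only the words located inside the fixed zone."""
    return " ".join(word.text for word in words if zone.contains(word))


# Hypothetical invoice page with the invoice number at the top right.
page = [
    Word("ACME", 100, 80), Word("Supplies", 260, 80),
    Word("Invoice", 1500, 120), Word("INV-10023", 1800, 120),
    Word("Total", 1500, 2900), Word("418.50", 1800, 2900),
]
invoice_number_zone = Zone(left=1750, top=100, right=2300, bottom=180)

print("Zonal:", zonal_text(page, invoice_number_zone))
print("Full text:", full_text(page))
```

The zone only returns the right value when every page in the batch has the invoice number in the same place, which is why zonal OCR suits structured forms.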
There is also another type of character recognition, called ICR. ICR (Intelligent Character Recognition) is the ability of the software to read handwritten information and process it into searchable information. This is especially beneficial in the financial industry. This tool can be very useful in some situations; however, the error rate is much higher because handwriting varies so much from person to person. EMC Captiva QuickScan Pro does not have ICR capabilities.