Document Scanning / Data Capture Software

Document scanning and text capture software from FileHold empowers organizations to make the paperless office a reality. Included with the FileHold software purchase is intelligent Capture technology that can convert mountains of paper into a secure, manageable electronic archive. Electronic documents in FileHold are organized, searchable and usable from anywhere in the world, even on mobile devices.

"Studies find that 35-50% of company information is not centrally indexed or searchable. Other surveys put this figure as high as 80%." - IDC White Paper

SmartSoft Capture - An "Out of the Box" Intelligent Data Capture Scanning Solution

Every new install of FileHold software ships with a license to SmartSoft Capture document scanning software. For an introduction and short video tour of Capture, go to Capture video tour. This software is provided at no additional cost with the purchase of FileHold software and is configured to work only and directly with FileHold. Capture automates text capture and data extraction: it moves captured text from documents and forms directly into FileHold metadata and associates that metadata with the document being scanned. For a summary of how to install and use Capture, go to the Capture Quickstart Guide.

Customers who find that text capture and extraction is not accurate enough, possibly due to the quality of their documents, can upgrade to Capture Plus, which uses a primary enhanced OCR engine (Nuance) and a secondary OCR engine (Tesseract) for maximum recognition quality. If you wish to upgrade your current version of Capture to Capture Plus, contact sales@filehold.com.

Capture data extraction software is easy to configure, use and maintain. A user can “point-and-click” on key data fields in a document or form for extraction, and in many cases Capture will learn to do it automatically, speeding up production-level document processing. Capture is an intelligent document scanning solution for capturing text fields and is especially useful for capturing variable data on repetitive forms such as invoices or surveys. The Capture software will work with nearly all document scanners that have a TWAIN interface.

FileHold software and Capture have been extensively tested and certified for use with inexpensive but powerful scanners such as the Fujitsu fi Series scanner.

Upgrade to SmartSoft Pro OCR for Forms Processing

SmartSoft Pro is an optional scanning solution that can be purchased from FileHold. Pro is for companies that process large volumes of paper-based forms, such as invoices, and are looking for a way to automate the capture of text on those forms, including detail such as line items. There are multiple mechanisms for exporting the captured data to FileHold, to other database programs and even directly to accounting systems. The Pro version's forms processing is affordable, reliable and easy to operate. SmartSoft Form Recognition technology uses the SmartOCR recognition engine combined with the flexibility of SmartTemplates dynamic templates technology to extract data fields from scanned and PDF forms. SmartSoft's advanced text recognition technology automates forms data capture, reducing the cost and time needed to transform machine-readable text into manageable data.

Additional features of SmartSoft Pro include:

  • Dynamic form recognition technology: The software is able to recognize the correct layout of a scanned form. If the document is rotated, offset or zoomed when a sheet of paper is scanned, the software is still able to recognize the right template.
  • Data Validation: SmartTemplates recognizes field data types such as date, currency etc. so that validation rules can be set up to increase accuracy. There is a custom dictionary feature where users can add specific terms and names which increases the processing reliability.
  • Intelligent accuracy: For each extracted character the software calculates a confidence level to automatically assess accuracy. For poor-quality documents, such as those with a damaged or blurred character, operator intervention is automatically requested. Manual verification of data capture can be set up through a convenient user interface to further reduce the error rate.
  • Input: Both SmartSoft Capture and Pro support a direct connection to any scanner that uses either TWAIN or WIA. Documents can also be processed when they are already in electronic form, such as PDF or image format.
  • Output: The extracted data can be saved in a variety of output formats such as XML or stored in a database. With additional Professional Service fees from SmartSoft FileHold the output can be sent to popular accounting and ERP software packages such as QuickBooks (the software must be installed on a computer that also has QuickBooks on it), SAP, MS Dynamics and SQL.
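
As a rough illustration of the data validation and confidence checks described in the list above, the following Python sketch flags an extracted field for operator review. The function names, the date format and the 0.90 threshold are all hypothetical; this is not SmartSoft's API or its actual algorithm.

```python
from datetime import datetime

# Hypothetical threshold: a field containing any character below this
# confidence is routed to an operator for manual verification.
REVIEW_THRESHOLD = 0.90


def is_valid_date(text):
    """Return True if text parses as a YYYY-MM-DD date (assumed format)."""
    try:
        datetime.strptime(text, "%Y-%m-%d")
        return True
    except ValueError:
        return False


def needs_review(text, char_confidences, validator=None):
    """Flag a field when its weakest character is uncertain or it fails validation."""
    if not char_confidences or min(char_confidences) < REVIEW_THRESHOLD:
        return True
    if validator is not None and not validator(text):
        return True
    return False


# A clean date passes; a blurred character or an impossible date is flagged.
print(needs_review("2024-03-15", [0.99] * 10, is_valid_date))
print(needs_review("2024-03-15", [0.99] * 9 + [0.62], is_valid_date))
print(needs_review("2024-13-45", [0.99] * 10, is_valid_date))
```

Using the minimum character confidence rather than the average means a single damaged character is enough to request review, which matches the per-character behavior described above.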


WebCap (in preview release): Web-based scanning - remote users can scan documents via a web browser.

Introduced as a preview feature in FileHold.

This feature appeals to organizations that have many remote offices or many mobile workers who need to scan documents such as contracts, bills of sale, expenses, or work estimates into a document management system. It is especially powerful for companies that want these documents to go into a document workflow to automate review and approval processes. Remote workers can scan and store documents into the FileHold repository from anywhere they have access to the internet.

This is an optional feature that offers great value compared to the cost of providing desktop scanning software to many locations, and there is no cost to support a local PC footprint. All users have the same software regardless of which TWAIN-compatible scanner they have, and the scanning function is seamlessly integrated with the rest of the FileHold user interface. Documents can be stored in FileHold in TIFF or PDF format. Internet Explorer, Firefox, and Chrome browsers are all supported.

For more information on WebCap go to:  https://www.filehold.com/features/webcap

Third Party Document Scanning Software Support

Other third-party document scanning products are supported (but not certified) by a connector called the FileScan Bridge. In all cases, some additional consulting services from FileHold may be required.

Schedule a no obligation demonstration today!

Document Scanning Software Jargon

FileHold Software supports any TWAIN document scanner "out of the box" from the industry-leading vendors. Learn more about document scanning. This technical glossary of terms is a resource for organizations that may want to add scanning to their document management software. It may be especially helpful for those considering the conversion of an existing archive of physical records into an electronic record repository. Learn more about how to own records management software.

Anti-aliasing - A process used to remove the stair stepping effect found in diagonal lines of an image. It involves inserting dots of an in-between tone along the edges.

Aspect Ratio - The relative proportion of the length and width of an image. For example, if you scan an original that measures 4 by 6 inches, it will have an aspect ratio of 4:6, or 2:3.
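
The reduction from 4:6 to 2:3 in the entry above can be sketched in Python by dividing both sides by their greatest common divisor. The function name is illustrative only.

```python
from math import gcd


def aspect_ratio(width, height):
    """Reduce a width:height pair to its simplest whole-number ratio."""
    divisor = gcd(width, height)
    return width // divisor, height // divisor


print(aspect_ratio(4, 6))  # (2, 3)
```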

Attribute - Characteristics of a page or character, such as underlining, boldface, or font that can be captured by an optical character recognition (OCR) program.

Automatic Document Feeder (ADF) - A device attached to a scanner that automatically feeds in one page at a time, allowing the scanning of multiple pages.

Auto Trace - A feature found in many object-oriented image editing programs, such as Adobe Illustrator, that allows you to trace a scanned image and convert it to an outline or vector format.

Batch - Actions carried out consecutively on a set of files.

Binary - Base-two arithmetic, which uses only 1's and 0's to represent numbers. 0001 represents 1 decimal, 0010 represents 2 decimal and so forth. Binary numbers are used indirectly to refer to color depth, as in 24-bit or 8-bit color.

Bit - The abbreviation for binary digit, either a 0 or a 1. Scanners typically use multiple bits to represent information about each pixel of an image.

Bit Depth - The number of bits used to represent colors or tones.
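
To make the Binary, Bit and Bit Depth entries concrete, here is a minimal Python sketch showing the binary examples from above and how many distinct tones or colors a given bit depth can represent. The helper name tone_count is an assumption made for illustration.

```python
def tone_count(bit_depth):
    """Number of distinct values that bit_depth bits can represent."""
    return 2 ** bit_depth


print(format(2, "04b"))  # 0010
print(int("0001", 2))    # 1
print(tone_count(8))     # 256
print(tone_count(24))    # 16777216
```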

Bitmap - An image represented as pixels in a row and column format. (Note that Adobe refers to a bitmap as a two-color image.)

Calibration - A way of correcting for the variation in output of a device such as a printer or monitor when compared to the original image data from the scanner.

Carriage - The scanner component that moves down a page to capture an image.

CMYK - The abbreviation for cyan, magenta, yellow, and black.

Compression - Squeezing a file (especially an image) into a more efficient form to reduce the amount of storage space required.

Contrast - The range between the lightest and darkest tones in an image. In a high-contrast image, the shades fall at the extremes of the range between white and black. In a low contrast image, the tones are closer together.

Data Compression - A method of reducing the size of files, such as image files, by representing the sets of binary numbers in the file with a shorter string that conveys the same information. Many image editing programs offer some sort of image compression as an optional mode when saving a file to disk.
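
One simple way to see how a shorter string can convey the same information is run-length encoding, sketched below in Python. This technique is chosen here purely as an illustration; the glossary does not say which compression methods any particular product uses.

```python
from itertools import groupby


def rle_encode(pixels):
    """Collapse each run of identical values into a (value, run_length) pair."""
    return [(value, len(list(run))) for value, run in groupby(pixels)]


def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original sequence."""
    return [value for value, count in pairs for _ in range(count)]


# One row of a black-and-white scan: 0 = white, 1 = black.
row = [0] * 6 + [1] * 2 + [0] * 4
encoded = rle_encode(row)
print(encoded)  # [(0, 6), (1, 2), (0, 4)]
print(rle_decode(encoded) == row)  # True
```

Twelve pixel values are stored as three pairs, and decoding restores the row exactly, so nothing is lost.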

Digitize - To convert analog information, such as a continuous tone image, to a binary form that can be processed by a computer.

Dot - A unit used to represent the smallest element a printer can image, but sometimes used to represent the resolution of other devices, such as monitors or scanners.

Dots Per Inch (DPI) - The resolution of a printed page, expressed in the number of printer dots in an inch, abbreviated dpi. Scanner resolution is also expressed, somewhat inaccurately, in dpi.

Down sampling - To reduce the amount of information in an image, usually to make it smaller or to discard some colors when changing bit depth. Also used when reducing the number of pixels in an image.

Dynamic Range - The range of densities between the highlights and shadows of an image.

Export - To transfer an image to another format type.

Filter - An image transform tool used to process an image; for example, to sharpen, blur, or diffuse it. Often this is a plug-in in an image editor, but filters are also built into scanning software or hardware.

Gamma - A way of representing the contrast of an image, shown as the slope of a curve showing tones from white to black.

Gamma Correction or Gamma Compensation - The process of preconditioning or adjusting an image to correct for the gamma of the device used to reproduce the image, such as a printer or display screen. Without gamma compensation, the image will look too dark when printed or displayed.
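
A common way to apply gamma correction to an 8-bit tone is to normalize it to the range 0 to 1, raise it to the power 1/gamma, and scale it back. The Python sketch below assumes that power-law form and a gamma of 2.2, a typical display value; neither figure comes from this glossary.

```python
def gamma_correct(value, gamma=2.2):
    """Apply power-law gamma correction to one 8-bit tone (0-255)."""
    normalized = value / 255
    return round(255 * normalized ** (1 / gamma))


# Black and white are unchanged; midtones are lifted so they do not
# look too dark on the output device.
print(gamma_correct(0))
print(gamma_correct(128))
print(gamma_correct(255))
```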

Gang Scan - The process of scanning more than one picture at a time, used when images are of the same density and color balance range.

Graphics Interchange Format (GIF) - A compressed image format popular on the Web. GIF was the first commonly used image format, but was largely replaced by JPEG.

Grayscale - Gray values in an image.

Halftoning - A method of representing the gray tones of an image by varying the size of the dots used to show the image.

Interpolation - A method of changing the size, resolution, or colors in an image by calculating the pixels used to represent the new image from the old ones. It is also being used to increase bit-depth claims on scanners (as in "Enhanced Bit Depth" or "Enhanced Color").
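
As a small illustration of calculating new pixels from old ones, the Python sketch below enlarges one row of grayscale values using linear interpolation. Linear interpolation is only one of several methods, and the function name is hypothetical.

```python
def upsample_row(row, factor):
    """Insert linearly interpolated values between neighboring pixels.

    Assumes a non-empty row and a positive whole-number factor.
    """
    out = []
    for left, right in zip(row, row[1:]):
        for step in range(factor):
            out.append(left + (right - left) * step / factor)
    out.append(row[-1])
    return out


print(upsample_row([0, 100], 2))  # [0.0, 50.0, 100]
```

The value 50.0 was never captured by the scanner; it is calculated, which is why interpolated resolution does not add real detail.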

Invert - To reverse an image's tones to their opposite values; that is, to make a negative.

Joint Photographic Experts Group (JPEG) - The JPEG format offers a compression scheme that makes the image file smaller than files in other formats by discarding some of the image information.

Landscape - The orientation of a page in which the longest dimension is horizontal.

Legal size - Paper or other media that is 8 1/2 inches wide and 14 inches long.

Moire - In scanning, an objectionable pattern caused by interference of halftone screens, often produced when a halftone is rescanned and the sampling frequency of the scanner (spi) interferes with the halftone or dither pattern of the original.

Monochrome - Having a single color. Typically refers to a black and white image, but could be any single color image.

Noise - Random information that distorts an image, especially the background distortion of an analog image before it is converted to digital format.

Optical Character Recognition (OCR) - The process of converting the printed characters in a bitmapped image of text into ASCII characters and their attributes.

Optical Resolution - The resolution of a scanner that is calculated by dividing the number of pixels in the CCD by the width of the scanned area. Optical resolution is also often called true resolution and does not include any interpolation to increase pixels.
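
As a worked example, optical resolution is the number of sensor pixels per inch of scan width. The Python sketch below uses a hypothetical sensor with 5,100 pixels across an 8.5-inch scanning area; the figures are illustrative and do not describe any particular scanner.

```python
def optical_resolution(sensor_pixels, scan_width_inches):
    """Pixels captured per inch of scan width, with no interpolation."""
    return sensor_pixels / scan_width_inches


print(optical_resolution(5100, 8.5))  # 600.0
```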

Pixel - A picture element of an image that refers to a single dot within a digital photograph. A photograph is made up of thousands of pixels.

Pixels Per Inch (ppi) - The number of pixels captured per inch by a scanner. This is a more accurate term than dpi (dots per inch) when applied to scanners because scanners capture pixels.

Portable Network Graphics (PNG) - A lossless file format created to overcome deficiencies of the Graphics Interchange Format (GIF), such as the limited number of colors.

Portrait - The orientation of a page in which the longest dimension is vertical.

Preview Scan - A preliminary scan that can be used to define the exact area for the final scan. A low-resolution image of the full page or scanning area is shown, and a frame of some type is used to specify the area to be included in the final scan.

Raster Image - An image defined by rows and columns of pixels. Scanners capture images as raster images, although some can convert them to vector images.

Raster to Vector Conversion - The process of examining a raster image for lines and strokes, and creating a new image that looks the same but is made up of lines rather than pixels. When a person draws, they are creating a vector image. Vector images can be enlarged much more accurately and often have a smaller file size.

Resolution - The number of pixels or dots per inch in an image. Also the capability of a scanner to resolve detail, which requires quality optics as well as high ppi or spi.
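
Resolution and bit depth together determine how much raw data a scan produces. The Python sketch below estimates the uncompressed size of a legal-size page (8 1/2 by 14 inches); the 300 ppi setting is an assumed example value, and real files will usually be smaller once compressed.

```python
def uncompressed_size_bytes(width_in, height_in, ppi, bit_depth):
    """Estimate the raw image size in bytes before any compression."""
    width_px = round(width_in * ppi)
    height_px = round(height_in * ppi)
    total_bits = width_px * height_px * bit_depth
    return (total_bits + 7) // 8  # round up to whole bytes


# Legal-size page at 300 ppi: black and white (1-bit) versus 24-bit color.
print(uncompressed_size_bytes(8.5, 14, 300, 1))   # 1338750
print(uncompressed_size_bytes(8.5, 14, 300, 24))  # 32130000
```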

Sample Rate or Samples Per Inch - The number of pixels per inch captured by a scanner.

Scanner - A device that captures images or text and converts them to a bitmapped image.

Selection Area - The part of an HP DeskScan preview scan that you select to be saved to a file or sent directly to a printer.

Sharpening - Increasing the apparent sharpness of an image by increasing the contrast between the adjacent tones or colors.

Smoothing - To blur the boundaries between tones of an image, usually to reduce a rough or jagged appearance.

Threshold - A predefined level used by scanners to determine whether a pixel will be represented as black or white.
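
A minimal Python sketch of thresholding follows, assuming 8-bit grayscale input where 0 is black and 255 is white, and an arbitrary cut-off of 128. Real scanner drivers may pick the level differently, for example adaptively per page.

```python
def apply_threshold(pixels, threshold=128):
    """Map each grayscale value to pure black (0) or pure white (255)."""
    return [255 if value >= threshold else 0 for value in pixels]


print(apply_threshold([12, 200, 127, 128]))  # [0, 255, 0, 255]
```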

Thumbnail - A miniature copy of a page or image, which gives you an idea of what the original looks like without having to open the original file or view the full size image.

Tagged Image File Format (TIFF) - A graphic file format originally developed specifically for scanners. It can be used to store grayscale and color images and is now a standard image file format supported by most applications, printers, and scanners.

Transparency Adapter - An add-on device used with a scanner to scan slides and other see-through media.

TWAIN - A software driver interface for scanners and other image-capturing devices that lets you scan images directly into an application like Adobe Photoshop.

Vector Image - An image defined by the beginning and ending points of each line.

Zoom - To enlarge a portion of an image.