The OCR Pack provides tools to extract data from scanned images. Branches can transform, report, validate, and store the extracted data.
Once a document has been scanned and converted into a TIFF, the OCR Pack can accept those images for processing. The method used to process scanned images depends on whether the image contains a single document or a batch of multiple documents.
To process an image containing more than one discrete document, the following processing steps are available:
Batch Merge | Scanning equipment and the documents it processes are not perfect. Pages can jam the scanner, resulting in an incomplete scanned document. When this happens, you must combine multiple scanned images to construct a complete document batch. To do so, select the individual, incomplete scanned documents within Transform Content Center and submit them to a remote branch, which merges them into a single multi-document image. Typically, this process submits the newly created image for bursting and removes the original, incomplete documents from Transform Content Center storage. |
Bursting | Under the control of a branch, the OCR engine scans each page within a multi-document image and identifies the first page of each document. This information, which describes how the batch is to be split into its component documents, is stored in a Scanned Batch (.fsbd) file. |
Burst Validation |
Because initial document scanning is performed by people and bursting is done under program control, Burst Validation provides an opportunity for a person to review the results of both the scanning and bursting operations. This individual can change the order in which the pages appear and the page attributes assigned during the bursting operation. This might be necessary under circumstances such as the following:
In environments where accuracy is paramount, Burst Validation makes certain that only properly validated and reviewed documents are submitted to subsequent document processing steps. Once a batch of documents has been burst and validated, the individual documents are extracted and can be submitted for further OCR processing. The status of the batch file is set to 2 (Finished) and its automatic deletion date is set for 14 days later. |
Document Storage | Transform Content Center recognizes the special batch document format used for scanned document batches, so it can store the batch documents. The page attribute information remains associated with the scanned document pages, so it is available for any subsequent processing. |
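
The bursting and completion steps described in the table can be illustrated with a short Python sketch. This is not the OCR Pack's implementation, and it does not read or write the Scanned Batch (.fsbd) format. The `Page` class, the `is_first_page` flag, and the `split_batch` and `finish_batch` functions are hypothetical names introduced for illustration. Only the status value 2 (Finished) and the 14-day deletion offset come from the text above.

```python
from dataclasses import dataclass
from datetime import date, timedelta

STATUS_FINISHED = 2   # status value stated in the Burst Validation description
RETENTION_DAYS = 14   # automatic deletion offset stated in the same description


@dataclass
class Page:
    number: int          # position of the page within the scanned image
    is_first_page: bool  # hypothetical flag: bursting marked this page as a document start


def split_batch(pages):
    """Group page numbers into documents, starting a new one at each first page.

    Assumption: pages that precede any first-page marker are kept together
    as a leading document rather than discarded.
    """
    documents = []
    for page in pages:
        if page.is_first_page or not documents:
            documents.append([])
        documents[-1].append(page.number)
    return documents


def finish_batch(validated_on):
    """Return the status and automatic deletion date for a validated batch."""
    return STATUS_FINISHED, validated_on + timedelta(days=RETENTION_DAYS)


if __name__ == "__main__":
    batch = [
        Page(1, True), Page(2, False),
        Page(3, True), Page(4, False), Page(5, False),
    ]
    print(split_batch(batch))
    print(finish_batch(date(2018, 1, 1)))
```

In this sketch, a reviewer's corrections during Burst Validation would amount to reordering the `Page` entries or changing their `is_first_page` flags before `split_batch` is called.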
When individual documents are scanned, or after batches of documents have been burst, the OCR Pack makes the following processing steps available:
© Copyright 2001-2018 Bottomline Technologies, Inc. All rights reserved.