Data Capture Services for Forms: Form Recognition & Data Extraction
Multi-channel Form Input
The first step in the process is document input: we need to obtain your forms in order to process them. We offer several flexible options, and you decide which one suits you best:
- Pick-up: We can pick up paper forms directly from your location at agreed-upon intervals (e.g. weekly, monthly, etc.), scan them at our scanning facility & process them
- P.O. Box: A dedicated P.O. Box can be set up for your suppliers or customers to send their documents to (e.g. invoices); we will collect them from the box, then scan & process the forms
- On-site Scanning: You can scan your forms on-site using your own multifunctionals or scanners; scanned images are automatically fed into our system for processing
- E-mail: You can e-mail your forms to a dedicated iGuana e-mail account; forms that you e-mail are automatically imported into our system for processing
If you decide to scan / capture your forms yourself or e-mail them to us, we will start processing them directly using our form recognition and data extraction technology. If you wish us to pick up paper forms from your location(s) or a P.O. Box, we will first need to scan them. To do this we use production-grade document scanners whose features enable us to provide you with top quality digital images. In addition, we use our own proprietary technology, ScanFactory Resource Planning (SRP), to control, streamline and automate all our scanning operations from A to Z. Its main purpose is to reduce human intervention to an absolute minimum and minimize the risk of human error. All stages of the form scanning process are controlled entirely by our SRP, including quality control, human resource allocation and our high-end document scanners.
The SRP platform automatically classifies the incoming document stream. It identifies all content types (e.g. contract, invoice, application form, tax return, etc.) and leverages four classification technologies: image-based, text-based, separation page-based and rule-based. Depending on the classification profile and your project settings, the classifiers can be used individually or in combination with each other (voting engine). Structured and semi-structured documents are subject to image classification, while unstructured documents are classified by their content using both semantic and keyword-based approaches.
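To illustrate how a voting engine can combine several classifiers, here is a minimal Python sketch. It is not the SRP implementation: the function names, the document fields (`text`, `has_vat_number`, `separator_label`) and the `min_votes` threshold are all hypothetical and chosen only for the example.

```python
from collections import Counter
from typing import Callable, Optional

# A classifier takes a document (here a plain dict) and returns a label or None.
Classifier = Callable[[dict], Optional[str]]


def text_classifier(doc: dict) -> Optional[str]:
    """Hypothetical keyword-based classifier working on recognized text."""
    text = doc.get("text", "").lower()
    if "invoice" in text:
        return "invoice"
    if "contract" in text:
        return "contract"
    return None


def rule_classifier(doc: dict) -> Optional[str]:
    """Hypothetical rule: a document carrying a VAT number is an invoice."""
    return "invoice" if doc.get("has_vat_number") else None


def separator_classifier(doc: dict) -> Optional[str]:
    """Hypothetical classifier reading the label from a separation page."""
    return doc.get("separator_label")


def vote(doc: dict, classifiers: list[Classifier], min_votes: int = 2) -> Optional[str]:
    """Return the label most classifiers agree on, or None if agreement is too weak."""
    votes = [label for clf in classifiers if (label := clf(doc)) is not None]
    if not votes:
        return None
    ranked = Counter(votes).most_common()
    label, count = ranked[0]
    # A tie between the two strongest labels counts as "no decision".
    if len(ranked) > 1 and ranked[1][1] == count:
        return None
    return label if count >= min_votes else None


classifiers = [text_classifier, rule_classifier, separator_classifier]
result = vote({"text": "Invoice no. 12", "has_vat_number": True}, classifiers)
```

In this sketch a document without a clear majority returns `None`, which would correspond to routing it to a human operator rather than guessing.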
At the recognition stage, our SRP performs a fully automated process of identifying and analyzing documents / forms.
Documents that arrive with multiple pages are identified, sorted and separated from the incoming document stream, and each is treated as a single document for processing. This is achieved with blank page detection, header detection, separation sheets or with predefined classification algorithms in our SRP.
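As a rough illustration of document separation, the Python sketch below splits a scanned page stream into documents wherever a blank page or a separation sheet occurs. The `Page` type and its flags are hypothetical, and the sketch assumes that blank and separator pages act purely as boundaries and are dropped from the output; header detection is not covered.

```python
from dataclasses import dataclass


@dataclass
class Page:
    number: int
    is_blank: bool = False       # assumed to be set by blank page detection
    is_separator: bool = False   # assumed to be set when a separation sheet is recognized


def split_documents(pages: list[Page]) -> list[list[Page]]:
    """Group consecutive content pages into documents, using blank and separator pages as boundaries."""
    documents: list[list[Page]] = []
    current: list[Page] = []
    for page in pages:
        if page.is_blank or page.is_separator:
            # A boundary page closes the current document (if any) and is discarded.
            if current:
                documents.append(current)
                current = []
            continue
        current.append(page)
    if current:
        documents.append(current)
    return documents


stream = [
    Page(1), Page(2),
    Page(3, is_separator=True),
    Page(4),
    Page(5, is_blank=True),
    Page(6),
]
page_numbers = [[p.number for p in doc] for doc in split_documents(stream)]
```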
We use multiple advanced recognition technologies, including OCR, ICR, OMR and barcode recognition.
- Optical character recognition (OCR) of printed text in up to 190 languages
- Intelligent character recognition (ICR) for hand-printed text in over 110 languages
- Optical mark recognition (OMR) for a wide range of checkmarks
- Barcode recognition for a variety of 1D (one-dimensional or linear) barcodes and 2D (two-dimensional) barcodes
Manual keying is used in cases where it is not possible to ‘recognize’ forms automatically.
Our SRP automatically extracts data from a variety of forms, structured and unstructured, such as mortgage applications, tax returns, questionnaires, credit card applications, contracts, invoices, and many more. Some business tasks require granular content analysis and understanding. Our SRP platform provides text analytics by automatically identifying and extracting business-relevant information from your content, especially from unstructured information like contracts and reports. Our SRP can also perform full-text extraction. It can extract the entire text from a document via OCR. Full text extraction makes it possible to deliver searchable PDFs.
During the verification stage, our SRP provides automated and manual checks to ensure data accuracy. Built-in business logic also determines whether a data value corresponds to a record in a linked business system (ERP, DMS, CRM, HRM, etc.), which makes automated matching services, straight-through processing and high levels of automation possible.
Automatic data validation ensures high quality of data and reduces the need for manual verification by human operators. Our SRP can perform the following automatic verifications:
- Comparison against databases
- Conformity with built-in validation rules
- Compliance with formats
- Data normalization
- User-defined checks
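The kinds of automatic verification listed above can be sketched in Python as follows. This is only an illustration under stated assumptions: the field names, the VAT number pattern, the `KNOWN_SUPPLIERS` set standing in for a database lookup, and the arithmetic rule are all hypothetical and not taken from the SRP.

```python
import re
from datetime import datetime
from typing import Optional

# Hypothetical stand-in for a lookup in a linked business system (e.g. an ERP).
KNOWN_SUPPLIERS = {"BE0123456789"}


def normalize_date(value: str) -> Optional[str]:
    """Data normalization: convert a few common date notations to ISO 8601, or return None."""
    for fmt in ("%d/%m/%Y", "%d-%m-%Y", "%Y-%m-%d"):
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def validate_invoice(fields: dict) -> list[str]:
    """Return a list of validation errors; an empty list means all checks passed."""
    errors = []

    # Compliance with formats, then comparison against a database.
    vat = fields.get("vat_number", "").replace(" ", "").replace(".", "").upper()
    if not re.fullmatch(r"[A-Z]{2}\d{10}", vat):
        errors.append("vat_number: bad format")
    elif vat not in KNOWN_SUPPLIERS:
        errors.append("vat_number: unknown supplier")

    # Data normalization doubles as a format check for dates.
    if normalize_date(fields.get("invoice_date", "")) is None:
        errors.append("invoice_date: unrecognized date")

    # Validation rule: net amount plus tax must equal the total.
    try:
        net, tax, total = (float(fields[k]) for k in ("net", "tax", "total"))
        if abs(net + tax - total) > 0.005:
            errors.append("total: net + tax does not equal total")
    except (KeyError, ValueError):
        errors.append("amounts: missing or non-numeric")

    return errors


extracted = {
    "vat_number": "BE 0123.456.789",
    "invoice_date": "31/01/2024",
    "net": "100.00",
    "tax": "21.00",
    "total": "121.00",
}
errors = validate_invoice(extracted)
```

A record that returns a non-empty error list would be a natural candidate for manual verification.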
Manual verification (indexing) is used when it cannot be guaranteed that automated data extraction is accurate. If, during manual indexing, the correct input of index values cannot be fully validated, the double-keying method is used: two people index the same document independently of each other, and the two sets of index values are then compared. The two must be identical.
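The comparison step of double-keying can be sketched in a few lines of Python. The function name and the field names are hypothetical; the sketch simply reports every field on which the two operators disagree, so that only those fields need to be reviewed.

```python
def compare_double_keying(first: dict, second: dict) -> dict:
    """Return the fields whose values differ between two independent keying passes."""
    mismatches = {}
    for field in sorted(first.keys() | second.keys()):
        a, b = first.get(field), second.get(field)
        if a != b:  # the two index values must be identical, so no fuzzy matching
            mismatches[field] = (a, b)
    return mismatches


operator_a = {"customer_id": "C-1042", "amount": "250.00"}
operator_b = {"customer_id": "C-1042", "amount": "260.00"}
mismatches = compare_double_keying(operator_a, operator_b)
```

An empty result means both operators entered identical values and the index can be accepted.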
Export: Metadata & Image Delivery
Images and metadata can be delivered via direct file transfer (VPN) or on an external storage device (e.g. encrypted hard disk, USB stick, etc.). You decide which option suits you best. Delivery is a fully automated process governed entirely by SRP. All scanned images and metadata are stored directly in the SRP database and are exported from it automatically, without any human intervention. As a result, you have great flexibility in choosing your preferred file formats: JPEG, TIFF, PDF, PDF/A, etc. for images; XML, CSV, etc. for metadata; or any import format supported by your document management software.
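To give an idea of what a metadata export in CSV or XML might look like, here is a small Python sketch using only the standard library. The record layout, field names and XML element names are hypothetical; the actual export format is agreed per project.

```python
import csv
import io
import xml.etree.ElementTree as ET


def to_csv(records: list[dict], fieldnames: list[str]) -> str:
    """Serialize metadata records to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def to_xml(records: list[dict]) -> str:
    """Serialize metadata records to a simple XML document, one element per field."""
    root = ET.Element("documents")
    for record in records:
        doc = ET.SubElement(root, "document")
        for key, value in record.items():
            ET.SubElement(doc, key).text = str(value)
    return ET.tostring(root, encoding="unicode")


records = [{"image": "0001.tif", "invoice_number": "INV-1"}]
csv_text = to_csv(records, ["image", "invoice_number"])
xml_text = to_xml(records)
```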
All physical files are kept in quarantine (sealed storage) for a standard period of 1 month from the day scanned documents and metadata are delivered to you. This gives you the opportunity to perform your own quality control. Once the quarantine period has expired and you have given your express approval, all quarantined documents are subjected to secure confidential destruction in accordance with Security Level P-3 of the DIN 66399 standard. Once the physical documents are destroyed, all scanned images and metadata are erased from our SRP system. You receive a confidential destruction certificate.
Privacy & Security
iGuana takes matters of privacy, security and data protection very seriously. We have an in-house Data Protection Officer (DPO) on staff who is responsible for ensuring compliance with internal privacy & security policies and implementing procedures in line with GDPR and with the ISO 27001 standard for information security.
Quality & Image Enhancement
A set of clearly defined procedures governs every single step of our document scanning operations, from Pick-up to Metadata & Image Delivery. Every action taken in relation to your documents is tracked in the SRP system in real time. Our document scanners have built-in image quality control features, such as Perfect Page technology, blank page removal, double feed detection, Dual Stream scanning, Intelligent Quality Control, iThresholding, auto-deskew, etc.
The Image Enhancement module in our SRP system utilizes a 16-core server dedicated exclusively to post-scan image processing and performs a series of complex algorithmic image enhancement tasks. In addition to numerous manual quality checks, our Quality Supervisor is also automatically notified by the SRP of any quality issues detected by SRP’s sophisticated quality control algorithms.
Last but not least, our employees are trained to apply ISO 9001 Quality Management principles in all their work.