Data Capture Services for Forms: Form Recognition & Data Extraction

We can capture virtually all kinds of paper forms, extract data from them, and export digital images and data directly to your Enterprise Resource Planning (ERP) system, Document Management System (DMS), Customer Relationship Management (CRM) system, or any other IT system you use. We scan, capture and extract data from purchase invoices, surveys, questionnaires, application forms (for loans, mortgages, account opening, jobs, etc.), insurance claims, market research forms, tax returns, complaint forms, medical tests, and many other forms.

With over 30 years of experience in scanning, indexing and archiving massive volumes of documents, we have extensive expertise in form processing, recognition and data extraction. Our data capture services are delivered through our ScanFactory division by highly qualified, security-cleared personnel.

We work with HR departments, finance & accounting departments, insurance companies, financial institutions, banks, market research firms, government agencies & departments, legal offices & practices, hospitals, and others.

Examples of forms we process include (but are not limited to):

  • Purchase invoices
  • Application forms
  • Bank loans & statements
  • Mortgage applications
  • Credit card applications
  • Account opening forms
  • Customer feedback forms
  • Medical tests & forms
  • Complaint & consent forms
  • Insurance claims
  • Tax forms & tax returns
  • Market research forms
  • Surveys & questionnaires
  • Contracts & service agreements
  • Hospital admission & release forms
  • Proofs of delivery (POD)

Multi-channel Form Input

The first step in the process is document input: we need to obtain your forms before we can process them. We offer several flexible options, and you decide which one suits you best:

  • Pick-up: We can pick up paper forms directly from your location at agreed-upon intervals (e.g. weekly, monthly, etc.), scan them at our scanning facility & process them
  • P.O. Box: A dedicated P.O. Box can be set up for your suppliers or customers to send their documents to (e.g. invoices); we will collect them from the box, then scan & process the forms
  • On-site Scanning: You can scan your forms on-site using your own multifunctionals or scanners; scanned images are automatically fed into our system for processing
  • E-mail: You can e-mail your forms to a dedicated iGuana e-mail account; forms that you e-mail are automatically imported into our system for processing

Form Capture

If you decide to scan or capture your forms yourself, or to e-mail them to us, we start processing them directly using our form recognition and data extraction technology. If you wish us to pick up paper forms from your location(s) or a P.O. Box, we first need to scan them. To do this, we use only the highest quality document scanners available on the market. Our production-grade scanners come with state-of-the-art features that enable us to provide you with top-quality digital images.

In addition, we use our own proprietary technology, ScanFactory Resource Planning (SRP), to control, streamline and automate all our scanning operations from A to Z. Its main purpose is to reduce human intervention to an absolute minimum and minimize the risk of human error. All stages of the form scanning process are controlled by our SRP, including quality control, human resource allocation and our high-end document scanners.

Form Classification

The SRP platform automatically classifies the incoming document stream. It identifies all content types (e.g. contract, invoice, application form, tax return, etc.) using four classification technologies: image-based, text-based, separation page-based and rule-based. Depending on the classification profile and your project settings, the classifiers can be used individually or in combination with each other (voting engine). Structured and semi-structured documents are subject to image classification; unstructured documents are classified by their content using both semantic and keyword-based approaches.
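To illustrate how a voting engine can combine several classifiers, here is a minimal Python sketch. It is not the SRP implementation: the three toy classifiers, the page dictionary layout and the rule "a tie goes to manual review" are all assumptions made for this example.

```python
from collections import Counter
from typing import Callable, Optional

# Hypothetical classifier signature: takes a page (a dict), returns a label or None.
Classifier = Callable[[dict], Optional[str]]

def text_classifier(page: dict) -> Optional[str]:
    # Keyword-based: looks for a telltale word in the recognized text.
    return "invoice" if "invoice" in page.get("text", "").lower() else None

def rule_classifier(page: dict) -> Optional[str]:
    # Rule-based: an assumed convention that invoice barcodes start with "INV".
    return "invoice" if page.get("barcode", "").startswith("INV") else None

def separator_classifier(page: dict) -> Optional[str]:
    # Separation page-based: the label printed on the preceding separator sheet.
    return page.get("separator_label")

CLASSIFIERS = [text_classifier, rule_classifier, separator_classifier]

def classify_by_vote(page: dict, classifiers: list[Classifier]) -> Optional[str]:
    """Return the label with the most votes, or None if there is no clear winner."""
    votes = Counter()
    for classifier in classifiers:
        label = classifier(page)
        if label is not None:
            votes[label] += 1
    ranked = votes.most_common(2)
    if not ranked:
        return None  # no classifier recognized the page
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None  # tie: assumed here to be routed to manual review
    return ranked[0][0]

page = {"text": "Invoice No. 123", "barcode": "INV-123", "separator_label": "contract"}
print(classify_by_vote(page, CLASSIFIERS))  # invoice
```

In this sketch every classifier has one vote. A real profile might weight classifiers differently or require a minimum number of agreeing votes.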

Form Recognition

At the recognition stage, our SRP performs a fully automated process of identifying and analyzing documents and forms.

Documents that arrive with multiple pages are identified, sorted and separated from the incoming document stream, and treated as a single document for processing. This is achieved with blank page detection, header detection, separation sheets or predefined classification algorithms in our SRP.
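The separation step can be pictured with a short Python sketch. It is an illustration under assumptions, not the SRP algorithm: each page is assumed to have already been tagged with a hypothetical "kind" of "header", "content", "blank" or "separator".

```python
def split_documents(pages: list[dict]) -> list[list[int]]:
    """Group a flat stream of scanned pages into documents (lists of page ids)."""
    documents: list[list[int]] = []
    current: list[int] = []
    for page in pages:
        kind = page["kind"]
        if kind in ("blank", "separator"):
            # Blank pages and separation sheets end the current document
            # and are not kept in the output.
            if current:
                documents.append(current)
                current = []
            continue
        if kind == "header" and current:
            # A detected header page starts a new document.
            documents.append(current)
            current = []
        current.append(page["id"])
    if current:
        documents.append(current)
    return documents

stream = [
    {"id": 1, "kind": "header"},
    {"id": 2, "kind": "content"},
    {"id": 3, "kind": "header"},
    {"id": 4, "kind": "content"},
    {"id": 90, "kind": "separator"},
    {"id": 5, "kind": "content"},
    {"id": 91, "kind": "blank"},
]
print(split_documents(stream))  # [[1, 2], [3, 4], [5]]
```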

Recognition Technologies

We use multiple advanced recognition technologies, including OCR, ICR, OMR and barcode recognition:

  • Optical character recognition (OCR) of printed text in up to 190 languages
  • Intelligent character recognition (ICR) for hand-printed text in over 110 languages
  • Optical mark recognition (OMR) for a wide range of checkmarks
  • Barcode recognition for a variety of 1D (one-dimensional or linear) barcodes and 2D (two-dimensional) barcodes

Manual keying is used in cases where it is not possible to ‘recognize’ forms automatically.

Data Extraction

Our SRP automatically extracts data from a variety of forms, structured and unstructured, such as mortgage applications, tax returns, questionnaires, credit card applications, contracts, invoices, and many more. Some business tasks require granular content analysis and understanding. For these, our SRP platform provides text analytics that automatically identify and extract business-relevant information from your content, especially from unstructured documents such as contracts and reports. Our SRP can also perform full-text extraction, i.e. extract the entire text of a document via OCR, which makes it possible to deliver searchable PDFs.

Data Verification

During the verification stage, our SRP applies automated and manual checks to ensure data accuracy. Built-in business logic also checks whether a data value matches the data in a linked business system (ERP, DMS, CRM, HRM, etc.), which makes automated matching, straight-through processing and high levels of automation possible.

Automatic Verification

Automatic data validation ensures high quality of data and reduces the need for manual verification by human operators. Our SRP can perform the following automatic verifications:

  • Comparison against databases
  • Conformity with built-in validation rules
  • Compliance with formats
  • Data normalization
  • User-defined checks
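As an illustration of what such checks can look like for an extracted invoice, here is a small Python sketch. Everything in it is assumed for the example: the field names, the "INV-" number format, the supplier list standing in for an ERP lookup, and the rule that net plus VAT must equal the total.

```python
import re
from datetime import datetime
from decimal import Decimal, InvalidOperation

# Stand-in for a lookup in a linked business system (hypothetical data).
KNOWN_SUPPLIERS = {"SUP-001", "SUP-002"}

def normalize_date(value: str) -> str:
    """Data normalization: convert a few common date layouts to ISO 8601."""
    for fmt in ("%d/%m/%Y", "%d.%m.%Y", "%Y-%m-%d"):
        try:
            return datetime.strptime(value.strip(), fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {value!r}")

def verify_invoice(fields: dict) -> tuple[dict, list[str]]:
    """Return a normalized copy of the fields and a list of verification errors."""
    result = dict(fields)
    errors: list[str] = []

    # Compliance with formats.
    if not re.fullmatch(r"INV-\d{6}", result.get("invoice_no", "")):
        errors.append("invoice_no: unexpected format")

    # Comparison against databases.
    if result.get("supplier_id") not in KNOWN_SUPPLIERS:
        errors.append("supplier_id: not found in supplier list")

    # Data normalization.
    try:
        result["date"] = normalize_date(result.get("date", ""))
    except ValueError:
        errors.append("date: unrecognized format")

    # Validation rule: the amounts must add up.
    try:
        net, vat, total = (Decimal(result[k]) for k in ("net", "vat", "total"))
        if net + vat != total:
            errors.append("amounts: net + vat does not equal total")
    except (KeyError, InvalidOperation):
        errors.append("amounts: missing or non-numeric")

    return result, errors

good = {"invoice_no": "INV-000123", "supplier_id": "SUP-001",
        "date": "31/12/2023", "net": "100.00", "vat": "21.00", "total": "121.00"}
normalized, errors = verify_invoice(good)
print(normalized["date"], errors)  # 2023-12-31 []
```

A record that comes back with a non-empty error list would be a natural candidate for manual verification.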

Manual Verification

Manual verification (indexing) is used when the accuracy of automated data extraction cannot be guaranteed. If, during manual indexing, the correct input of index values cannot be fully validated, the double-keying method is used: two persons manually index the same document independently of each other, and the two sets of index values are compared. The index values are accepted only if they are identical.
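The comparison at the heart of double-keying is simple enough to sketch in a few lines of Python. The function name and field names below are hypothetical and serve only as an example.

```python
def double_keying_mismatches(first: dict, second: dict) -> list[str]:
    """Return the names of index fields on which the two operators disagree."""
    field_names = set(first) | set(second)
    return sorted(name for name in field_names if first.get(name) != second.get(name))

operator_a = {"customer_no": "48213", "surname": "Peeters", "date": "2023-12-31"}
operator_b = {"customer_no": "48218", "surname": "Peeters", "date": "2023-12-31"}

print(double_keying_mismatches(operator_a, operator_b))  # ['customer_no']
```

An empty result means the two entries are identical and can be accepted. Any listed field has to be re-keyed or escalated.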

Export: Metadata & Image Delivery

Images and metadata can be delivered via direct file transfer (VPN) or on an external storage device (e.g. an encrypted hard disk or USB stick). You decide which option suits you best. Delivery is a fully automated process governed entirely by SRP. All scanned images and metadata are stored directly in the SRP database and are exported from it automatically, without any human intervention. As a result, you have virtually unlimited flexibility in choosing your preferred file formats: JPEG, TIFF, PDF, PDF/A, etc. for images; XML, CSV, etc. for metadata; or any import format required by your document management software.
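To give an idea of what a metadata delivery in CSV or XML can look like, here is a minimal Python sketch using only the standard library. The record layout and the element names ("documents", "document") are assumptions for this example; the actual export layout is agreed per project.

```python
import csv
import io
import xml.etree.ElementTree as ET

def metadata_to_csv(records: list[dict], fieldnames: list[str]) -> str:
    """Serialize metadata records as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()

def metadata_to_xml(records: list[dict]) -> str:
    """Serialize metadata records as one XML element per document."""
    root = ET.Element("documents")
    for record in records:
        document = ET.SubElement(root, "document")
        for name, value in record.items():
            ET.SubElement(document, name).text = str(value)
    return ET.tostring(root, encoding="unicode")

records = [{"image": "0001.pdf", "invoice_no": "INV-000123"}]
print(metadata_to_csv(records, ["image", "invoice_no"]))
print(metadata_to_xml(records))
```

Each record points to its image file by name, so the receiving system can import the image and its index values together.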

Certified Destruction

All physical files are kept in quarantine (sealed storage) for a standard period of one month from the day the scanned documents and metadata are delivered to you. This gives you the opportunity to perform your own quality control. Once the quarantine period has expired and you have given your express approval, all quarantined documents are subjected to secure confidential destruction in accordance with DIN 66399, security level P-3. Once the physical documents are destroyed, all scanned images and metadata are erased from our SRP system. You receive a confidential destruction certificate.

Privacy & Security

iGuana takes matters of privacy, security and data protection very seriously. Our in-house Data Protection Officer (DPO) is responsible for ensuring compliance with internal privacy & security policies and for implementing procedures in line with GDPR and with the ISO 27001 standard for information security.

  • In-house DPO on staff
  • Strict access controls; 24/7 monitoring
  • Ban on Wi-Fi & mobile phones
  • Background checks & confidentiality agreements
  • Secure transportation & sealed storage
  • Secure image & metadata delivery
  • Confidential certified destruction

Quality & Image Enhancement

A set of clearly defined procedures governs every single step of our document scanning operations, from Pick-up to Metadata & Image Delivery. Every action taken in relation to your documents is tracked in the SRP system in real time. Our document scanners have built-in image quality control features, such as Perfect Page technology, blank page removal, double feed detection, Dual Stream scanning, Intelligent Quality Control, iThresholding, auto-deskew, etc.

The Image Enhancement module in our SRP system utilizes a 16-core server dedicated exclusively to post-scan image processing and performs a series of complex algorithmic image enhancement tasks. In addition to numerous manual quality checks, our Quality Supervisor is also automatically notified by the SRP of any quality issues detected by SRP’s sophisticated quality control algorithms.

Last but not least, our employees are trained to apply ISO 9001 Quality Management principles in all their work.
