Digital Document Processing using Vision and NLP
The Challenge
An American corporation that sells printing hardware and digital document products and services in over 160 countries aims to develop comprehensive Intelligent Document Processing (IDP) solutions as microservices. The primary challenge is to identify document types automatically and to extract structured data from a wide range of documents and formats.
Hypothesis
- A solution is needed to extract text from images and accurately localize specific words within documents. The system must also assign each of these words an appropriate label, and the set of labels varies with the type of document.
- Additionally, a method should be developed to transform the unstructured text into structured information, making it suitable for downstream tasks and further processing.
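The two hypotheses above can be illustrated with a minimal sketch. It assumes that a text recognition step has already produced words with bounding boxes and that a labeling model has assigned each word a field label. The `Word` type, the label names, and the `to_record` helper are hypothetical and chosen only for illustration; they are not the project's actual schema.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Word:
    """One localized word (hypothetical structure, for illustration only)."""
    text: str
    box: Tuple[int, int, int, int]  # (x0, y0, x1, y1) in pixels
    label: str                      # assumed field label; "O" means no field


def to_record(words: List[Word]) -> Dict[str, str]:
    """Group labeled words into a field -> text mapping, in reading order."""
    fields = defaultdict(list)
    # Reading order: top-to-bottom, then left-to-right.
    for word in sorted(words, key=lambda w: (w.box[1], w.box[0])):
        if word.label != "O":
            fields[word.label].append(word.text)
    return {name: " ".join(parts) for name, parts in fields.items()}


# Made-up example input; a real pipeline would receive this from the models.
words = [
    Word("Invoice", (10, 10, 80, 30), "O"),
    Word("INV-001", (90, 10, 170, 30), "invoice_number"),
    Word("Acme", (10, 40, 60, 60), "vendor"),
    Word("Corp", (65, 40, 110, 60), "vendor"),
]
record = to_record(words)
```

Because the label set depends on the document type, a real system would likely keep one label schema per document class rather than the single flat mapping used here.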
Execution
A computer vision-based model was implemented for document clustering and classification. A recognition solution for both printed and handwritten documents was developed using Permuted Autoregressive Sequence Models (PARSeq); the solution also performs table and form recognition. Additionally, a custom NLP model was fine-tuned for Named Entity Recognition (NER).
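To show how NER output might feed downstream processing, the sketch below merges token-level tags into entity spans. The source does not state which tagging scheme the fine-tuned model uses; the BIO scheme, the `ORG` and `DATE` entity types, and the `decode_bio` helper are all assumptions made for this example.

```python
def decode_bio(tokens, tags):
    """Merge BIO-tagged tokens into (entity_type, text) spans.

    Assumes tags of the form "B-TYPE", "I-TYPE", or "O".
    """
    entities = []
    current_type, current_tokens = None, []

    def flush():
        # Emit the entity collected so far, if any.
        if current_tokens:
            entities.append((current_type, " ".join(current_tokens)))

    for token, tag in zip(tokens, tags):
        starts_new = tag.startswith("B-") or (
            tag.startswith("I-") and tag[2:] != current_type
        )
        if starts_new:
            flush()
            current_type, current_tokens = tag[2:], [token]
        elif tag.startswith("I-"):
            current_tokens.append(token)
        else:  # "O": outside any entity
            flush()
            current_type, current_tokens = None, []
    flush()
    return entities


# Made-up example; tags would normally come from the NER model.
tokens = ["Paid", "to", "Acme", "Corp", "on", "12", "March", "2021"]
tags = ["O", "O", "B-ORG", "I-ORG", "O", "B-DATE", "I-DATE", "I-DATE"]
entities = decode_bio(tokens, tags)
```

An `I-` tag whose type differs from the current entity is treated as the start of a new entity, which is one common way to tolerate inconsistent model predictions.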
Outcomes
The solution classifies documents automatically, extracts information from printed materials, recognizes various table structures, and converts the data into a structured format. However, there is room for improvement in handling noisy or degraded documents.
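The conversion of a recognized table into a structured format can be sketched as follows. This is a simplified heuristic, not the project's method: it assumes each cell arrives as a `(text, x, y)` triple, that cells within a hypothetical `tolerance` of 10 pixels vertically belong to the same row, and that the first row is the header.

```python
def group_rows(cells, tolerance=10):
    """Cluster (text, x, y) cells into rows by vertical position."""
    rows = []
    for text, x, y in sorted(cells, key=lambda c: c[2]):
        if rows and abs(rows[-1]["y"] - y) <= tolerance:
            # Close enough to the current row's anchor: same row.
            rows[-1]["cells"].append((x, text))
        else:
            rows.append({"y": y, "cells": [(x, text)]})
    # Order each row left-to-right and drop the coordinates.
    return [[text for _, text in sorted(row["cells"])] for row in rows]


def rows_to_records(table):
    """Treat the first row as the header and map each later row onto it."""
    header, *body = table
    return [dict(zip(header, row)) for row in body]


# Made-up cell positions, deliberately out of order.
cells = [("Qty", 120, 12), ("Item", 10, 10), ("2", 121, 41), ("Paper", 11, 40)]
table = group_rows(cells)
records = rows_to_records(table)
```

A fixed pixel tolerance is fragile on skewed or low-resolution scans, which is consistent with the noted difficulty on noisy or degraded documents.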